Welcome back after a short break. I hope everyone has taken time to recharge a little over the summer period.
As the title suggests, today we will be looking at NBA data, using the NBA API. I already hear your internal screams: “CJ stop doing Python tutorials when you’re terrible at Python.”
In all seriousness, if you have some tips you want to share, do get in touch via Twitter. This has just been my long-standing wing-it approach, in the hope of influencing others to start coding. Learning supplementary skills is something I’m pretty passionate about, and I think it can really help you grow.
I hope to revisit these datasets to create a more detailed visualisation further down the line, but here is something I mocked up from the data. You can download it on Tableau as part of this blog.
You may have seen, last June #SportsVizSunday released a dataset of shot location data for the NBA from 1997-2020, provided by Zak Geis. In fact, this was the first #SportsVizSunday I ever took part in. You can access the #SportsVizSunday original data, here, if you’d like to skip straight to visualising data. Alternatively the data from my tutorial can be found here.
This tutorial will deviate slightly from last June’s data and will focus on three main components.
- Can we get the game details from the play-offs of the most recent year?
- Can we then find the Play-by-Play match details for the final game in the play-offs?
- Can we finally find a more granular level of detail of the shot locations for the final game? (same schema as in the #SportsVizSunday Dataset)
NBA_API is an API Client package to access the APIs for NBA.com.
Some useful documentation relevant to this tutorial:
The code is available via GitHub and runs sequentially. The walkthrough below explains each piece of data we look to retrieve.
If you would like to see an alternative version of the code that loops through all the play-off games, please take a look at the loop folder within the repo.
Open up your PyCharm console or the interface of your choosing. I’m using a new virtual environment and running Python 3.9.
You will want to head to the terminal and run the following package installs.
- pip install nba_api
- pip install pandas
Copy the GitHub code into the console and we are in a position to run the code!
You can amend attributes such as season and year amongst other things. You can hover over the class for more information, or visit the documentation.
In short, the dataset we retrieve includes a list of game_ids for the playoffs of the most recent year. Having looked through the newly exported csv, we can then use one of these game_ids later on in the code.
I take the game_id of the final game in the playoffs, between the Phoenix Suns and the Milwaukee Bucks.
You will notice the Game_ID is actually part of the URL. This means that if you have a specific game in mind, you can theoretically find it on the NBA website, under games, and take the ID from there.
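For anyone reading without the repo open, here is a minimal sketch of what a playoff games pull can look like with nba_api. It is not the script from the repo: the function names `season_label` and `fetch_playoff_games` and the output file name are made up for illustration, and it assumes the `LeagueGameFinder` endpoint with a season written in the `2020-21` style.

```python
def season_label(start_year: int) -> str:
    """Format a season the way NBA.com writes it, e.g. 2020 -> '2020-21'."""
    return f"{start_year}-{(start_year + 1) % 100:02d}"


def fetch_playoff_games(start_year: int):
    """Return a DataFrame of playoff games, one row per team per game."""
    # Imported here so season_label can be used without the package installed.
    from nba_api.stats.endpoints import leaguegamefinder

    finder = leaguegamefinder.LeagueGameFinder(
        league_id_nullable="00",  # "00" is the NBA league code
        season_nullable=season_label(start_year),
        season_type_nullable="Playoffs",
    )
    return finder.get_data_frames()[0]


# Example use (needs an internet connection, so it is left commented out):
# games = fetch_playoff_games(2020)
# games.to_csv("playoff_games.csv", index=False)  # hypothetical file name
```

Swapping `"Playoffs"` for `"Regular Season"` should be enough to change which games come back, assuming the endpoint accepts the same season string.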
In this case the Game_ID is 0042000406
The play-by-play csv contains all the game events associated with that particular match, using our chosen game_id. I have hardcoded this value for the purposes of the tutorial, having run the games report and looked for the game I wanted.
Finally, we find all the shot events taken during the match. We can then left join this data to our pbp events data sheet when in Tableau!
Once the code has finished and exited, you will see the three files appear in your folder file path. I’m normally quite lazy and drag them onto my desktop afterwards for when I build my Tableau visualisations, using the original file path as a staging house, but it’s completely at your discretion where you want your files to sit.
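As a rough illustration of these two pulls, the sketch below fetches the play-by-play and the shots for one game. Treat it as an assumption-laden outline rather than the repo code: `normalise_game_id` and `fetch_game_events` are hypothetical names, `context_measure_simple="FGA"` is my choice so that misses come back as well as makes, and depending on your nba_api version the play-by-play endpoint may have been superseded by a newer one.

```python
def normalise_game_id(game_id) -> str:
    """Return the game ID as a 10-character string, restoring leading zeros."""
    text = str(game_id).strip()
    if not text.isdigit() or len(text) > 10:
        raise ValueError(f"not a valid game id: {game_id!r}")
    return text.zfill(10)


def fetch_game_events(game_id):
    """Return (plays, shots) DataFrames for a single game."""
    # Imported here so the helper above works without the package installed.
    from nba_api.stats.endpoints import playbyplayv2, shotchartdetail

    game_id = normalise_game_id(game_id)
    plays = playbyplayv2.PlayByPlayV2(game_id=game_id).get_data_frames()[0]
    shots = shotchartdetail.ShotChartDetail(
        team_id=0,    # 0 means "all teams"
        player_id=0,  # 0 means "all players"
        game_id_nullable=game_id,
        season_type_all_star="Playoffs",
        context_measure_simple="FGA",  # assumption: ask for attempts, not just makes
    ).get_data_frames()[0]
    return plays, shots


# Example use (needs an internet connection, so it is left commented out):
# plays, shots = fetch_game_events("0042000406")
```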
LOOKING AT THE DATA IN MORE DETAIL
You’ll see that the GameID is a 10-digit code. If you open the file in Excel, be aware that it will chop off the leading zeros. A copy of the play-offs file is held in the GitHub repo. You’ll also see that the dataset is held at team level. Therefore each GameID appears twice, once for the team that won and once for the team that lost.
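The same leading-zero problem can bite in pandas if you read the csv back in. Here is a small sketch of how to avoid it, using a made-up two-row sample in place of the real export (the column names are my assumption of what the games file contains):

```python
import io

import pandas as pd

# Made-up sample standing in for the exported games csv:
# one row per team, so the same game appears twice.
sample_csv = io.StringIO(
    "GAME_ID,TEAM_ABBREVIATION,WL\n"
    "0042000406,MIL,W\n"
    "0042000406,PHX,L\n"
)

# Default parsing reads GAME_ID as an integer and drops the leading zeros.
as_numbers = pd.read_csv(sample_csv)
sample_csv.seek(0)

# Forcing the column to text keeps all 10 characters.
as_text = pd.read_csv(sample_csv, dtype={"GAME_ID": str})

# If the zeros are already gone, zfill pads the values back to 10 digits.
repaired = as_numbers["GAME_ID"].astype(str).str.zfill(10)

# One entry per game rather than one per team.
unique_game_ids = as_text["GAME_ID"].drop_duplicates().tolist()
```

Reading the ID as text from the start is the safer habit, since the padded string is what the API expects back.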
Play By Play
You may come across the score line appearing as a date when you open the csv, because of the way your formatting is set up. Excel may read a score line of 2-2 as the 2nd of February.
A quick way around this is to open a fresh Excel document, go to the Data tab and import the file as text.
You can then use a comma delimiter and set the fields you want as text fields, which will sort the date error. Then re-save the file.
In terms of the shot location data, I would recommend using the co-ordinates outlined here, if you plot them in Tableau. The shot data hosts a whole wealth of information in terms of the team, scorer, shot and points type. Below is the court image. The recommended co-ordinates for mapping on Tableau are: X: 250, -250, Y: -52, 418.
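If you want to check your shots against those bounds before heading into Tableau, a quick sketch like the one below will do it. The `LOC_X` and `LOC_Y` column names are my assumption of how the shot file labels its coordinates, the sample rows are invented, and `split_by_court_bounds` is a hypothetical helper.

```python
import pandas as pd

# Bounds taken from the recommended Tableau mapping.
X_MIN, X_MAX = -250, 250
Y_MIN, Y_MAX = -52, 418


def split_by_court_bounds(shots: pd.DataFrame):
    """Return (inside, outside): shots within the mapped area and those beyond it."""
    inside_mask = (
        shots["LOC_X"].between(X_MIN, X_MAX)
        & shots["LOC_Y"].between(Y_MIN, Y_MAX)
    )
    return shots[inside_mask], shots[~inside_mask]


# Invented sample: the last shot sits beyond the Y bound.
sample_shots = pd.DataFrame(
    {"LOC_X": [-12, 230, 5], "LOC_Y": [5, 40, 600]}
)
inside, outside = split_by_court_bounds(sample_shots)
```

Anything that lands in `outside` would be plotted off the edge of the court image, so it is worth knowing about before you build the view.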
Now we have our dataset at game, event and shot level of detail, thanks to the common GameID field across all three datasets, plus the event number shared between the pbp dataset and the shot dataset.
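The join itself happens in Tableau in this post, but the same idea can be sketched in pandas. The column names `EVENTNUM` (play-by-play) and `GAME_EVENT_ID` (shots) are my assumption of how the two files label the event number, and the rows are invented purely to show the shape of the result.

```python
import pandas as pd

# Invented play-by-play rows: three events in one game.
pbp = pd.DataFrame({
    "GAME_ID": ["0042000406"] * 3,
    "EVENTNUM": [7, 8, 9],
    "EVENTMSGTYPE": [1, 4, 2],
})

# Invented shot rows: only events 7 and 9 were shots.
shots = pd.DataFrame({
    "GAME_ID": ["0042000406"] * 2,
    "GAME_EVENT_ID": [7, 9],
    "LOC_X": [-12, 230],
    "LOC_Y": [5, 40],
})

# Left join: keep every event, attach shot details where they exist.
game_events = pbp.merge(
    shots,
    how="left",
    left_on=["GAME_ID", "EVENTNUM"],
    right_on=["GAME_ID", "GAME_EVENT_ID"],
)
```

Events that are not shots keep their row and simply have empty shot columns, which is the behaviour you want for a full-game view.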
One of the main things I learnt was that the NBA API will kick me out if I send it a lot of requests in a loop to get the events for each individual match. This is why my example shows just one match rather than iterating through the list of games outlined previously. It is also why you will see I have added a try/except block to the code, as well as a sleep timer. Probably a little overcautious, but I would rather be on the safe side so I can keep accessing data. If you feel comfortable with the code above, do take a look in the loop folder, where an alternative Python script turns the games csv into a list of games and then iterates through them, building a larger csv of plays and shots to hold all events for the tournament!
Secondly, I learnt the power of documentation. I hit a lot of errors on my journey. It took me ages to figure out how to write the play-by-play chunk of code, because most tutorials I’ve read online look at specific players rather than at an event and team level. I initially struggled with what to pass through the function. The documentation I mentioned really helped me understand the different aspects. I’ll happily admit my code rarely runs first time. If you’re a newcomer to Python, don’t get disheartened! Some of the main pain points included realising the GameID had to be 10 digits long, and that leaving the player ID at its default of zero means all players are included!
Thirdly, I learnt to outline what I want from the data and how to get it. There are plenty of resources out there for getting shot location data for individual players. You can check out a few good resources below:
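To make the try/except plus sleep idea concrete, here is a generic sketch of a polite loop. It is not the script in the loop folder: `fetch_all` is a hypothetical helper, `fetch_one` stands for whatever function you use to pull a single game, and the two-second pause is an arbitrary starting point rather than a known safe limit.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


def fetch_all(
    game_ids: Iterable[str],
    fetch_one: Callable[[str], object],
    pause_seconds: float = 2.0,
    retries: int = 2,
) -> Tuple[Dict[str, object], List[str]]:
    """Fetch each game in turn, pausing between requests.

    Returns the successful results keyed by game ID, plus a list of the
    game IDs that still failed after all retries.
    """
    results: Dict[str, object] = {}
    failed: List[str] = []
    for game_id in game_ids:
        for attempt in range(retries + 1):
            try:
                results[game_id] = fetch_one(game_id)
                break
            except Exception as error:
                print(f"{game_id}: attempt {attempt + 1} failed ({error})")
                # Wait a little longer after each failed attempt.
                time.sleep(pause_seconds * (attempt + 1))
        else:
            # The inner loop never hit "break", so every attempt failed.
            failed.append(game_id)
        # Pause before moving on to the next game.
        time.sleep(pause_seconds)
    return results, failed
```

Returning the failed IDs separately means one bad request does not throw away the games you already have, and you can re-run just the leftovers later.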
These resources were particularly useful:
I wanted to focus on how we can show a full game. Once I managed to get the play-by-play dataset, I was delighted to see that you can join the shots dataset to it on the event number! See below how this is done in Tableau.
- Try reading some of the other documentation to see what other NBA stats you can source. (Reading)
- Try finding the games for the regular season of last year. (Python)
- Try creating a shot map in Tableau. (Tableau)
- Try creating a visualisation showing the breakdown of the game per play/minute. (Tableau)
- Try creating the tournament bracket at the top of the page, using a previous template in collaboration with the FlerlageTwins. (Tableau)
I stumbled across this beautiful website that makes your code look nice for blog posts. You can access it here.
It allows you to edit the colour scheme and match it to your code type. There are various other sites and ways to embed your code so it is ready to copy and paste straight from the page, which I’ll also be looking into, but I’m pretty pleased with the look of this one for a high-level summary.
As always, let me know how you get on with this one. I can be reached on Twitter, @_CJMayes.