The festive period is slowly creeping up on us! I hope everyone finds time with friends, family and loved ones to recharge and reconnect.
Today’s blog is an introduction to StatsBomb data and how to create your own heatmap of pass data in Python. In the new year, we will probably look to replicate the same idea in Tableau.
If you are new to the open data, take a look at a previous introduction blog I wrote last year, here.
All code resources can be found at the top of the page, below the title.
Here is the repo for the open data. We’ll actually be using a package that provides this data for us, but the repo is a useful reference point as we dive in, so it is worth exploring its folder structure. As we are using a prebuilt package, there is no need to download the repo itself.
I’ve chosen to plot the Women’s Euro final, specifically the passes of the England team. Let’s walk through the code snippets so you can go ahead and create your own.
The first block of code imports the packages we require. I then create two simple datasets: the first is a CSV version of the competition data available from StatsBomb through their open data. I then pick the specific competition ID and season ID that correspond to the Women’s Euro finals.
The final between England and Germany has its own match ID. The last line of code here creates a dataframe of all the different events captured within that match.
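If you’d like to see roughly how those steps fit together, here is a minimal sketch. It assumes the prebuilt package is statsbombpy, that the competition name contains "Women's Euro" with a season name of "2022", and that the team names contain "England" and "Germany". The helper names (find_season, find_match, load_final_events) and the competitions.csv file name are my own for illustration, so do check the values against your own competitions CSV.

```python
import pandas as pd


def find_season(competitions, competition_part, season_name):
    """Return (competition_id, season_id) for the first matching row."""
    mask = (
        competitions["competition_name"].str.contains(
            competition_part, case=False, regex=False
        )
        & (competitions["season_name"] == season_name)
    )
    row = competitions.loc[mask].iloc[0]
    return int(row["competition_id"]), int(row["season_id"])


def find_match(matches, team_a, team_b):
    """Return the match_id of the first match involving both teams."""
    teams = matches["home_team"] + " | " + matches["away_team"]
    mask = teams.str.contains(team_a, regex=False) & teams.str.contains(
        team_b, regex=False
    )
    return int(matches.loc[mask, "match_id"].iloc[0])


def load_final_events():
    """Download the events for the final (needs statsbombpy and a connection)."""
    from statsbombpy import sb  # assumption: this is the prebuilt package

    competitions = sb.competitions()
    competitions.to_csv("competitions.csv", index=False)  # handy reference copy

    # Assumed naming; confirm against competitions.csv
    comp_id, season_id = find_season(competitions, "Women's Euro", "2022")
    matches = sb.matches(competition_id=comp_id, season_id=season_id)

    # Assumes the two teams met only once in the tournament
    match_id = find_match(matches, "England", "Germany")
    return sb.events(match_id=match_id)
```

Calling load_final_events() should hand back one dataframe with a row per event. Looking the IDs up by name rather than typing them in keeps the sketch honest about what I am assuming.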
The next few lines of code are important, as they split the location field out into its x and y co-ordinates. You’ll notice we also limit the event type to Pass and the team to England only. These filters are optional, depending on the measure you’re looking at, but the syntax for splitting the field will be important.
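As a rough illustration of that split, here is a small sketch. It assumes the events dataframe has a type column holding the event name, a team column holding the team name, and a location column holding an [x, y] list for each event. The function name passes_with_xy is hypothetical.

```python
import pandas as pd


def passes_with_xy(events, team_part):
    """Keep one team's passes and split location into x and y columns."""
    # Optional filters: passes only, and only the chosen team
    mask = (events["type"] == "Pass") & events["team"].str.contains(
        team_part, regex=False
    )
    passes = events.loc[mask].copy()

    # Each location is an [x, y] pair, so expand it into two columns
    xy = pd.DataFrame(
        passes["location"].tolist(), index=passes.index, columns=["x", "y"]
    )
    return passes.join(xy)
```

For example, passes_with_xy(events, "England") would keep only England’s passes. Swap the type filter for another event if you’re plotting a different measure.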
This next chunk of code is a little meaty, but hopefully we can make sense of it.
To start, we define the figure size of the pitch and give it a few attributes for style and colours.
I then want to create a variety of heatmap styles, to weigh up which one best represents my data. You’ll see the number of bin settings is equal to my layout, i.e. I have 4 sets of bins as well as a layout of 1×4, meaning my overall chart will be 4 pitches next to one another, each with a heatmap based on its bin sizes.
The for loop creates each of our heatmaps.
First, it takes the x and y co-ordinates of the passes and bins them into their corresponding cells.
Next, we use the scatter function to overlay the individual pass co-ordinates.
Finally, I add some labelling calculations to show what % of passes were made within each bin.
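The loop described above can be sketched along these lines. This is a sketch rather than my exact code: the four bin sizes, the colours and the function name plot_pass_heatmaps are hypothetical choices for illustration, and it assumes mplsoccer’s Pitch with its bin_statistic, heatmap, scatter and label_heatmap methods.

```python
import numpy as np

# Hypothetical bin sizes: (columns along the pitch, rows across it)
BIN_SIZES = [(6, 5), (1, 5), (6, 1), (3, 3)]


def plot_pass_heatmaps(x, y, bin_sizes=BIN_SIZES):
    """Draw one pitch per bin size, side by side, with labelled heatmaps."""
    from mplsoccer import Pitch  # imported here so the sketch loads without it

    pitch = Pitch(
        pitch_type="statsbomb",
        pitch_color="#22312b",
        line_color="white",
        line_zorder=2,
    )
    # A 1 x N layout: one pitch for each set of bins
    fig, axs = pitch.draw(nrows=1, ncols=len(bin_sizes), figsize=(20, 5))

    for ax, bins in zip(np.atleast_1d(axs).ravel(), bin_sizes):
        # Count passes per cell; normalize=True turns counts into shares
        stats = pitch.bin_statistic(
            x, y, statistic="count", bins=bins, normalize=True
        )
        pitch.heatmap(stats, ax=ax, cmap="Reds", edgecolors="#22312b")
        # Overlay the individual pass locations
        pitch.scatter(x, y, ax=ax, s=10, color="white", alpha=0.4)
        # Label each cell with its percentage of passes
        pitch.label_heatmap(
            stats, ax=ax, color="black", fontsize=10,
            ha="center", va="center", str_format="{:.0%}",
        )
    return fig, axs
```

You would call it with the x and y columns of your pass dataframe, then show or save the returned figure.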
If all goes well, the visualisation should appear.
Now that you can see the output, the bin values should be easier to understand.
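If the bin values still feel abstract, this small NumPy sketch shows the idea behind them. It is not how mplsoccer does it internally, just the same calculation by hand, assuming StatsBomb’s 120 x 80 pitch co-ordinates. The function name bin_percentages is my own.

```python
import numpy as np

PITCH_LENGTH, PITCH_WIDTH = 120, 80  # StatsBomb pitch co-ordinates


def bin_percentages(x, y, bins):
    """Share of points in each cell for bins = (columns, rows)."""
    counts, x_edges, y_edges = np.histogram2d(
        x, y, bins=bins, range=[[0, PITCH_LENGTH], [0, PITCH_WIDTH]]
    )
    return counts / counts.sum(), x_edges, y_edges


# Four made-up pass locations, binned into thirds along the pitch
shares, x_edges, y_edges = bin_percentages(
    [10, 10, 70, 110], [10, 70, 40, 40], bins=(3, 1)
)
print(shares.ravel())  # share of passes in each third
print(x_edges)         # where each third starts and ends
```

So bins of (6, 5) would cut the pitch into 6 columns along its length and 5 rows across its width, giving cells of 20 x 16.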
That’s the end of the visualisation.
Of course, you can go back and amend the labelling, colours and sizes at your own discretion. Do check out the mplsoccer documentation here.
You can also browse some of the different colour mappings here.
If you get stuck with the layout of the above, I also left in a chunk of code that builds just one chart.
This code takes the original x and y location data for the passes, sets them equal to new variables, and then passes those through the bin statistic and heatmap position functions. Notice a few small changes, including ax=ax, as we only have a single chart.
The chart type in the second example tends to be used more for pressure events. If you’d like to follow the mplsoccer version of the code where they utilise pressure events, please follow this link.
Another good follow-along tutorial can be found here.
Let me know how you get on with this one. As always, the code can be found at the top of the page in the GitHub repo.