Hi all,

Back this week with some football content. Our SportsVizSunday team like to occasionally suggest different sports to visualise as part of the initiative, so this month I thought I would join in on the action with you all.

Our SportsVizSunday repo can be found here. Read more about the initiative and the weekly round ups here.

Today we are going to look at some data processing to get our data ready for use within Tableau. I won’t cover too much of the tableau side of things, because I’ve previously written large amounts on Tableau shot maps so the premise behind plotting the points with layers and creating layers would seemingly be repetitive.

I am hoping this blog acts as a bridge between retrieving data and prepping it for visuals.

Here is what we will look to recreate. This is all the on-target shots as part of the Women’s World Cup Final where Spain beat England 1-0.

Data Access

StatsBomb are kind enough to release data for free. The WWC was one example of this. To retrieve the data we will write a simple python script and export it to csv, but make no transformations to the data within the script.

The code can be downloaded from the repo, using the github icon under the title. Simply put, we look for the specific competition and season, and export all the matches. Then using this list, I narrow the search down just to the final and export all the match events. The raw data for this is also within the repo.

Data Transformation

Next we look at transforming the data. Now you could do this in any way, but for today we will look to do it in Alteryx. Again, you can download the WWC.yxmd from the repo to follow along.

Above is a print screen of the workflow. Let’s work through it bit by bit.

The general cleansing part of the workflow looks to remove null columns, It then sorts the data based on the index file and timestamp, making all our match events in ascending order. After that I start to decide what columns I actually want, dropping a whole bunch of fields around clearances and fouls mainly because I know I’m just going to look at shots.

The shots data section limits type to equal shot, meaning we are reducing our dataset, excluding things such as passes and carries. I also filter the shot outcome just to look at on target shots excluding things such as blocked, wayward and off target.

If we look at the subset of data we need to clean up a location field as well as shot end field. Reason being is they come in pairs in the field. To do this I use the split to columns field as well as a replace formula to get rid of the brackets.

Transforms to

The Small Multiples part of the workflow is personal preference. I like to assign the columns and rows of which each shot is before hand so it doesn’t mean I need to calculate along anything in Tableau.

I filter to the unique ID’s (Match shot events) assign them a record ID and then use a formula tool to allocate a column and row, before rejoining this into the data.

You can see this uses the MOD and Floor calculation.

Freeze Frame Data

So interestingly enough the freeze frame data also appears in the match events dataset, but because I didn’t want to do a whole bunch of json manipulation I thought it would be easier to extract the raw data and then show that is joined in on ID=ID. This means rather than have our data at one row per shot, we will have a row for each player captured within data. So a shot ID may have anywhere between 1 and … well 22 (unlikely but possible) records.

In a similar way to the shot location data, the only transformation here needed is a split to columns and formula tool to tidy up X and Y co-ordinates.

Exported Data

The final dataset is what we will take into Tableau – It can be found in the repo as WWC_Spain_England_Shots.csv

Thanks again to StatsBomb for making this data available. 


So a few notes on the Tableau side, but of course download the workbook for free from my profile.

The first is if you’d like to create a shot direction the data will need to be with a union. Otherwise you can just connect once to the data.

To achieve multiple layers within the visual, I end up using the Makepoint() function.

You’ll see 4 layers, one is the shot line path, I put it at the back so you can see if there were players in the shot line.

The freeze frame layer has all the locations of the surrounding players.

The shot…. Well this is theoretically just the shot start mark, but I wanted to add in a circle to the visual.

The Text field, which is an arbitary point positioned strategically above the football pitch to add some labels and context to the shots. By all means, reach out if you have any questions on these elements, or refer back to some of the older work I’ve posted on soccer the past few years.

Going further,

  • Try write all the transformations in the data in Python
  • Add in labels from the freeze frame data for each players name
  • Add in some extra data marks
  • Map pitch visibility from the dataset