I’m super excited to be able to host a guest blog for #SportsVizSunday this week. The soccer community is ever growing, and I keep coming across more and more individuals who share their passion for sports online.
One individual who particularly stood out was Yash. What I love about their work is the focus not only on men’s soccer, but on women’s leagues too. On top of this, Yash’s talent for designing good visuals is supported by great analysis. Check out some of the highlights below!
Today’s guest blog will outline a few tips and tricks for creating soccer maps using Python. It will predominantly focus on the free Statsbomb data that is publicly available, and the walkthrough will look at deep completions (completed passes or carries) in the FA WSL 2020-21 season.
Before we start, what is meant by deep completions?
Deep completions are defined as completed passes or carries that originate outside, and end inside, the semi-circle whose center is the midpoint of the goal and whose diameter is equal to the width of the penalty box.
Check out this example output which will be showcased!
This blog is aimed at those who feel fairly comfortable with basic Python syntax but want to elevate their skills by applying it to sports data. So if you are new to either tool, please do not be disheartened, and feel free to reach out to Yash or me if you have any questions.
Jump to the repo at the top of the page to follow the tutorial!
In future blogs, we will replicate some of these ideas in Tableau, so stay tuned!
CJ: Yash, for those who are unaware, how did you get into football analytics?
Y: It was something I actively started pursuing when the pandemic began. I had always read a lot of analysis before that, and it always piqued my interest how the use of data can help us draw meaningful insights about the game. When I started out I had no idea where to get the data or anything like that. I started asking questions, searching through the mighty internet, and stumbled upon a couple of resources online to get started. I started out by making a scatter plot of defensive duels and their success rate for defenders in La Liga and that’s how it all began really. The next step was to learn languages and get started with the free event data that was available, and that’s what I did. I asked a lot of questions on everything and I still do, because I still know very little.
One thing I have always maintained is to first try and understand what the metrics I am using indicate and what their pitfalls are. I feel it’s important to understand that before I draw conclusions from them.
CJ: I love your piece on completed passes and carries. Could you share the raw dataset for this, and give a run-through of how you were able to format the different pitches for each player?
You can access the script and data in the repo. The script is stored in a Jupyter notebook. The code is written so that you can run each block separately and follow along with the tutorial without having to make adjustments. It is a great starting point for those who are new to learning Python.
Y: In order to recreate this we will be using the mplsoccer library, which is a great place to dive straight into visualizing your data. The first step is to install and load the packages: we will use pandas and numpy for basic data analysis and mplsoccer to plot the data onto the pitch. Highlight-text is another package that allows you to customize your annotations and headings very easily.
The next step is to do some manipulation on our data to obtain deep completions. The basic idea is to filter our dataset so that we keep only the passes and carries that start outside the highlighted region and end inside it.
This can be done using the Euclidean distance formula to calculate the distance of the end points of our passes and carries from the center of the goal (this is also the center of our highlighted region).
The center of the goal sits at (120, 40) because the football pitch in Statsbomb is measured as 120 x 80.
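The distance check described above can be sketched in a few lines. This is a minimal illustration rather than the notebook's own code: the field names (`x`, `y`, `end_x`, `end_y`) and the sample events are made up, the events are assumed to be completed passes and carries already, and the radius assumes the penalty box runs from y = 18 to y = 62 in Statsbomb coordinates.

```python
import math

# Statsbomb pitch coordinates: 120 long (x) by 80 wide (y),
# so the center of the attacking goal is at (120, 40).
GOAL_CENTER = (120.0, 40.0)

# Assumption: the penalty box spans y = 18 to y = 62, so its width is 44
# and the radius of the semi-circle is half of that.
RADIUS = (62.0 - 18.0) / 2


def distance_to_goal(x, y):
    """Euclidean distance from the point (x, y) to the center of the goal."""
    return math.hypot(GOAL_CENTER[0] - x, GOAL_CENTER[1] - y)


def is_deep_completion(event):
    """True when the event starts outside the semi-circle and ends inside it."""
    starts_outside = distance_to_goal(event["x"], event["y"]) > RADIUS
    ends_inside = distance_to_goal(event["end_x"], event["end_y"]) <= RADIUS
    return starts_outside and ends_inside


# Hypothetical sample events (field names are illustrative only).
events = [
    {"type": "Pass", "x": 80.0, "y": 30.0, "end_x": 110.0, "end_y": 45.0},
    {"type": "Carry", "x": 112.0, "y": 40.0, "end_x": 116.0, "end_y": 38.0},  # starts inside
    {"type": "Pass", "x": 60.0, "y": 40.0, "end_x": 90.0, "end_y": 10.0},  # ends outside
]

deep_completions = [e for e in events if is_deep_completion(e)]
```

The same comparison works on pandas columns if your events live in a DataFrame; only the radius and the goal coordinates need to change for another data provider.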
Setting up the pitch and plotting on it is made so much easier with the help of mplsoccer. It has predefined functions for making pitches for different data providers and for plotting lines and scatter points.
Adding a scatter point at the end of the passing line is a stylistic choice. You can change this and go with an arrow, or with nothing at all.
Be sure to check out the full walkthrough code to create your own!
CJ: Why is this type of analysis important? What insights can we find from this about an individual’s style of play?
Y: Football at its very basic is about scoring goals and maximizing your team’s chances of scoring goals, and to do that you have to get the ball into areas that aid in that. Looking at deep completions helps us with that. The players highlighted in the chart are the ones that help do that via their passing or ball carrying. Looking deeper, we can further break it down to see which mode players tend to use when doing that, i.e. passing, carrying or crossing, and that can help us understand the playstyle of the players a little bit better.
CJ: I love how you have made some great design tweaks for this visualisation on Lauren Hemp & Chloe Kelly, could you share tips around this? (LINK)
Y: Yeah the attacking duo of Hemp & Kelly is one of my favorite duos in women’s football and I think it’s one of the most dangerous duos as well. After having watched both of them all season I just wanted to highlight a couple of things both excelled at.
In order to create this viz the primary thing was the definition of a cutback. Luckily for us, the Statsbomb dataset already categorizes passes as cutbacks, but if the dataset you are using doesn’t do so, you’ll have to do a little bit of math and filter the dataset according to the pitch coordinates of the data provider.
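If your data provider does not flag cutbacks, a coordinate-based filter could look something like the sketch below. To be clear, this is not the Statsbomb definition and not what was used for the viz: every threshold and field name here is an assumption chosen for illustration (Statsbomb-style 120 x 80 coordinates, with the six-yard box taken as x from 114 to 120 and y from 30 to 50, and the penalty box starting at x = 102).

```python
# All thresholds below are illustrative assumptions, not an official definition.
BYLINE_ZONE_X = 114.0   # pass must start level with the six-yard box or deeper
BOX_EDGE_X = 102.0      # pass must still end inside the penalty box
CENTRAL_Y_MIN = 30.0    # central channel, taken as the width of the six-yard box
CENTRAL_Y_MAX = 50.0


def is_cutback(p):
    """Rough check: starts wide near the byline, is played backwards, ends centrally."""
    starts_near_byline = p["x"] >= BYLINE_ZONE_X
    starts_wide_of_goal = p["y"] < CENTRAL_Y_MIN or p["y"] > CENTRAL_Y_MAX
    played_backwards = p["end_x"] < p["x"]
    ends_centrally = CENTRAL_Y_MIN <= p["end_y"] <= CENTRAL_Y_MAX
    ends_in_box = p["end_x"] >= BOX_EDGE_X
    return (
        starts_near_byline
        and starts_wide_of_goal
        and played_backwards
        and ends_centrally
        and ends_in_box
    )


# Hypothetical passes (field names are illustrative only).
passes = [
    {"x": 116.0, "y": 24.0, "end_x": 108.0, "end_y": 40.0},  # pulled back from the byline
    {"x": 100.0, "y": 24.0, "end_x": 108.0, "end_y": 40.0},  # starts too far from the byline
    {"x": 117.0, "y": 60.0, "end_x": 118.0, "end_y": 42.0},  # played forwards
]

cutbacks = [p for p in passes if is_cutback(p)]
```

You would want to tune the thresholds against a handful of passes you have watched, and perhaps also require the pass to be played along the ground if your data records pass height.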
One design element I added here is the inclusion of a picture of Hemp and Kelly. This only takes about 3 lines of code to do and if done right makes your viz look cool I feel.
CJ: What were some of the challenges faced when creating this visualisation, and tips you would give those just starting out with python and Statsbomb data?
Y: The primary challenge was in assembling the various elements of the viz and presenting them in a manner that isn’t overwhelming for the reader.
My tip for those starting out would be to be consistent in their approach and to read a lot of others’ work, to learn and get inspired. It’s all cliche advice probably, but don’t be afraid to ask questions that you might think are “stupid”. It’ll help you learn and understand concepts a tad bit more. One of my biggest tips for anyone coding is to read the documentation to gain a better understanding. Finally, make Stack Overflow your best friend, because you’ll probably be spending a lot of time there!
After becoming familiar with the workbook, why not try one of these challenges?
Follow the tutorial to complete a 3×3 grid
Try recreating the chart for a different player
Try recreating the chart in Tableau using the original dataset
CJ Round Up:
Wow. I’m blown away by the effort Yash went to for this guest blog. I’ve certainly learnt a lot and I hope you have too! The football community in general is huge, in both size and talent. For me, the passion in Yash’s guest blog really comes across through uniting love of the game, interpretation of the game, and some pretty fire coding skills.
From a personal perspective, I would like to host an environment that encourages a greater blend of tools, by introducing more analysts who already have sport knowledge or coding skills to the visualisation side (namely Tableau). Yash’s blog has set the standard for that.