Welcome to another split blog looking at both web-scraping as well as Tableau.
I constantly meet people in the community with such great and diverse skillsets, and Anmol is no exception. If you’re from the soccer community you may recognise this pizza chart creator he put together, as well as some of the awesome tutorials found here. These are some seriously good tutorials to get stuck into regardless of coding knowledge. I want to thank Anmol for collaborating with me on this blog; he really pulled through on prepping the Python code so that it exports to csv exactly how I wanted it.
I personally really enjoy writing these end-to-end blogs, as they give individuals the choice to complete either segment or all of it!
Like most Python tutorials, this will be split into two parts. If you only want to complete the code, or want to skip straight to the Tableau build, by all means do so; we have included the example dataset at the top of the page.
We will be looking to recreate the Understat website user interface for a specific match. Here is the Understat website version. Below is what we will look to create within Tableau. You can find the workbook in the link at the top of the page.
PART 1 – PYTHON
You can find the code on GitHub.
EXPLAINING THE CODE
Below we have put together some of the key components of the code.
What is Beautiful Soup?
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It gives us a way of navigating, searching and modifying the parse tree. We can use it with Understat by inspecting the website page and finding the components we want in order to construct our dataset.
How does the fetching of specific pages work?
By inspecting the page we can see the underlying HTML of how the site is built. Using Beautiful Soup we can find each element, by its class, that we will eventually need to export to our csv file.
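To make the idea of finding elements by class a little more concrete, here is a minimal sketch. The HTML snippet, the class names and the field names are all made up for illustration; they are not Understat’s real markup and this is not the code from the repository.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page is structured differently.
html = """
<div class="timeline">
  <div class="event home"><span class="minute">12'</span><span class="player">Player A</span></div>
  <div class="event away"><span class="minute">45'</span><span class="player">Player B</span></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

events = []
# find_all with class_ matches every <div> whose class list contains "event".
for block in soup.find_all("div", class_="event"):
    # block["class"] is a list of class names, e.g. ["event", "home"].
    side = "home" if "home" in block["class"] else "away"
    events.append({
        "minute": block.find("span", class_="minute").get_text(strip=True),
        "player": block.find("span", class_="player").get_text(strip=True),
        "side": side,
    })

print(events)
```

Each dictionary in the list is one row of the eventual dataset, which is the same general pattern you would follow once you know the real class names from inspecting the page.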
Where can I put in a match of my own choosing?
You can replace the match URL in the code with the one you want.
obj = MakeEventDataset("https://understat.com/match/16438")
What will my data look like?
Once the code has run, the export will look like the below.
What should I be cautious of?
- The player name will sometimes need tidying up if it contains special characters.
- If the Event detail contains the goal scoreline, you will want to make sure it doesn’t revert to showing as a date (i.e. 1-1 accidentally showing as 1st Jan).
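If you would rather handle these two checks in Python, here is a rough sketch. It assumes the special characters are HTML entities left over from the page (that is an assumption, so check your own export), and the helper names tidy_name and looks_like_scoreline are hypothetical rather than part of the code on GitHub.

```python
import html
import re

def tidy_name(raw):
    """Decode HTML entities and trim stray whitespace from a player name."""
    return html.unescape(raw).strip()

# A scoreline is one or more digits, a hyphen, then one or more digits.
SCORELINE = re.compile(r"^\d+-\d+$")

def looks_like_scoreline(value):
    """Return True if the value still looks like a scoreline such as 1-1."""
    return bool(SCORELINE.match(value))

# Made-up example values.
print(tidy_name("Jos&eacute; O&#039;Brien "))
print(looks_like_scoreline("1-1"))
print(looks_like_scoreline("01-Jan"))
```

Running looks_like_scoreline over the scoreline values after any spreadsheet step is a quick way to spot rows that have been turned into dates.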
We have tested the code on various matches, but do reach out if you have any questions or spot any areas for review.
RUNNING THE CODE
This code is built so that all you need to do is replace the url with the match you would like and click run.
Do note, for this to run, you will need to pip install the following packages in your terminal:
- pip install beautifulsoup4
- pip install pandas
Before we move onto the build, once again, I’d like to thank Anmol for pulling together the code. He did an exceptional job.
PART 2 – TABLEAU
You can access the example data for the tutorial on GitHub.
How shall I prep my data after we get the export? How does the rank and path work?
Before we start we will need to make a few final amendments to our dataset.
First we will want to add a column called Rank and number the rows in ascending order based on the minute.
(Note: We could use an index function for ranking them in Tableau, but I find this easier to understand visually)
Secondly we will want to duplicate our dataset completely, labelling the original rows as Path 1 and the duplicate rows as Path 2. (We could union this in Tableau, but again, for simplicity’s sake I’ve done it in Excel.)
The reason for duplicating the dataset will become apparent when we create the base. We in effect plot a mark centrally (where the minute is positioned) that joins outwards to another path mark where the name begins. We connect these points as a thick line.
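If you prefer to do this prep in Python rather than Excel, a minimal pandas sketch could look like the below. The column names Minute and Name and the sample rows are assumptions standing in for the real export, so adjust them to match your csv.

```python
import pandas as pd

# Stand-in for the scraper's export; column names and rows are assumptions.
events = pd.DataFrame({
    "Minute": [67, 12, 45],
    "Name": ["Player C", "Player A", "Player B"],
})

# Rank events in ascending order of minute (1 = earliest).
# A stable sort keeps events in the same minute in their original order.
events = events.sort_values("Minute", kind="stable").reset_index(drop=True)
events["Rank"] = range(1, len(events) + 1)

# Duplicate the dataset: the first copy is Path 1, the second copy is Path 2.
prepped = pd.concat(
    [events.assign(Path=1), events.assign(Path=2)],
    ignore_index=True,
)

print(prepped)
```

Every event now appears twice with the same Rank, once per Path value, which is what lets Tableau draw a line between the two points.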
We have 8 calculations to make. You can copy them from the workbook, but I’ll explain the screenshots as we go so that you understand the method behind the madness.
Explanation: Find the length of name. We want each of our background tiles to account for player names of varying sizes. I’ve added 12 as a buffer to be able to squeeze in the minute and shape file for the events. You can try smaller or larger numbers here.
Explanation: By duplicating our dataset we can create two points of reference. Take the first goal, for example. We will create a point on the x axis at zero (Path = 2) and also a point leftwards of this mark, at a distance equal to the length from our previous calculation (Path = 1).
In simpler terms, Home events span leftwards from zero on the x axis by the length of the player’s name, and Away events span rightwards from zero by the length of the player’s name.
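For anyone who finds it easier to read logic as code, here is a small Python paraphrase of the two calculations described above. It is not the Tableau formula from the workbook, and the side labels "Home" and "Away" and the function names are assumptions made for the sketch.

```python
BUFFER = 12  # room for the minute and the event shape, as described above

def name_length(name, buffer=BUFFER):
    """Length of the player's name plus the buffer."""
    return len(name) + buffer

def base_x(name, side, path):
    """X position of a base mark: Path 2 sits at zero, Path 1 sits outwards."""
    if path == 2:
        return 0
    length = name_length(name)
    # Home events extend left (negative x), Away events extend right (positive x).
    return -length if side == "Home" else length

print(base_x("Player A", "Home", 1))  # 8 characters + 12 buffer, to the left: -20
print(base_x("Player A", "Home", 2))  # the central point: 0
print(base_x("Player B", "Away", 1))  # 8 characters + 12 buffer, to the right: 20
```

Joining the Path 1 and Path 2 points for the same Rank gives the thick background line for that event.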
- Double click the 1c. MP Base calculation to add it to the sheet.
- Go to Map – Background Maps – None
- Change the Marks to a line and drag Rank and Path onto detail, making sure they are dimensions.
Here are some screenshots of that:
Let’s now prep the minutes icon.
The minutes are based centrally so we can make the x axis 0. For the Y axis we will plot the Rank value from our dataset.
- Let’s go to Map and turn our map back on for the time being.
- I drag the 1d. MP Minute TWICE onto the card. Once because I’ll be creating a nice circular effect to sit behind the minute number. Then again for the actual minute label.
Note: If you are new to Tableau, when I say drag onto the card for layering functionality you have to hover it in the top left hand corner of the sheet. This can be a little confusing to start.
- Let’s turn off the map again and go to edit the new marks.
- On the 1c. MP Base marks, increase the size to maximum.
- Go to the first of the two marks for 1d. MP Minute and change it to a circle. Resize the circle as appropriate and amend the colour as you like. You will need to drag Path and Rank onto the marks card as dimensions.
For the second 1d. MP Minute layer (the one on top), you will want to change the marks to text and drag Path and Rank onto the marks card as dimensions.
Drag Minute onto the label and make it a dimension. You will notice that our minutes are ranked in the wrong order. Right click the latitude generated y axis and click reverse.
At this point, this is where we should be:
The next calculation contains an IF statement. We want to create a shape for the event (substitution, goal etc.), but we will want to offset it slightly from the middle as our minutes already take up that space. So we shift all the home shapes slightly to the left, and all the away shapes slightly to the right!
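The offset logic can be sketched in Python as below. Again, this is a paraphrase of the idea rather than the workbook’s calculation; the offset value of 3 and the "Home" and "Away" labels are placeholders I have assumed, so use whatever spacing looks right in your own build.

```python
SHAPE_OFFSET = 3  # hypothetical spacing; tune it to suit your layout

def shape_x(side, offset=SHAPE_OFFSET):
    """X position of the event shape, nudged away from the central minute."""
    # Home shapes move left of centre, Away shapes move right of centre.
    return -offset if side == "Home" else offset

def shape_point(side, rank):
    """The (x, y) position of the shape: offset on x, the event's Rank on y."""
    return (shape_x(side), rank)

print(shape_point("Home", 1))
print(shape_point("Away", 2))
```

The minute marks stay at x = 0, so the shapes sit just to either side of them on the same row.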
Add the layer onto the sheet.
Make the mark a shape, then drag Rank and Path onto the marks card and make them dimensions.
At this point you may want to create custom shapes based on the event. Drag Shape Detail onto shape as well as colour. Here is an example using the Tableau built-in colours and shapes. Feel free to use your own shapes as necessary.
All we have left is to add the name labels.
Let’s add this as a final layer.
Drag Rank and Path onto detail like usual, make them dimensions.
Change the marks to Text, and drag name onto the Text mark.
Finally, it’s a case of cosmetics and a little formatting and we’re done!
What were some of the challenges faced when creating this visualisation?
It would have been great to make this into a template, however with so many elements having sizing requirements it became too troublesome. Fortunately, I think the build-along approach can help others understand the components a lot more.
Python:
- Try running the code for your own match.
- Try creating the path, union and rank columns within your Python code!
- Try writing a loop function to get all the match events for a whole season.
Tableau:
- Try designing a different style of events, or using your own custom shapes and colours.
- Try building a dashboard that has the match events, as well as shot details from a previous Understat tutorial written to highlight the whole game!
- Try applying the use case to something outside of soccer, e.g. a message board / texting UI.