Hi all,
Hope everyone is doing well. Today’s blog is a bit of a mix between a technical piece and a thought piece. I’ve previously written about how to create pass networks using Alteryx and Tableau here.
Please read that one before moving on, as I’m going to skip over the data collection and talk mainly about the data processing in the Alteryx workflow.
As a brief refresher, all of today’s files can be found in the GitRepo (under the header). After collecting the data with the code, we will transform it using Alteryx, and finally we will export our data and showcase how it can be used to create a pass network in Tableau. We won’t go over the Tableau build again, as the other blog covers it, and I followed it to make my charts for this week!
For those that want to relive the moment of England beating Germany in the Euros Final in 2022, you can watch it here.
Okay, let’s begin!
Once you’ve run the code you will end up with the following outputs for the Women’s Euro Final from 2022:
- match_events.csv – All match events associated to the chosen match ID (Where we will get all our pass information from)
- starting.csv – The starting line up for each team
- fig.png – the background football pitch
As the code does very little transformation, I thought it best to do the rest of the prep in Alteryx.
This is where my workflow has changed from the previous one, as we now want to build the same chart multiple times over.
I wanted to look at networks within certain time frames during the game. For this example I will look at the first half, the second half, and then the first and second halves of extra time.
Let’s talk through how the Alteryx flow works.
The top part of the flow connects to the raw match event data. We filter for events where the type is a pass and the outcome is blank, which gives us all the successful passes.
I use the Select tool to pick out the most relevant columns, and filter the data to keep only the England team’s information.
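For anyone who would rather prototype this step outside Alteryx, here is a minimal pandas sketch of the same filter. The column names (type, pass_outcome, team, player, pass_recipient), the team labels and the helper name successful_passes are all assumptions for illustration, so check them against your own match_events.csv.

```python
import pandas as pd

# Tiny stand-in for match_events.csv. Column names and values are
# assumptions for illustration, not the real file contents.
events = pd.DataFrame({
    "type": ["Pass", "Pass", "Shot", "Pass"],
    "pass_outcome": [None, "Incomplete", None, None],
    "team": ["England", "England", "England", "Germany"],
    "player": ["Player A", "Player B", "Player C", "Player X"],
    "pass_recipient": ["Player C", None, None, "Player Y"],
})


def successful_passes(df, team):
    """Keep completed passes for one team and only the columns we need."""
    is_pass = df["type"] == "Pass"
    is_complete = df["pass_outcome"].isna()  # blank outcome = successful pass
    is_team = df["team"] == team
    columns = ["team", "player", "pass_recipient"]
    return df.loc[is_pass & is_complete & is_team, columns].reset_index(drop=True)


england_passes = successful_passes(events, "England")
print(england_passes)
```
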
The second input takes our Starting XI and joins it to our main dataset, flagging who was in our original Starting XI. We don’t use this metric, but I think it could be useful in general to have.
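As a rough illustration of that join, the sketch below flags Starting XI players with a left merge in pandas. The player column and the in_starting_xi flag name are assumptions, not the actual field names in the workflow.

```python
import pandas as pd

# Stand-ins for the pass data and starting.csv; the "player" column
# name is an assumption for illustration.
passes = pd.DataFrame({"player": ["Player A", "Player B", "Player C"]})
starting = pd.DataFrame({"player": ["Player A", "Player B"]})

# A left join keeps every pass row; indicator=True adds a "_merge"
# column telling us whether a match was found in the Starting XI.
merged = passes.merge(
    starting[["player"]].drop_duplicates(),
    on="player",
    how="left",
    indicator=True,
)
merged["in_starting_xi"] = merged["_merge"] == "both"
flagged = merged.drop(columns="_merge")
print(flagged)
```
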
So now the top part of the flow has all the pass events for our match, including a flag for whether each player was in the starting XI.
The bottom part of the flow takes our original data, but this time looks at when the substitutions happen. Previously, I just took all events prior to the first substitution. This isn’t a practical solution when a sub happens early in the game, so I decided to revisit the idea.
We can cross reference our sub data externally, here.
This time round, we want to look at all the substitutions.
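A small pandas sketch of pulling out every substitution might look like the following. It assumes the event type is labelled "Substitution", that the player column on those rows holds the player coming off, and that there is a period column; all three are assumptions to verify against the raw data.

```python
import pandas as pd

# Stand-in event data. The "Substitution" label, the "period" column and
# the idea that "player" is the player coming off are all assumptions.
events = pd.DataFrame({
    "type": ["Pass", "Substitution", "Substitution", "Pass"],
    "team": ["England", "England", "Germany", "England"],
    "period": [1, 2, 2, 3],
    "player": ["Player A", "Player B", "Player X", "Player C"],
})

# Keep one row per (period, subbed-off player) for the team we care about.
is_sub = (events["type"] == "Substitution") & (events["team"] == "England")
subs = (
    events.loc[is_sub, ["period", "player"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(subs)
```
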
The problem that now arises is that if we build network maps for each period, we would have points for anyone who has been on the pitch, which in this case could be up to 16 players for England in the second half.
My solution is to take the subbed-off player out of that time period. It’s not a full solution because, let’s face it, when during that half you’re subbed massively impacts how many times you’re likely to touch the ball. Perhaps if I were to do this on minute segments I would look at the number of passes made and visualise the subbed player depending on impact, but for now the solution works.
Here’s how that looks in Alteryx:
This part is built in Python and reads the data we’ve transformed.
It then runs a for loop: if a row matches both the time period and the player name of one of our subbed-off players, that row is removed from the dataset. It does this for every player and period listed in our subs dataset. I’m sure this could probably have been an iterative macro somehow.
We have to repeat the same process again, but this time looking at the pass recipient rather than the player, because of course we want to remove this player both from making passes and from receiving the ball.
It then exports this information back into our flow ready to use.
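The loop described above can be sketched roughly as follows. This is not the exact code from the workflow: the column names (period, player, pass_recipient) and the function name remove_subbed_off are assumptions, and the passer and recipient checks are folded into a single pass rather than run as two separate steps.

```python
import pandas as pd


def remove_subbed_off(passes, subs):
    """Drop passes made or received by a subbed-off player in that period."""
    out = passes.copy()
    for sub in subs.itertuples(index=False):
        same_period = out["period"] == sub.period
        # The player can appear either as the passer or as the recipient.
        involved = (out["player"] == sub.player) | (out["pass_recipient"] == sub.player)
        out = out[~(same_period & involved)]
    return out.reset_index(drop=True)


# Stand-in data with assumed column names.
passes = pd.DataFrame({
    "period": [1, 2, 2, 2],
    "player": ["Player B", "Player B", "Player A", "Player A"],
    "pass_recipient": ["Player A", "Player A", "Player B", "Player C"],
})
subs = pd.DataFrame({"period": [2], "player": ["Player B"]})

cleaned = remove_subbed_off(passes, subs)
print(cleaned)
```

Player B's passes from period 1 are kept, because the removal only applies to the period in which the substitution happened.
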
The last part of the flow is all about creating average locations for each of the players. So we take their position when on the ball and take an average of those marks.
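In pandas terms, the averaging could be sketched like this. The x and y columns for the on-ball location, the period column, and the avg_x and avg_y output names are assumptions for illustration.

```python
import pandas as pd

# Stand-in pass data; "x" and "y" as on-ball coordinates are assumed names.
passes = pd.DataFrame({
    "period": [1, 1, 1, 2],
    "player": ["Player A", "Player A", "Player B", "Player A"],
    "x": [40.0, 60.0, 30.0, 70.0],
    "y": [20.0, 40.0, 50.0, 10.0],
})

# One average position per player per period.
avg_locations = (
    passes.groupby(["period", "player"], as_index=False)[["x", "y"]]
    .mean()
    .rename(columns={"x": "avg_x", "y": "avg_y"})
)
print(avg_locations)
```
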
Finally, we join the recipient’s data back in so that we know which player is passing to whom. We also count how many times each player combination happened.
If all goes well, you will end up with an output like the one in match_data_output.csv.
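Here is a hedged sketch of that final step in pandas: count each passer and recipient combination, then join the average locations on twice, once for each end of the pass. All column names, including pass_count and the recipient_avg_x and recipient_avg_y fields, are assumptions and may not match match_data_output.csv.

```python
import pandas as pd

# Stand-in inputs with assumed column names.
passes = pd.DataFrame({
    "period": [1, 1, 1],
    "player": ["Player A", "Player A", "Player B"],
    "pass_recipient": ["Player B", "Player B", "Player A"],
})
avg_locations = pd.DataFrame({
    "period": [1, 1],
    "player": ["Player A", "Player B"],
    "avg_x": [50.0, 30.0],
    "avg_y": [30.0, 50.0],
})

# How many times each passer -> recipient combination happened per period.
pair_counts = (
    passes.groupby(["period", "player", "pass_recipient"])
    .size()
    .reset_index(name="pass_count")
)

# Attach the passer's average location.
with_passer = pair_counts.merge(avg_locations, on=["period", "player"], how="left")

# Attach the recipient's average location under separate column names.
recipient_locations = avg_locations.rename(columns={
    "player": "pass_recipient",
    "avg_x": "recipient_avg_x",
    "avg_y": "recipient_avg_y",
})
network = with_passer.merge(
    recipient_locations, on=["period", "pass_recipient"], how="left"
)
print(network)
```

Each row of network then describes one line in the pass network: where it starts, where it ends, and how heavy it should be.
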
From here we are ready to build in Tableau.
For Tableau you can refer back to the previous blog here.
As always, all the resources can be found in the GitRepo, and the dashboard is downloadable from my Tableau Public profile.
Why not go further?
- Can you create a small multiple of all games leading up to the final?
- Can you add jersey numbers into the code and Alteryx flow?
- Can you change the design of the pitch to be different colours?
- Can you create the same workflow but split by 15 minute time segments?
I still think there is a lot of potential for building more ideas into the flow: frequency of passes, the strongest interactions between players, dynamic shaping, and a more accurate representation of which players are on the pitch during each time frame. Lots of food for thought, but perhaps you can take the workflow and make it better.
Catch you in the next one.
LOGGING OFF,
CJ