Welcome to the May edition of “What’s Good?”.
Each month will have a tailored theme.
I am so pleased to invite Mckay Johns to the blog for the May edition of “What’s Good?” This month’s topic is slightly different to the usual Tableau content: it will be on Python and sports! I’ll add a few Tableau gems in a follow-up blog next week.
Mckay has been driving the sports community with an awesome array of YouTube tutorials. Since starting on YouTube 10 months ago, he has gained over 1K subscribers and tens of thousands of views. His videos are run-throughs of different Python projects, from scraping Understat data, to looking at clustering models, to creating complex chart types within Python. Alongside this he has built an impressive Git repo where others can access and run the code for themselves. Today we will be learning a little more about Mckay and one of his step-by-step Python guides.
You can follow Mckay on Twitter, here.
CJ: Mckay, thanks so much for being a part of this month’s edition. For those who are unaware of your background, tell us how you got into data and started utilising Python. Is your background in data?
M: My background in data really starts from when I was younger and started to enjoy math. I’ve always enjoyed math, even in elementary school and high school, and I usually found myself doing better at math than I did in any other subject. Growing up and loving sports, I used to look forward to getting the newspaper delivered each morning so I could sit and read the stats of the different baseball and basketball teams: who was scoring the most, league leaders, different stats about different teams, etc. That would translate over to things such as fantasy football, watching and learning about new statistics, and so forth.
When I got into college I actually didn’t know what I wanted to do. I thought I wanted to be a sports psychologist, and then I took one psychology class and found out it wasn’t for me. So I moved over to the information systems department, learned about all the cool things people were doing with data and math, and wanted to eventually be able to apply that to sports as well.
The main problem I found was that I had never coded in my life, so I signed up to take an intro to computer science class which taught C++, and I hated it haha. I liked the concepts though, and the things you could do with it, so after some research I found Python and took a couple of online courses to learn it.
Right now I am working to finish my Master’s degree in Data Analytics, which is how I started to get into things such as machine learning and more advanced Python materials.
CJ: What sparked the interest in doing Python tutorials? What has been your favourite to produce?
M: I started doing the tutorials after I tried to start learning analysis for soccer myself. I saw a ton of people that were creating some cool things on Twitter, on blogs, and other platforms and wanted to learn how to do it. Since there really wasn’t much information on how to do it, I wanted to see if I could bridge that gap and help others get started. Most of my tutorials come about as I learn something and then try to teach it in a video.
My favorite one so far was probably the tutorial I did about KMeans clustering, which is a way to group similar records together to try and find hidden trends or information. I really like the method and I’ve seen a lot of people create some amazing things by watching the video, so it’s one I’m proud of.
CJ: I see you have created a thread of places to start learning Python. What tips would you give to someone who is on the Python learning path?
(You can find the thread, here.)
M: I think the biggest thing about learning Python (or really just anything in general) is being patient and consistent. When I first started learning to code, I started with C++, which I did not like, and then I kind of stopped learning because it was a little too frustrating and because I thought I should be an expert in 2 weeks. I regret not trying to stay consistent, because the compounding effects of consistency would have paid off in the long term and I could possibly be even further along my path than I am now.
One thing I always try to tell people who are learning Python is to start with the basics, be consistent in practicing or doing tutorials every day, and then start building projects you are interested in that are going to help you learn even more. Learning a new skill is pretty hard, but consistency and patience are how I’ve seen many people go from not knowing anything to being able to create some awesome things.
CJ: So, why football? (soccer)
M: I actually only played soccer until I was about 8 or 9, but living in the United States, all of my friends played baseball and football, so I didn’t play soccer growing up. I got into soccer because I started making some friends that played FIFA, around FIFA 02 or 03, so I would play that for hours with them. I couldn’t watch many games because it wasn’t as accessible, but I would watch highlights, and on SportsCenter there would be plenty of things, so I kept up with it.
I also ended up living in Argentina for 2 years after I graduated high school, and being in a country that lives and breathes football is a completely different experience. I really felt a connection to a lot of the people I met as they shared their stories about their football fandom, and I fell even more in love with the sport.
I like other sports too but football is the one that I find the most enjoyable to play and watch.
CJ: In a recent video you discuss 5 reasons why someone should learn Tableau, covering off its demand, ease of use and its applications. What part of learning Tableau have you enjoyed the most?
M: Tableau, I think, solves a really big issue for people who don’t know how to code but still need to make charts and graphs. I’ve loved how easy it is to integrate any data source. This can be a little confusing with programming, but Tableau is 10x easier with that. I’ve also loved the drag and drop functionality. The fact that I can just drag some measures and dimensions into spots that end up creating graphs that are aesthetically pleasing is amazing. It bridges those gaps and makes things a lot easier and more accessible to many.
CJ: What in the data / sports community have you seen recently that has really impressed you?
M: Piotr has developed an insane model to be able to evaluate coaches based on a lot of different variables. It is probably one of the most high end things that I have witnessed and been able to see developed recently.
There’s so many things I could list here haha but I think that one tops them all so I will leave it at that. There were a lot of cool things done in the NFL Big Data Bowl which had people using tracking data and advanced metrics to evaluate defensive performances on passing plays which was really cool. I didn’t see all of the different submissions but the ones I did see were incredible.
CJ: Awesome, thanks Mckay. I’ve slowly realised there is a whole new world of sports analysis out there, especially when it comes to data analytics/science, and particularly in relation to soccer data.
I’d like to take the opportunity here to mention (with no surprise to some) that #SportsVizSunday has been one of my favourite initiatives to take part in. Each week, I’ve seen such a high standard and such an array of chart types, designs and stories come from the weekly blogs.
Some of my favourites from the last month (April 2021) have been:
Dennis shows the various wheelchair marathon winners from 5 famous marathons across the world. He brilliantly shows the course details whilst including the athletes and respective years they won. How impressive to find Heinz Frei has won a total of 26 races, mainly in Berlin. I love the layout and colour choices in Dennis’ viz. This was a submission across SportsVizSunday, ProjectHealthViz and DiversityInData.
Radials are the way to my heart. Fred’s beautifully presented viz highlights the various sports included in the summer Paralympics by year. I love the finer details of this viz, including the sport participant vectors. I have become a real fan of where individuals make a ‘showpiece’ chart and then add smaller supplementary visual aids and text around the side for context. Fred does a great job of it here using a lighter grey text and greyscale bar.
Riley does an awesome job of looking at the historic 116-win season for the Seattle Mariners. I particularly like Riley’s design in terms of the typeface used as well as the background shape colours. It makes for a very good read across the page. Riley’s last three vizzes have been super inspiring.
Simon is a real leading contributor to SportsVizSunday, always coming with high standard dashboards. His Lacrosse dashboard particularly blew me away by utilising the layering functionality so well. Small things such as the icons, colouring and typeface can really make big differences, and it shows in how well this came together. Both technically and aesthetically fantastic.
You can get involved yourself, here.
CJ: Now, finally, and I am so excited for this: can you give us a written Python run-through of your shot chart tutorial?
M: So this is one of the first tutorials I created, and it covers making shot charts for soccer. It shows the locations of each shot that was taken in a match, and you can use this to evaluate different things such as goals, misses, different clusters of locations, etc. It’s one of the most basic charts used in soccer analysis. If you would like to follow the exact code you can download it on my GitHub, here.
The first thing we need to do is import the necessary Python packages.
We will be using pandas, which allows us to load and manipulate data; matplotlib.pyplot, so we can plot the points; the overall matplotlib package; and two external packages you will need to install onto your computer, called highlight-text (https://pypi.org/project/highlight-text/) and mplsoccer (https://mplsoccer.readthedocs.io/en/latest/), which we use to plot the pitch.
This may require you to run the following in your terminal.
pip install mplsoccer
pip install highlight-text
The next thing I like to do is set up general use colors such as a text color so we are not having to type a hex code every time. We do this in the next bit of code:
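As a rough sketch of what that setup might look like: the variable names and hex codes below are placeholders picked for illustration, not necessarily the ones used in the video.

```python
# Hypothetical general-use colors, defined once so the hex codes
# do not have to be retyped in every plotting call.
background_color = "#22312b"  # assumed dark green for the figure and pitch
line_color = "#c7d5cc"        # assumed light grey-green for the pitch lines
text_color = "#ffffff"        # assumed white for titles and annotations
shot_color = "#ea6969"        # assumed red for the shot markers

print(background_color, line_color, text_color, shot_color)
```

Changing the look of the whole chart later then only means editing these few lines.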
The next part is loading our data, which is in a CSV. We use pandas to do this, and the following code creates a dataframe, which is basically just the Python version of an Excel spreadsheet. We can check the top couple of rows as well by just typing the name of our dataframe.
Make sure your CSV and Python file are in the same location. This way you don’t have to pass in the specific file path for your CSV. If they aren’t in the same folder, then make sure you are passing in the correct file path for your CSV.
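Here is a minimal sketch of the loading step. So that it runs on its own, it reads a few made-up rows from an in-memory string instead of a real file; the column names (x, y, outcome) and the file name in the comment are assumptions, so adjust them to match your own CSV.

```python
import io

import pandas as pd

# Stand-in for the real CSV file: invented rows with assumed column names.
csv_text = """x,y,outcome
108.2,35.1,Goal
95.4,48.7,Miss
112.0,41.3,Miss
101.6,29.9,Goal
"""

# With a real file in the same folder this would be something like
# df = pd.read_csv("shots.csv")  # hypothetical file name
df = pd.read_csv(io.StringIO(csv_text))

# head() shows the top rows; in a notebook, typing df on its own also works.
print(df.head())
```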
The next step is going to be creating the pitch and plotting the shots. This is what the code looks like:
But I will break down each section to make sure we are understanding what is going on.
The first section is creating the pitch.
We use a combination of matplotlib and mplsoccer here to create it. The first three lines create the “canvas” we’ll use, and we set the colors we want with hex codes.
In the next chunk we create a variable called pitch with mplsoccer, which takes a bunch of different arguments to create the pitch we want. This is the setup I use, but there are more customization options if you read the documentation. The next line draws the pitch on the canvas we created. We need the line after that because the StatsBomb pitch we are using has an inverted y axis compared to the data we are using, so we need to make sure they are going the same way (you may or may not need this line when using different data).
The next section is just plotting the shots. We are going to plot all of the shots in the same color, but you could use something called a for loop if you wanted to plot each shot type or outcome in a different color.
The code is simple. We use matplotlib’s scatter function and pass in the x column and the y column from our dataframe, then we set the size to 100 and the color to a red, and alpha makes the points a little more transparent.
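To make both options concrete, here is a small sketch. It uses a plain matplotlib axes and a handful of invented shots so it runs on its own; the column names, the alpha value, the outcome labels and the colors are all assumptions. On the real chart you would plot onto the axes that holds the pitch.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented shot data with assumed column names.
df = pd.DataFrame({
    "x": [108.2, 95.4, 112.0, 101.6],
    "y": [35.1, 48.7, 41.3, 29.9],
    "outcome": ["Goal", "Miss", "Miss", "Goal"],
})

# Option 1: every shot in the same color.
fig, ax = plt.subplots()
all_shots = ax.scatter(df["x"], df["y"], s=100, c="#ea6969", alpha=0.7)

# Option 2: a for loop with one scatter call per outcome,
# so each outcome gets its own color and legend entry.
fig2, ax2 = plt.subplots()
outcome_colors = {"Goal": "#ea6969", "Miss": "#c7d5cc"}  # hypothetical mapping
for outcome, group in df.groupby("outcome"):
    ax2.scatter(group["x"], group["y"], s=100,
                c=outcome_colors[outcome], alpha=0.7, label=outcome)
ax2.legend()
```

ax.scatter does the same job as plt.scatter; it just names the axes explicitly.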
The next section is just annotating the plot. This is super easy with the highlight_text package, although there are other ways to do it as well. I chose highlight_text so that I could add colors to the text in the future if I wanted to.
Each one of the texts is using an x and y to plot the location of the annotation and the s argument is where we put the string we will be passing.
There is a lot of customization with the highlight_text package so I would recommend looking into that if you have any questions on what else you can do.
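Because the highlight_text functions have changed between releases, the sketch below swaps in one of the "other ways": matplotlib’s built-in fig.text, which also takes an x, a y and a string s. The title, subtitle, positions and color are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

text_color = "#c7d5cc"  # assumed annotation color

fig, ax = plt.subplots(figsize=(13, 8.5))

# x and y are in figure coordinates (0 to 1); s is the string to draw.
title = fig.text(x=0.5, y=0.93, s="Shots in the match",
                 ha="center", fontsize=24, color=text_color)
subtitle = fig.text(x=0.5, y=0.89, s="Each dot is one shot",
                    ha="center", fontsize=14, color=text_color)
```

Plain fig.text cannot color individual words within one string, which is the kind of thing highlight_text is meant for.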
When you’re all said and done you can save it, and your final image should look like this!
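For the saving step, something along these lines should work. The file name, dpi and output folder are assumptions (a temporary folder is used here only so the sketch can run anywhere), and an empty figure stands in for the finished chart.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Stand-in for the finished shot chart.
fig, ax = plt.subplots(figsize=(13, 8.5))
fig.set_facecolor("#22312b")  # assumed background color

# Hypothetical output path; point this at wherever you want the image.
out_path = os.path.join(tempfile.gettempdir(), "shot_chart.png")

# Passing facecolor keeps the dark background in the saved file.
fig.savefig(out_path, dpi=300, bbox_inches="tight",
            facecolor=fig.get_facecolor())
```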
That’s the end of the tutorial! If you have any questions about the code, we just started a Discord chat that you can feel free to join, where you can get help with any questions about football/soccer or just sports analytics in general.
CJ Round Up:
Thanks for reading this month’s “What’s Good?” and do let me or Mckay know how you got on with the tutorial. I am hoping to post my own soccer blog next week (without a doubt to a much lower standard of Python), so do watch out for that. It will look more heavily at scraping data from Understat and loading it into Tableau for mapping.
Further thanks to Mckay for the run-through. Keep up the fantastic YouTube tutorials, they are a great source of learning. I’d like to finish this month by also congratulating Mckay on a new role he took up recently, and wish him the best with the remainder of his Master’s course.