Hi all,
It’s a great pleasure to invite good friend and previous colleague Darragh Murray onto the blog today. Darragh has really impressed me over the years with his T-shaped skills – the AFL blog he shares with us today reflects that. If you’d like to connect with Darragh, he can be found on Twitter and his own site here.
Pursuit of the Pennant | Using R and fitzRoy to visualise historical AFL ladder standings.
Last year, I was fortunate to be selected as one of the top 15 entrants in the 2023 edition of Tableau’s Iron Viz competition.
My visualisation of the evolution of women’s participation in Australian Rules Football relied on historical Australian Rules football statistics obtained via a handy API made available through an R package called fitzRoy written and maintained by James Day, and co-authored with Robert Nguyen and Oscar Lane.
In this tutorial, I want to demonstrate how one can access this API via R, obtain data to gain a historical understanding of the finishing positions of various Australian Rules football teams over time, and visualise this information in a platform such as Tableau Public.
So remind me, what is Australian Rules Football?
Australian Rules Football (often called Aussie Rules or ‘AFL’ or simply ‘footy’) is a full contact ball sport played on a large oval field between two contending teams of 18 players.
Footy is a very popular sport in Australia, with some regular season matches hosting up to 100,000 spectators depending on the teams playing. Here’s a short video that explains the game concept, complete with flashy highlights. Each year 18 AFL teams compete for the AFL ‘flag’ or premiership. This involves playing up to 23 rounds of regular season matches, followed by a knockout final series culminating in the AFL Grand Final – usually held in late September at the Melbourne Cricket Ground.
‘Footy’ has a long history in Australia and is one of the world’s oldest organised team sports, tracing its origins back to the mid-19th century.
Fortunately for us data nerds, the footy fans of the time kept highly detailed records, and this tradition has continued well into the 21st century. And thanks to the fitzRoy API, getting comprehensive game and player data is highly straightforward.
What exactly is in fitzRoy?
“The goal of fitzRoy is to make it easy to access data from the AFLM and AFLW competitions. It provides a simple and consistent API to access data such as match results, fixtures and player statistics from multiple data sources.”
The fitzRoy API provides access to several data sources that compile statistics on both the men’s and women’s Australian rules competition. Data available include:
• Fixturing and match results.
• Player lineups and statistics.
• Team ladder positions by round.
• ...and a whole lot more.
Essentially fitzRoy provides people access to:
• Official data from the official AFL website.
• AFL Tables, which contains match results dating back to 1897.
• Footywire, which has detailed player statistics from 2012 onwards.
• Squiggle, home to the best AFL predictive models.
• Fryzigg, a further source of player statistics.
I feel like I’m regurgitating much of the useful information in the ‘Introduction to fitzRoy‘ vignette, but my point here is that fitzRoy provides users access to various resources with varying levels of detail. Older data will have fewer details, whereas more recent seasons and matches will have more detailed data for users to manipulate and shape.
Visualising Ladder Positions: Showing AFL Team Performance Over Time.
Inspired by a Tableau Public visualisation completed by my friend Kris Curtis, I wanted to quickly and easily build a dataset that allowed me to visualise the historical finishing position of each Australian Rules football team since the formation of the national competition in 1990 up until the end of 2022.
Manual data collection would be relatively straightforward, and I imagine someone could quickly assemble a dataset within an hour or two. Still, I wanted to show how easy it is to complete with a few lines of R-code using fitzRoy.
I’ll use R and run code for this walkthrough using the de facto standard R IDE, RStudio. I won’t cover the step-by-step of installing these platforms, but if you’re stuck, here is an excellent rundown on how to do so.
The next part of this blog will run you through the logic of getting all the footy data required to build a visualisation similar to the above using the fitzRoy API in a step-by-step fashion. However, if you’d prefer just to see the commented code, you can do that by looking at it on GitHub.
Setting Up Your Environment: R Packages
The first task after installing R (and likely an IDE like the aforementioned RStudio) is to make sure you have the appropriate packages installed in your R environment. Fortunately, this is dead easy to do:
install.packages("fitzRoy")
install.packages("dplyr")
If you don’t have the above packages installed, this code is what you need to get sorted. Running these lines in your script will install them. Once you’ve run them once, you can comment these lines out by putting a hash symbol (#) in front of them.
We then load a package into our R session by calling the library function.
library("fitzRoy")
library("dplyr")
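As an alternative to commenting the install lines in and out, here is a minimal sketch that installs only what is missing. The helper names find_missing_packages and install_if_missing are my own illustrations and not part of fitzRoy or dplyr; requireNamespace() and install.packages() are standard R functions.

```r
# Hypothetical helper: return the names of packages that are not installed.
find_missing_packages <- function(packages) {
  installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  packages[!installed]
}

# Hypothetical helper: install whatever is missing, then report what was installed.
install_if_missing <- function(packages) {
  missing_packages <- find_missing_packages(packages)
  if (length(missing_packages) > 0) {
    install.packages(missing_packages)
  }
  invisible(missing_packages)
}

# Usage (commented out so nothing is installed by accident):
# install_if_missing(c("fitzRoy", "dplyr"))
```

With this approach the script can be re-run safely, because packages that are already present are skipped.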
Our basic environment is now set, so we can get cracking. The first step is creating some empty data frames to store our AFL season data.
all_afl_season = data.frame() # a dataframe to store regular season data
all_afl_grandfinal_results = data.frame() # a dataframe to store the results of the AFL grandfinal
We’re using empty data frames because we will create a loop and append new data to these data frames each iteration. We’re starting with the 1990 season and continuing year by year, adding new data until we hit 2022.
The data we get for each AFL team in each season is their final ladder position. However, unlike sports like Premier League football, AFL teams play in a finals series, with a top-eight ladder position stamping your ticket.
Theoretically, anyone in the final eight can win the premiership, though a higher ladder position gives you certain advantages (home ground finals and so forth). However, the team that finishes top of the ladder often fails to win the premiership.
Here’s the entire loop code with comments to guide you on what we’re doing. You can download it from the GitHub link at the top of the page.
Our two previously empty data frames are now filled with the good stuff! Each team’s finishing position plus the data on who won the flag that year! Here’s what some sample data looks like for each data frame.
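For readers who only want the shape of that loop, here is a minimal sketch of the append-per-season pattern. It is not the code from the GitHub repository. The function get_final_ladder is a hypothetical stand-in that returns made-up rows so the loop runs without network access; in a real script it would be replaced by a fitzRoy call (for example fetch_ladder()), with the returned columns renamed to suit.

```r
# Hypothetical stand-in for a fitzRoy call: returns a tiny made-up ladder
# for one season (the team names and positions are placeholders, not real data).
get_final_ladder <- function(season) {
  data.frame(
    season = season,
    team = c("Team A", "Team B", "Team C"),
    ladder_finish = 1:3,
    stringsAsFactors = FALSE
  )
}

demo_seasons <- data.frame()  # start empty, as in the tutorial

for (season in 1990:2022) {
  season_ladder <- get_final_ladder(season)         # one season's ladder
  demo_seasons <- rbind(demo_seasons, season_ladder) # append to the running total
}

nrow(demo_seasons)  # 33 seasons x 3 placeholder teams
```

Growing a data frame with rbind() inside a loop is fine for a few dozen seasons; for much larger jobs, collecting the pieces in a list and binding them once at the end is usually faster.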
The final preparation step is merging these two data frames into a single structure that contains both pieces of information in one easy-to-visualise dataset. This is quite straightforward using the left_join function.
afl_ladder_with_flags <- left_join(all_afl_season, all_afl_grandfinal_results, by = "season") %>%
  mutate(premiership_flag = team == winner) %>% # TRUE/FALSE winner indicator rather than the team name
  select(-winner) # the winner column is no longer needed
And you can then see the final ladder positions with an indicator to show which team won the premiership (or ‘flag’) in each season.
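To see what the join does on a small scale, here is a sketch with made-up data. The team names and results are placeholders rather than real AFL results, and the column names simply mirror the ones used in the tutorial.

```r
library(dplyr)

# Placeholder ladder: two teams over two seasons
ladder <- data.frame(
  season = c(2001, 2001, 2002, 2002),
  team = c("Team A", "Team B", "Team A", "Team B"),
  ladder_finish = c(2, 1, 1, 2)
)

# Placeholder grand final winners: one row per season
winners <- data.frame(
  season = c(2001, 2002),
  winner = c("Team A", "Team B")
)

# Every ladder row picks up that season's winner, which is then
# reduced to a TRUE/FALSE flag for the team on that row
joined <- left_join(ladder, winners, by = "season") %>%
  mutate(premiership_flag = team == winner) %>%
  select(-winner)

joined
```

Note that in this toy data the 2001 premier finished second on the ladder, which is exactly the situation the flag column is designed to capture.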
We can now export a .csv file, and that’s the hard work all done!
write.csv(afl_ladder_with_flags, "output/afl_ladder_with_flags.csv", row.names=FALSE)
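One thing to watch: write.csv() does not create missing folders, so the export fails if the output directory does not exist yet. Here is a minimal sketch of guarding against that. It writes a one-row placeholder data frame to a temporary directory purely for illustration; in the tutorial script you would use "output" and afl_ladder_with_flags instead.

```r
# Illustration only: use a temporary folder rather than the project's "output" folder
output_dir <- file.path(tempdir(), "output")

# Create the folder if it is not there yet
if (!dir.exists(output_dir)) {
  dir.create(output_dir, recursive = TRUE)
}

# Placeholder row standing in for afl_ladder_with_flags
demo <- data.frame(
  season = 2022,
  team = "Team A",
  ladder_finish = 1,
  premiership_flag = TRUE
)

output_file <- file.path(output_dir, "afl_ladder_with_flags.csv")
write.csv(demo, output_file, row.names = FALSE)

# Read it back to confirm the export worked
read_back <- read.csv(output_file)
```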
Visualising AFL Ladders & Flags in Tableau
Now that you have a nicely prepared dataset, you can visualise this information in a number of different platforms. I’m going to throw together some basic instructions in Tableau, but feel free to use other platforms should you choose.
For this part of the tutorial, I’m going to assume some basic familiarity with Tableau. I’m also going to use the clever transparent shapes trick to visualise premiership flags, but I’m not going to cover how to use them in any depth (for that, I recommend reading Kevin Flerlage’s blog “14 Use Cases for Transparent Shapes and Images”).
Open up your install of Tableau, either Desktop or Public, and then use the ‘connect to data’ option to locate the .csv file you exported from RStudio.
We will build a series of simple line charts – one for each team, showing the ladder position for each season between 1990 and 2022.
If a team wins the premiership, that season will have a right-facing triangle in the space above the relevant season to indicate that. (As previously mentioned, the ‘flag’ is what footy fans colloquially call the premiership, and coincidentally the right-facing triangle does look like a flag.)
Open a Tableau Worksheet and drag ‘season’ to columns and ladder finish to rows. Right-click ‘ladder finish’ and make it continuous. Change the Marks type to ‘line’. Then set the path marks ‘line type’ to ‘step’.
Drag ‘team’ to the filters and select a team (for example ‘Brisbane Lions’). You should now have a worksheet that looks something like the following:
Hold up! Things look a bit odd. Ladder position runs in the opposite direction to where it’s supposed to, so we need to reverse the Y axis so that teams that finish in top spot are at the top of the graphic. This is easily done by editing the axis of the ‘ladder finish’ pill and ticking the option to reverse the scale.
Once you’ve reversed the axis, the next step is to add in the flag symbols above the relevant season. We’re going to need a Tableau calculation here called ‘Flag Position’. Here’s the calculation:
Flag Position
IF [Premiership Flag] = TRUE THEN 0.3 ELSE 19 END
This calculation is just telling Tableau where to place a ‘flag’ on the Y axis depending on the value of the ‘Premiership Flag’ variable.
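If you would rather compute this value in R before exporting, the same logic can be sketched with ifelse(). The column name flag_position and the placeholder rows are my own illustrations; the 0.3 and 19 values are simply copied from the Tableau calculation above.

```r
# Placeholder rows standing in for afl_ladder_with_flags
demo_ladder <- data.frame(
  season = c(2001, 2001),
  team = c("Team A", "Team B"),
  ladder_finish = c(2, 1),
  premiership_flag = c(TRUE, FALSE)
)

# Premiership seasons get 0.3 (just above first place on the reversed axis);
# every other season gets 19, the same value the Tableau calculation uses
demo_ladder$flag_position <- ifelse(demo_ladder$premiership_flag, 0.3, 19)

demo_ladder
```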
Drag the ‘Flag Position’ calculation to rows and make it a dual axis. Then synchronise the axes.
You’ll now note that your view will look something like the following:
Tableau has by default set the ‘Flag Position’ measure to line. You’ll note that where Brisbane won the premiership between 2001 and 2003, it has set the ‘Flag Position’ value at 0.3. We’re going to change this to our nice little premiership flag icons!
Click on the Flag Position mark, and change the mark type to shape.
Then drag the ‘Premiership Flag’ dimension to the shape mark, then click on the Shape mark to edit the shape. We’re going to set the ‘true’ value to be a filled right facing arrow, and the ‘false’ value to be a transparent shape (remember: read that blog!).
You can also set the colours of the flag by using a similar approach except dragging the ‘Premiership Flag’ dimension to colour rather than shape. A tiny bit more formatting (removing headers and duplicate axis etc.) and you can have something like this:
Of course, you can repeat this approach for each team if you wish and create a fully formatted dashboard that shows each team individually, which is what Kris did in his original dashboard. Here’s what I ended up producing:
For this layout, I essentially built the visualisation for one team and then duplicated that visual for all teams, changing the team filter as I went. I then plonked that into a 4×5 grid of containers on a dashboard.
You’re more than welcome to download the dashboard and check out the exact mechanics of how it works.
BONUS: Visualising AFL Ladders & Flags in R
I know I showed you how to do this in Tableau, but R has a handy visualisation package called ggplot2. I’m not making the same dashboard as I did in Tableau, but I’ll throw in more code you can add to your script, achieving a vaguely similar outcome to what we did in Tableau but via a few lines of R code!
Return to the R environment we set up earlier. We’re going to load a few more packages (install them if you haven’t got them already; note that bbplot is installed from the BBC’s GitHub repository rather than from CRAN).
library("ggplot2") # ggplot2 allows us to chart data
library("bbplot") # bbplot allows us a bit more functionality to pretty up our graphs BBC style
And it turns out we can do a lot of the hard graphing work in just a handful of lines of R code:
# now we can chart using ggplot and facet_wrap by team (small multiples)
chart_seasons <- ggplot(all_afl_season, aes(x=season, y=ladder_finish, colour=team)) +
labs(title="Pursuit of the Pennant", subtitle="AFL team ladder positions | 1990 - 2022") +
geom_step() +
facet_wrap(~team) +
scale_y_reverse() +
bbc_style() +
theme(legend.position = "none")
chart_seasons #output the seasons chart
The last command then outputs to the plots tab within RStudio, and should look something like this:
While it’s not totally the same as the Tableau dashboard above, it shares similar characteristics. With a bit more code, one could replicate that Tableau graphic pretty closely, but I’m going to leave it at that for today.
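As one possible step towards that, here is a sketch of adding premiership markers to a stepped chart with an extra geom_point() layer. It uses made-up placeholder data rather than the real ladder, leaves out bbc_style() so it only needs ggplot2, and uses an upward-pointing filled triangle (shape 17) rather than the right-facing flag from the Tableau version. The 0.3 position is borrowed from the Tableau calculation.

```r
library(ggplot2)

# Placeholder data: two teams over four seasons (not real results)
demo_ladder <- data.frame(
  season = rep(2001:2004, times = 2),
  team = rep(c("Team A", "Team B"), each = 4),
  ladder_finish = c(2, 1, 1, 3, 1, 2, 3, 1),
  premiership_flag = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

# Keep only the seasons where a team won the flag
flag_seasons <- subset(demo_ladder, premiership_flag)

chart_with_flags <- ggplot(demo_ladder, aes(x = season, y = ladder_finish, colour = team)) +
  geom_step() +
  # Draw a marker just above first place for each premiership season
  geom_point(data = flag_seasons, aes(y = 0.3), shape = 17, size = 3) +
  facet_wrap(~team) +
  scale_y_reverse() +
  theme(legend.position = "none")

chart_with_flags  # output the chart
```

To apply the same idea to the real data, you would swap demo_ladder for afl_ladder_with_flags, since that data frame carries the premiership_flag column.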
R’s visualisation engine is potent, and there are many great guides on how to build pretty graphs using ggplot2 and the tidyverse. My favourite is this guide to beautiful plotting by Cedric Scherer, so if you’re interested in ggplot2, you should definitely check that link out.
Anyway, thanks for reading all the way through. I hope it’s been useful and that you’ve learned something about the power of both fitzRoy and #rstats!
CJ Round-Up:
Thank you Darragh for sharing both the visual in Tableau and the bonus version in RStudio. What a fantastic way of retrieving, cleaning and visualising AFL data. As always, the code has been stored on GitHub; you can find it at the top link under the title, as well as a link to Darragh’s visual on his page.
LOGGING OFF
CJ