Hi all,
This blog is the first of a two part series of guest blogs, with a particular focus in this blog on how to get football stats from API-Football using Alteryx!
I am delighted to welcome Alberto Oraa onto the blog to demonstrate how you can access premier league data. Please find a little bit more about Alberto and his love for data and sports below:
“There are not many things I enjoy more than watching sports. However, I soon realized the beauty of analyzing everything related to what came before the event, the event itself and whatever comes after. Analyzing it, I believe, gives you a better perspective of what happened and how future events can change by looking at patterns and different scenarios.
Many times I have encountered a conversation related to using data and if it really worth the effort and resources while using it for sports and every single time I got to the same conclusion: “If there is an opportunity to have another point of view, you should definitely take it because that is going to show missing details and even the smallest detail can make a difference”.
I would have never been able to create any analysis if I did not have the chance to meet both the data and the sports community and I must say both work incredibly well together. That’s absolutely what I have enjoyed the most.“
Onto the blog itself!
How to get football stats from API-Football using Alteryx
If you have ever tried to use sports data for your analysis, you may have encountered how difficult it is to find data. If it is free that is because it might be quite old and if is up to date it will probably come at a cost not everyone can afford. But there is always a way to find up to date data for free and that is exactly what today’s blog is about. Once data is available, Joris will share with all of us a great example of how that could be visualized into a tool like Tableau.
What is API-Football?
API-Football is a Restful API that covers +915 leagues & cups with livescore, standings, events, line-ups, players, pre-matches odds, statistics and many more. Even though it has an easy integration and works with code libraries in popular languages, we will cover today how it does work with Alteryx.
For those of you have been worked with RapidAPI before, it also works with it and it will allow you to manage from one dashboard, view endpoints, test from the browser, connect using code snippets, code libraries in popular languages, call volume and billing, check errors and latency, take a look at logs for your API calls and secure payment.
API-Football works with 4 different plans:
- Free for 0$ Month at 100 requests/day
- Pro for 19$ Month at 7500 requests/day
- Ultra for 29$ Month at 75000 requests/day
- Mega for 39$ Month for 150000 requests/day
However, we will be using a free plan for this specific blog’s content.
*API-Football might require you to introduce your CC details while signing up. However, while using a free subscription in the specified requests of 100/day no charge is to be made. It is specified in the terms and conditions that can be found here.
Step 1: Create an API-Football Account
First of all, we need to set up our own API-Football account. To do so, we need to access the API-Football pricing section and choose SUBSCRIBE to the Free plan.
It is also possible to sign up and create our account from the API Sports website as you can see in the following image. Either way works and will give us the right access after your account has been created, your X-RapidAPI-Key or your API-Key (this will depend on which subscribing method you choose) will be created and that will be used to call the API to get our data.
Step 2: Check API-Football Documentation
After we have just created our account and log into the API services to get our key, it is time to check which kind of calls we can make and how to make those. That information will be stored in the documentation web-page of the API-Football service which can be accessed here.
It will contains information about which information is required and where to get it as well as the architecture and sample scripts for different types of code.
Furthermore, there will every piece of possible endpoint of data that can be collected from an API call. Some of them may require the use of some parameters.
Here are some examples:
Timezone: https://v3.football.api-sports.io/timezone
Countries: https://v3.football.api-sports.io/countries
Leagues: https://v3.football.api-sports.io/leagues?id=39
Teams: https://v3.football.api-sports.io/teams?league=39&season=2019
Venues: https://v3.football.api-sports.io/venues?id=556
Standings: https://v3.football.api-sports.io/standings?league=39&season=2019
Fixtures: https://v3.football.api-sports.io/rounds?league=39&season=2019
Injuries: https://v3.football.api-sports.io/injuries?team=85&season=2020
Coach’s: https://v3.football.api-sports.io/coachs?team=33
Players: https://v3.football.api-sports.io/players/seasons?player=276
In order to get the information required for the parameters of any specific API query, it will be really important to check the IDs documentation of the API-Football service. It is possible to get to the IDs from the API Sports dashboard where it will be possible to get the League’s & Team’s IDs. For example, 39 is league id for the Premier League while 33 would be the id for Manchester United. However, after being logged in it could be access from this two URLs:
Building An Alteryx Workflow to Get Football Data
Once our API account has been set up and we have collected our Key, it is time to move to Alteryx to build a workflow that allow us to make queries to the API.
If you’d like to download the exported data, or download the Alteryx workflow you can do so in the links at the top of the page.
Both the:
Premier League Workflow
Premier League Example Output
will be stored in the GitHub repo.
Step 1: Assigning Credentials
Our first input into Alteryx must contain the URL of the query and our Key. It would be a Text input tool the one we can use in order to set one row and two columns (one that contains a URL query such as “https://v3.football.api-sports.io/standings?league=39&season2021” and another one for the key).
Step 2: Downloading the data
After these credentials have been brought into Alteryx, then it is required to make the API call. There is one specific tool to make it which is the Download Tool and it will require us to specify which is the URL of the query. It will also require us to select which field contains the key as that will be selected in the Headers tab of the Download Tool configuration.
Steps 1 & 2 would then look like this:
Step 3: Performing data cleansing on the data
This, if successful with a code such as HTTP/1.1 200 OK, will download data in a JSON format and that will require the use of the JSON Parse tool set to parse a JSON Field called DownloadData.
Once it has been parsed, data is structured in rows containing data which would need to cross tab into columns in order to get data ready to be analyzed later in any other tool. Before we do so, it is important to understand that there is a need to perform data cleansing that will involve a Sample Tool, a Tile Tool, a Filter Tool and a RegEx Tool:
- Sample Tool: First 13 rows (always!) contain data related to the league or competition as well as the season even though that is information that we are already aware of as we needed to include in the URL used to make the query. That is why there is a need to use a Sample Tool to skip the first 13 rows.
- Tile Tool: As our query contains data related to the standings for a specific league and season, then it will contain as many records as information it returns for every team of that specific league and season. For example, if we are downloading the data of the Premier League 2021/2022, we must consider that we will have 20 teams so we should use a Tile Method of Equal Records for a total of 20 tiles (which it really means teams for this use case). However, we must consider the number of teams in Tile Tool based on which competition we are querying its data.
- Filter Tool (optional): If there are any records coming out of the Tile Tool that we believe are not interesting to our main purpose, a Filter Tool can be used to filter out based on the Tile_SequenceNum because that way it is possible to remove it from every team in that specific league and season.
- RegEx Tool: Using RegEx here can be useful in order to rename those values that will serve as column names (or headers) for our final dataset. Once our download data is parsed using a JSON Parse Tool, the name for each will value will be under a colunm called JSON Name. That will contain some characters such as response.0.league.standings.0.0.team.id but we are interested only in the two final words of that name. That way we can use a RegEx to parse just that part of the name:
(\l{8}).{1}(\d{1}).{1}(\l{6}).{1}(\l{9}).{1}(\d{1}).{1}(\d+).(.+)
Finally, we could then move to use a Cross Tab Tool to pivot the table so based on the identification of each team (Tile_Num coming from the Tile Tool).
After all these steps, data cleansing would be completed.
Step 4: Sorting and renaming columns to build our final table
It is now almost over!
In order to just have data that we can finally output to any kind of format (XLSX, CSV, HYPER, Google Sheets etc.), a Select Tool can be used to sort and rename the data so we can end up with a table that it looks like this. It is highly important to take into consideration the Google Sheets output as this one will allow users to update automatically their data later on Tableau Public by following Joris’ blog! However, it is really important to understand that this Tool is not automatically installed once Alteryx Designer has been installed but you will need to install it from the Alteryx Community once download from here and install the YXI right after. You can follow this instructions from the Alteryx Help page to make sure you can authenticate and write something from Alteryx in a Google Sheets.
After a query just like this one, we would be able to get information related to the competition, a team name, id & logo, results due to its ranking (UCL,UEL, Relegation), status compared to previous week, games played (home & away), games won (home & away), games lost (home & away), games drawn (home & away), goals scored (home & away), goals against (home & away) and the latest date for a game.
Once it is ready, we can just bring in an Output Data Tool to get our data and start our analysis by following Joris’ steps in Tableau.
PS. It may require a few more steps through the data cleansing for different types of queries but the data obtained always comes with the same structure.
CJ Round-up:
Thank you Alberto for that insight as to how to prepare data from the Football API ready for analysis in Tableau. I love that the workflow on the most part is plug and play. Later this week we will hear from Joris, in part two as to how to calculate top teams, players and games as well as a few formatting tips and tricks to bring the visual to life with badges and players.
Once again, be sure to download your own copy of the workflow and or example dataset if you’re wanting to take part in the Tableau section. You can find the files within the repo.
LOGGING OFF,
CJ