On this month’s episode of pretending to know how to code, I’m delighted to run through a cricket package. The aim of this blog is to extract several different datasets from a package and to explore the different types of matches available.
What I really want you to take from this blog is a small reminder that there are different ways of collating your own data. It doesn’t always have to be copied and pasted or pre-canned from a website; you can also access it through repositories and packages.
PYTHON CRICKET SCRAPER
To install the package, run the following command in the terminal:
pip install python-cricket-scraper==0.1.2
So where to start?
python-cricket-scraper is built to get cricket data from Cricsheet and ESPNCricInfo, so it makes sense to start with the Cricsheet website. At the time of writing, Cricsheet has ball-by-ball data for 11,685 matches. We will work out how to find one specific match and pull out some of the surrounding details for that game.
Wow! Look at all the choices available.
For now, let’s navigate down to the Super Smash.
Within this data we have 203 matches stored in JSON format.
We can look at these JSON files directly.
But we need a way to understand what’s in the files.
If we open the readme file, we find even more information.
Let’s take the first row as an example. As a reference point, you can cross-check our data against the results found on this website. That way, when we write our code, we know whether what we extract matches the actual results from a reliable secondary source.
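If you want to pull the match ID out of the readme in code rather than by eye, a small parser does the job. This is a minimal sketch that assumes each match row has the shape `date - team type - match type - gender - match id - Team A vs Team B`; check that against the readme you downloaded before relying on it. The function name `parse_readme_row` and the sample row are hypothetical.

```python
def parse_readme_row(line):
    """Split one match row of a Cricsheet readme into named fields.

    Assumed row layout (check against your readme):
    "date - team type - match type - gender - match id - Team A vs Team B".
    Returns None for lines without that shape (headers, prose).
    """
    parts = [part.strip() for part in line.split(" - ")]
    if len(parts) != 6:
        return None
    date, team_type, match_type, gender, match_id, teams = parts
    home, sep, away = teams.partition(" vs ")
    if not sep or not match_id.isdigit():
        return None
    return {
        "date": date,
        "team_type": team_type,
        "match_type": match_type,
        "gender": gender,
        "match_id": match_id,
        "teams": (home, away),
    }


# Hypothetical row for illustration only (not real match data)
row = parse_readme_row("2000-01-01 - club - T20 - male - 1234567 - Team A vs Team B")
print(row["match_id"], row["teams"])
```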
If we go back to our original JSON folder, we can search for this specific game.
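The same search can be done in code. This sketch assumes each file is named after its match ID (e.g. `1289634.json`), as the folder search suggests; the folder name `super_smash_json` and both helper functions are hypothetical. Counting the files is also a quick check that the download holds the number of matches you expect.

```python
from pathlib import Path


def list_matches(folder):
    """Return every match file (*.json) in a Cricsheet download folder, sorted by name."""
    return sorted(Path(folder).glob("*.json"))


def find_match(folder, match_id):
    """Return the path to <match_id>.json in `folder`, or None if it isn't there."""
    path = Path(folder) / f"{match_id}.json"
    return path if path.is_file() else None


# Hypothetical folder name: point this at wherever you unzipped the download
folder = "super_smash_json"
print(len(list_matches(folder)), "match files found")
print(find_match(folder, 1289634))
```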
When we open that file as text it can feel a little overwhelming, but it’s a good way of seeing how all the information is stored.
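To get a feel for that nesting without scrolling through the raw text, you can load the file and summarise it. This sketch assumes Cricsheet’s JSON layout: top-level `meta`, `info` and `innings` sections, with each innings holding a list of overs and each over a list of deliveries. The tiny sample is made up purely to show the shape, and `describe_match` is a hypothetical helper.

```python
import json


def describe_match(raw_text):
    """Summarise how one Cricsheet match file is nested.

    Assumed layout: top-level "meta", "info" and "innings"; each innings
    holds a list of "overs", and each over a list of "deliveries".
    """
    match = json.loads(raw_text)
    innings_summary = []
    for innings in match.get("innings", []):
        overs = innings.get("overs", [])
        deliveries = sum(len(over["deliveries"]) for over in overs)
        innings_summary.append(
            {"team": innings["team"], "overs": len(overs), "deliveries": deliveries}
        )
    return {
        "sections": sorted(match),                     # top-level keys
        "info_fields": sorted(match.get("info", {})),  # keys inside "info"
        "innings": innings_summary,
    }


# Made-up sample in the assumed shape (not real match data)
sample = json.dumps({
    "meta": {"data_version": "1.0.0"},
    "info": {"dates": ["2000-01-01"], "teams": ["Team A", "Team B"]},
    "innings": [{"team": "Team A", "overs": [{"over": 0, "deliveries": [
        {"batter": "A1", "bowler": "B1", "runs": {"batter": 1, "extras": 0, "total": 1}},
        {"batter": "A2", "bowler": "B1", "runs": {"batter": 0, "extras": 0, "total": 0}},
    ]}]}],
})
print(describe_match(sample))
```

With a real file, pass in the text you read from it, e.g. `describe_match(open(path, encoding="utf-8").read())`.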
Time to use Python.
Let’s break it down section by section. First, we know which match we are looking for, as we just found it in the readme file: the match number is 1289634.
We can also double check the date of this game.
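The date check can be done in code as well. A minimal sketch, assuming the `info` section stores the match days in a `dates` list alongside `teams`, `venue` and `event`; `load_match` and `match_details` are hypothetical helper names, and the path in the comment is a placeholder.

```python
import json
from pathlib import Path


def load_match(path):
    """Read one Cricsheet JSON file into a Python dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def match_details(match):
    """Pull the headline details worth double-checking from the "info" section."""
    info = match["info"]
    return {
        "date": info["dates"][0],  # multi-day matches list every day; a T20 has one
        "teams": info["teams"],
        "venue": info.get("venue"),
        "event": info.get("event", {}).get("name"),
    }


# Placeholder path: adjust to your own download
# print(match_details(load_match("super_smash_json/1289634.json")))
```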
We set up an Excel file using an Excel writer, as we want to export multiple tables into one workbook.
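With pandas, the multi-tab export pattern looks roughly like this. It’s a sketch rather than the exact code from the dataset download: it assumes pandas plus an Excel engine such as openpyxl is installed, and the sheet names, tables and file name are placeholders.

```python
import pandas as pd


def export_tables(tables, path):
    """Write several DataFrames into one workbook, one sheet (tab) per table.

    `tables` maps sheet name -> DataFrame; dict order becomes tab order.
    Needs an Excel engine such as openpyxl installed alongside pandas.
    """
    with pd.ExcelWriter(path) as writer:
        for sheet_name, frame in tables.items():
            # Excel limits sheet names to 31 characters
            frame.to_excel(writer, sheet_name=sheet_name[:31], index=False)


if __name__ == "__main__":
    # Placeholder tables and file name, only to show the call
    placeholder_tables = {
        "Teams": pd.DataFrame({"team": ["Team A", "Team B"]}),
        "Batting": pd.DataFrame({"batter": ["A1", "B1"], "runs": [10, 20]}),
    }
    export_tables(placeholder_tables, "match_tables.xlsx")
```

Using the writer as a context manager means the workbook is saved and closed once every sheet has been written.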
First, we look at the overall teams. When we run our code, this comes out as the first tab.
This is the easy one!
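If you’d rather build the team tab straight from the Cricsheet file instead of through the package, the `info` section can supply it. This sketch assumes `info` has a `players` field mapping each team name to a list of player names; `team_sheet` is a hypothetical helper and the sample is made up.

```python
def team_sheet(match):
    """Flatten info["players"] (team name -> list of names) into (team, player) rows."""
    info = match["info"]
    players = info.get("players", {})
    return [(team, name) for team in info["teams"] for name in players.get(team, [])]


# Made-up sample in the assumed shape
sample = {"info": {"teams": ["Team A", "Team B"],
                   "players": {"Team A": ["A1", "A2"], "Team B": ["B1"]}}}
for team, player in team_sheet(sample):
    print(team, player)
```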
The next few are a little more complex because of the way the data is structured. The summary and scorecard pages each contain several tables, so we have to say which one to look at.
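Picking “which one” boils down to collecting every `<table>` on a page and then choosing one by position. The sketch below does that with Python’s built-in `html.parser`; it’s a stand-in to show the idea, not how python-cricket-scraper works internally, and it ignores nested tables and merged cells. `pandas.read_html` behaves the same way in spirit: it returns a list of DataFrames, one per table, so you pick one with an index such as `tables[1]`. The class, function and sample HTML here are hypothetical.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell text.

    Simplified: nested tables and rowspan/colspan aren't handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>, in page order
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Return all tables on the page; choose the one you want by index."""
    collector = TableCollector()
    collector.feed(html)
    collector.close()
    return collector.tables


# Made-up page with two tables: index 0 is batting, index 1 is bowling
html = ("<table><tr><th>Batter</th><th>R</th></tr><tr><td>A1</td><td>10</td></tr></table>"
        "<p>text between tables</p>"
        "<table><tr><th>Bowler</th><th>W</th></tr><tr><td>B1</td><td>2</td></tr></table>")
bowling = extract_tables(html)[1]
print(bowling)
```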
Once you’ve run the code, you’ll see these tabs:
- Match Summary: the top 4 batsmen
- Match Summary (1): the top 4 bowlers
- Match Scorecard (1): all the batsmen for both teams
- Match Scorecard (2): all the bowlers for both teams
- Match Scorecard (3): the overall head-to-head (H2H) summary of the game
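To rebuild a bowling card like the Match Scorecard (2) tab from Cricsheet’s ball-by-ball data as a cross-check, you can tally each delivery against its bowler. A sketch assuming Cricsheet delivery fields (`bowler`, `runs.batter`, an `extras` dict with keys such as `wides` and `noballs`, and a `wickets` list whose entries have a `kind`) and six-ball overs; `bowling_figures` is a hypothetical name.

```python
# Dismissals that scorecards don't credit to the bowler
NOT_BOWLERS_WICKET = {"run out", "retired hurt", "retired out", "obstructing the field"}


def bowling_figures(match):
    """Overs, runs conceded and wickets per bowler from Cricsheet-style deliveries."""
    tally = {}
    for innings in match["innings"]:
        for over in innings.get("overs", []):
            for delivery in over["deliveries"]:
                row = tally.setdefault(delivery["bowler"], {"balls": 0, "runs": 0, "wickets": 0})
                extras = delivery.get("extras", {})
                # wides and no-balls don't count towards the six legal balls
                if "wides" not in extras and "noballs" not in extras:
                    row["balls"] += 1
                # bowlers concede runs off the bat plus wides and no-balls, not byes or leg byes
                row["runs"] += (delivery["runs"]["batter"]
                                + extras.get("wides", 0) + extras.get("noballs", 0))
                for wicket in delivery.get("wickets", []):
                    if wicket["kind"] not in NOT_BOWLERS_WICKET:
                        row["wickets"] += 1
    return {
        bowler: {"overs": f"{r['balls'] // 6}.{r['balls'] % 6}", "runs": r["runs"], "wickets": r["wickets"]}
        for bowler, r in tally.items()
    }
```

Pass it a match dict loaded with `json.load`; each bowler’s `overs` comes back in the usual `overs.balls` notation.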
You can find the dataset here, along with the code.
Remember, we can check that our code is correct by comparing its output with external third-party sites.
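One way to make that comparison concrete is to total each innings from the ball-by-ball file and check it against figures you copy from a third-party scorecard. A sketch assuming the same Cricsheet delivery layout (`runs.total` per ball, a `wickets` list per delivery); the `super_over` flag used to skip super overs is my assumption rather than something I’m certain of, and both function names are hypothetical.

```python
def innings_totals(match):
    """(team, runs, wickets) for each innings, ready to compare with a scorecard."""
    totals = []
    for innings in match["innings"]:
        if innings.get("super_over"):  # assumed flag: super overs aren't in the main totals
            continue
        runs = wickets = 0
        for over in innings.get("overs", []):
            for delivery in over["deliveries"]:
                runs += delivery["runs"]["total"]
                # scorecards don't count a batter retiring hurt as a wicket
                wickets += sum(1 for w in delivery.get("wickets", []) if w["kind"] != "retired hurt")
        totals.append((innings["team"], runs, wickets))
    return totals


def mismatches(totals, expected):
    """Innings whose computed (runs, wickets) differ from figures typed in by hand.

    `expected` maps team name -> (runs, wickets) copied from an external scorecard.
    """
    return [(team, (runs, wickets), expected.get(team))
            for team, runs, wickets in totals
            if expected.get(team) != (runs, wickets)]
```

An empty list from `mismatches` means the extracted data agrees with the external site.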
So there we have it: a basic introduction to extracting cricket data from a Python package with little to no prep.
Here is a small table I’ve created just to summarise the batting performance of the match. It’s downloadable here.
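A table like that can also be rebuilt from the ball-by-ball file. A sketch assuming Cricsheet delivery fields (`batter`, `runs.batter`, an `extras` dict); it treats every 4 or 6 off the bat as a boundary (so an all-run four would be miscounted), and the helper name `batting_summary` is purely illustrative.

```python
def batting_summary(match):
    """Runs, balls faced, fours, sixes and strike rate for every batter."""
    rows = {}
    for innings in match["innings"]:
        for over in innings.get("overs", []):
            for delivery in over["deliveries"]:
                row = rows.setdefault(delivery["batter"], {
                    "team": innings["team"], "runs": 0, "balls": 0, "fours": 0, "sixes": 0,
                })
                runs = delivery["runs"]["batter"]
                row["runs"] += runs
                # a wide isn't a ball faced by the batter
                if "wides" not in delivery.get("extras", {}):
                    row["balls"] += 1
                # approximation: any 4 or 6 off the bat is treated as a boundary
                if runs == 4:
                    row["fours"] += 1
                elif runs == 6:
                    row["sixes"] += 1
    for row in rows.values():
        row["strike_rate"] = round(100 * row["runs"] / row["balls"], 2) if row["balls"] else 0.0
    return rows
```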
- Try accessing the results of an entire tournament.
- Try accessing some other parts of the package, e.g. the match’s best performers, partnerships, wicket details and venue.
- Try preparing the data for Tableau after you extract it.
- Try building a cricket scorecard in Tableau.
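As a starting point for the first and third suggestions, here’s a sketch that loops over every match file in a folder, records the outcome, and writes a flat CSV that Tableau can open. It assumes Cricsheet’s `info.outcome` uses `winner` for a normal result, `eliminator` for a tie settled by a super over, and `result` (e.g. `"no result"`) otherwise; the function names are hypothetical.

```python
import csv
import json
from collections import Counter
from pathlib import Path


def tournament_results(folder):
    """One flat row per match file in `folder`: id, date, teams and outcome."""
    rows = []
    for path in sorted(Path(folder).glob("*.json")):
        info = json.loads(path.read_text(encoding="utf-8"))["info"]
        outcome = info.get("outcome", {})
        # Assumed keys: "winner" for a normal result, "eliminator" for a tie
        # decided by a super over, "result" (e.g. "no result") otherwise
        winner = outcome.get("winner") or outcome.get("eliminator") or ""
        rows.append({
            "match_id": path.stem,
            "date": info["dates"][0],
            "team_1": info["teams"][0],
            "team_2": info["teams"][1],
            "winner": winner,
            "result": outcome.get("result", "win" if winner else "unknown"),
        })
    return rows


def wins_by_team(rows):
    """Count wins per team, ignoring matches without a winner."""
    return Counter(row["winner"] for row in rows if row["winner"])


def write_csv(rows, path):
    """Write the flat rows to CSV, one row per match, ready for Tableau."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

One row per match keeps the file flat, which is the shape Tableau handles most easily.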