Hi all,

I’ve been on a bit of a soccer streak blog-wise recently, and I’m hoping to continue that with this one.

Today’s blog looks at creating ‘Shot Zones’ with an Alteryx workflow and visualising the output in Tableau.

You can see what we will end up with below.

Where did this idea originate?

Well, I came across this paper looking at flanks and out-of-the-box shots and started digging a bit further. This led me to this shot matrix created using Opta data.

This concept isn’t that new; in fact, I came across this beautiful visual by sonofacorner below.

Which brings us on to how to create it.

Realistically to get to our finished point we need to follow the steps below:

  • Figure out the X & Y locations of the pitch and the shots data
  • Draw a path that creates polygons for the different shot boxes
  • Find out what zone each shot is in
  • Visualise that data on a pitch

Seems simple enough?

First thing we do is figure out the shape of our zones. I got the dimensions of a pitch from the mplsoccer documentation.

Then I decide how the pitch will be split into zones. You will see that there are 6 of them.

The beauty of the workflow is that you only need to amend the zone coordinates in the input data at the start to change how the output looks. In this case I’ve broken the pitch up as follows:

  • Zone 1 is directly in front of the goal
  • Zone 2 is inside the 6 yard box but not in line with the goal
  • Zone 3 backfills from there to the top of the box
  • Zone 4 fills the remainder of the box on either side
  • Zone 5 covers the flanks
  • Zone 6 is anything behind the box

To take this further, you could split a 7th zone out of Zone 6 to emphasise shots taken just outside the box. Given the above diagram, this can easily be done!
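If you want to sketch the zone input outside of Alteryx first, here is a rough Python version. The box measurements are the standard ones (penalty area 16.5 m deep and 40.32 m wide, six yard box 5.5 m deep and 18.32 m wide, goal 7.32 m wide). The 105 x 68 m pitch, the exact zone boundaries, the "Part" labels and stopping Zone 6 at the halfway line are all assumptions for illustration, not the values from the workflow's input file.

```python
# Illustrative zone rectangles for the attacking end of a pitch, in metres.
# Pitch size and zone boundaries are assumptions, not the workflow's own values.
PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0
MID_Y = PITCH_WIDTH / 2
GOAL_HALF = 7.32 / 2          # half the goal width
SIX_HALF = 18.32 / 2          # half the six yard box width
PEN_HALF = 40.32 / 2          # half the penalty area width
SIX_X = PITCH_LENGTH - 5.5    # front edge of the six yard box
PEN_X = PITCH_LENGTH - 16.5   # front edge of the penalty area
HALFWAY_X = PITCH_LENGTH / 2


def rect(x0, y0, x1, y1):
    """Return the four corners of a rectangle in drawing order."""
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


# (zone number, part label, corner points); two-sided zones get two parts.
ZONES = [
    (1, "centre", rect(SIX_X, MID_Y - GOAL_HALF, PITCH_LENGTH, MID_Y + GOAL_HALF)),
    (2, "left", rect(SIX_X, MID_Y - SIX_HALF, PITCH_LENGTH, MID_Y - GOAL_HALF)),
    (2, "right", rect(SIX_X, MID_Y + GOAL_HALF, PITCH_LENGTH, MID_Y + SIX_HALF)),
    (3, "centre", rect(PEN_X, MID_Y - SIX_HALF, SIX_X, MID_Y + SIX_HALF)),
    (4, "left", rect(PEN_X, MID_Y - PEN_HALF, PITCH_LENGTH, MID_Y - SIX_HALF)),
    (4, "right", rect(PEN_X, MID_Y + SIX_HALF, PITCH_LENGTH, MID_Y + PEN_HALF)),
    (5, "left", rect(PEN_X, 0.0, PITCH_LENGTH, MID_Y - PEN_HALF)),
    (5, "right", rect(PEN_X, MID_Y + PEN_HALF, PITCH_LENGTH, PITCH_WIDTH)),
    (6, "centre", rect(HALFWAY_X, 0.0, PEN_X, PITCH_WIDTH)),
]


def zone_rows(zones):
    """Flatten the zones into one row per corner, numbered by SN."""
    rows = []
    for zone, part, corners in zones:
        for sn, (x, y) in enumerate(corners, start=1):
            rows.append({"Zone": zone, "Part": part, "SN": sn, "X": x, "Y": y})
    return rows


rows = zone_rows(ZONES)
print(len(rows), "rows, first:", rows[0])
```

Adding a 7th zone then only means adding another entry to the list and trimming Zone 6 to match.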

Let’s look at the Alteryx Workflow.

We connect to our EPL shot location data and choose a specific match; in this case I’ve chosen the match between Manchester United and Chelsea.

The top part of the flow splits the home and away shots in two so they mirror the layout above. The second Formula tool just scales our values up to fit the pitch. The really important part here is the Create Points spatial tool, where we can start to see our points mapped out.
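To make the split-and-scale step a bit more concrete, here is a minimal Python sketch. It assumes the shot X and Y arrive as fractions between 0 and 1 with a home/away flag (the column names "h_a", "X" and "Y" and the 105 x 68 pitch size are assumptions here), and that the away shots are flipped to the opposite end. The two sample rows are made up, not taken from the match.

```python
PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0  # assumed pitch size in metres


def place_shot(x, y, h_a):
    """Scale a 0-1 shot location to pitch units, flipping away shots to the other end."""
    if h_a == "a":
        x, y = 1.0 - x, 1.0 - y
    return x * PITCH_LENGTH, y * PITCH_WIDTH


# Made-up sample rows for illustration only.
shots = [
    {"team": "Home", "h_a": "h", "X": 0.9, "Y": 0.5},
    {"team": "Away", "h_a": "a", "X": 0.9, "Y": 0.5},
]

for shot in shots:
    shot["pitch_x"], shot["pitch_y"] = place_shot(shot["X"], shot["Y"], shot["h_a"])
    print(shot["team"], round(shot["pitch_x"], 2), round(shot["pitch_y"], 2))
```

Both sample shots are taken from the same spot relative to the goal being attacked, but after the flip they land at opposite ends of the pitch.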

The bottom part of the flow

So the input data here is all the pitch points (where Include in Zone = N) as well as our mapped-out zones (where Include in Zone = Y).

We then split these apart (as we will later want to say which of the zones the shot is taken in).

The Formula tool scales our pitch by the same amount as our shot data.

The two branches mirror each other: the top section builds the ‘zones’ and the bottom section draws our pitch outlines.

The create points and build spatial tools help us build our pitch and zones.
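As a rough picture of what this branch is doing, the Python sketch below splits the input rows on the Include in Zone flag and joins each shape's points up in sequence order. The "Name" and "SN" column names and the sample rows are assumptions for illustration; this is a stand-in for the Alteryx tools, not a copy of them.

```python
from collections import defaultdict


def build_shapes(rows):
    """Split rows into zone and pitch shapes, ordering each shape's points by SN."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["Include in Zone"], row["Name"])].append(row)

    zones, pitch = {}, {}
    for (flag, name), members in groups.items():
        members.sort(key=lambda r: r["SN"])
        path = [(r["X"], r["Y"]) for r in members]
        if flag == "Y":
            zones[name] = path
        else:
            pitch[name] = path
    return zones, pitch


# Made-up rows, deliberately out of order to show the SN sort.
rows = [
    {"Include in Zone": "Y", "Name": "Zone 1", "SN": 2, "X": 105.0, "Y": 30.34},
    {"Include in Zone": "Y", "Name": "Zone 1", "SN": 1, "X": 99.5, "Y": 30.34},
    {"Include in Zone": "Y", "Name": "Zone 1", "SN": 3, "X": 105.0, "Y": 37.66},
    {"Include in Zone": "Y", "Name": "Zone 1", "SN": 4, "X": 99.5, "Y": 37.66},
    {"Include in Zone": "N", "Name": "Halfway line", "SN": 1, "X": 52.5, "Y": 0.0},
    {"Include in Zone": "N", "Name": "Halfway line", "SN": 2, "X": 52.5, "Y": 68.0},
]

zones, pitch = build_shapes(rows)
print(zones)
print(pitch)
```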

Here is what our zones output looks like:

and here is our basic pitch drawing:

Both are visualised using the Report Map tool.

The next step is to find which zone each shot intersects, so we can use the Spatial Match tool.

Side note: this may be my new favourite Alteryx tool!!
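If you are curious what a match like this boils down to, here is a minimal Python sketch of a point-in-polygon test using ray casting. This is a common general technique swapped in for illustration, not a description of how Alteryx does it internally, and the two zone rectangles are assumed values.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: flip 'inside' each time a ray going right from (x, y) crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_zone(x, y, zones):
    """Return the first zone whose polygon contains the point, or None."""
    for zone_id, polygon in zones.items():
        if point_in_polygon(x, y, polygon):
            return zone_id
    return None


# Assumed example polygons (metres on a 105 x 68 pitch).
zones = {
    1: [(99.5, 30.34), (105.0, 30.34), (105.0, 37.66), (99.5, 37.66)],
    6: [(52.5, 0.0), (88.5, 0.0), (88.5, 68.0), (52.5, 68.0)],
}

print(find_zone(100.0, 34.0, zones))
print(find_zone(70.0, 20.0, zones))
print(find_zone(50.0, 34.0, zones))
```

Points that sit exactly on a zone boundary can fall either way with this simple test, so it is worth deciding up front how shared edges should be handled.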

The final bit of the flow summarises which zone our points are in, at an aggregated level.

I group this by team, so we can split our zones out for Manchester United and Chelsea.

To make life easier in Tableau later, I do a bit of a funky full outer join so that we have row-level shot data alongside our aggregated pitch polygon data.

You can see how I’ve offset the stacking of the data. Do note that I’ve also brought in all the pitch details, regardless of whether a shot was taken in the specific zone or not.
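The grouping itself is just a count per team and zone. A minimal Python sketch, assuming each shot row already carries a "team" and a matched "zone" (both column names and the sample rows are made up for illustration):

```python
from collections import Counter


def count_shots_by_zone(shots):
    """Count shots per (team, zone); shots with no matched zone are skipped."""
    return Counter(
        (shot["team"], shot["zone"]) for shot in shots if shot["zone"] is not None
    )


# Made-up rows, not the real match data.
shots = [
    {"team": "Home", "zone": 3},
    {"team": "Home", "zone": 3},
    {"team": "Home", "zone": 6},
    {"team": "Away", "zone": 1},
    {"team": "Away", "zone": None},
]

counts = count_shots_by_zone(shots)
for (team, zone), n in sorted(counts.items()):
    print(team, zone, n)
```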
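One way to picture the stacked output is as shot rows and zone polygon rows sitting in the same table, with a label saying which is which. The Python sketch below is only an approximation of that idea under assumed column names ("Layer", "Shots in Zone" and so on are hypothetical), using made-up rows; it keeps every zone for every team and fills in 0 where no shot was taken.

```python
def stack_for_tableau(shots, zone_points, zone_counts, teams):
    """Stack row-level shots on top of zone polygon points carrying per-team counts."""
    rows = []
    for shot in shots:
        rows.append({"Layer": "Shot", **shot})
    for team in teams:
        for point in zone_points:
            # Zones with no shots are kept, with a count of 0.
            count = zone_counts.get((team, point["Zone"]), 0)
            rows.append({"Layer": "Zone", "team": team, "Shots in Zone": count, **point})
    return rows


# Made-up inputs for illustration only.
shots = [
    {"team": "Home", "X": 100.0, "Y": 34.0},
    {"team": "Away", "X": 5.0, "Y": 30.0},
]
zone_points = [
    {"Zone": zone, "SN": sn, "X": x, "Y": y}
    for zone, corners in {
        1: [(99.5, 30.34), (105.0, 30.34), (105.0, 37.66), (99.5, 37.66)],
        6: [(52.5, 0.0), (88.5, 0.0), (88.5, 68.0), (52.5, 68.0)],
    }.items()
    for sn, (x, y) in enumerate(corners, start=1)
]
zone_counts = {("Home", 1): 1}

stacked = stack_for_tableau(shots, zone_points, zone_counts, ["Home", "Away"])
print(len(stacked), "rows")
```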

We can use MAKEPOINT on the X and Y coordinates of the shots to plot the different teams. You will see they resemble the chosen match on Understat.

We can then use the pitch X & Y coordinates we’ve created to draw the outline of the pitch. Make sure the SN (Sequence Number) goes onto Path, and that you have the correct pills on Detail, as in the screenshot.

The final layer we want to add is our Zones layer. Here we write an IF statement that picks out the rows of the original data that are our zone polygons; we then plot these points in a similar way, again with SN on Path and the required dimensions on Detail.

By adding our aggregated count to colour we can see which zones were most frequently shot from.

And that’s it!

The toughest part of this was really creating the initial dataset for the zones, but once that’s done it is fairly smooth sailing.

GOING FURTHER:

  • Why not change the zones you look at?
  • Try mapping all of one team’s games as small multiples
  • Try adding more layers with details around the score, team and xG
  • Try other dimensions on colour – perhaps xG from each zone may be interesting?

Message me on LinkedIn or Twitter if you have any questions – as always the resources can be found on GitHub!

Take care &

LOGGING OFF,

CJ