Hi all,

Something new this week that I don’t think many of you will have come across too often: a chart type called the Tennis Game Tree.

Game Trees aren’t specific to tennis; in fact, you can read more about them here.

What Is A Tennis Game Tree?

A “Game Tree” is a depiction of point progression for a selection of games within a tennis match or across a series of tennis matches. It is a form of Sankey diagram and possesses the “Markov property”, meaning that the set of possible future “states” is constrained by the current “state”, i.e. the point score at any moment in a game.

What that means in layman’s terms is that each point in tennis moves the game on from one score to one of only two possible next scores. For example, at 0-0 the next score can only be 15-0 or 0-15. Whoever wins that point determines which scores are possible next, until eventually someone wins the game.
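To make that concrete, here is a minimal Python sketch of the rule. It is purely illustrative and not part of the workflow: the function name next_scores and the "GAME P1" / "GAME P2" labels are assumptions made for this example, and tie-breaks are ignored.

```python
# Point labels in the order a player collects them within a game.
LABELS = ["0", "15", "30", "40"]

def next_scores(score):
    """Return the two scores reachable from `score`: (if P1 wins the point, if P2 wins it)."""
    # Deuce and advantage are handled explicitly.
    if score == "40-40":
        return ("AD-40", "40-AD")
    if score == "AD-40":
        return ("GAME P1", "40-40")
    if score == "40-AD":
        return ("40-40", "GAME P2")
    p1, p2 = score.split("-")
    # A player already on 40 wins the game; otherwise they step up one label.
    p1_wins = "GAME P1" if p1 == "40" else f"{LABELS[LABELS.index(p1) + 1]}-{p2}"
    p2_wins = "GAME P2" if p2 == "40" else f"{p1}-{LABELS[LABELS.index(p2) + 1]}"
    return (p1_wins, p2_wins)
```

Calling next_scores("0-0") gives the two scores from the example above, 15-0 and 0-15, and every other score in a game has exactly two ways forward in the same fashion.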

Taking the above example of the Men’s Wimbledon Final, we can highlight the progression of points for each player when they are the server. It becomes increasingly interesting when you look at the spread of thicker lines and dive into the stories behind why a player may have been able to break serve or had a good service match. For example, Carlos lost many of the first points on his serve, yet didn’t lose many service games overall (seen when you reach the bottom of the chart).

So how do we go about creating this chart?

Well, there is a workflow attached in the repository, and the good thing is you should be able to amend the flow for any match that has been charted within Jeff Sackmann’s repository!

Let’s talk through the data.

The original tennis data can be found here.

You can find a copy of the files within the GitHub repository at the top of the page, also found here.

The folder contains copies of the most recent Wimbledon 2023 matches and the points data that we look to transform.

Alteryx Transformations

I’ve added the Alteryx transformations, but in case you want to replicate this in another tool, let’s take a look at the transformations required.

The match data: we connect to this, select only the non-null fields, and amend the match number to be an integer.

Next I create a match field, which is a simple concatenation, and then find the matches for Carlos. This can be amended for whichever player you are looking for. Finally I use the Summarise tool to find the max match ID for Carlos (naturally, the IDs increase throughout the tournament, so the maximum one relates to the final!).

None of this part of the flow is that important, as you could manually look up which match ID you need.
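If you are working outside Alteryx, the same lookup can be sketched in a few lines of Python. Treat this as a rough sketch under assumptions: the column names match_num, player1 and player2 are guesses at the layout of the match file, and latest_match_id is a hypothetical helper name.

```python
def latest_match_id(match_rows, player):
    """Return the highest match number among the matches that feature `player`."""
    ids = []
    for row in match_rows:
        # The "match" field is a simple concatenation of the two player names.
        match_label = f"{row['player1']} vs {row['player2']}"
        if player in match_label:
            # The match number is converted to an integer so that max() compares numerically.
            ids.append(int(row["match_num"]))
    # max() raises ValueError if the player never appears in the file.
    return max(ids)
```

Because the IDs increase through the tournament, the value returned for a finalist is the ID of the final.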

The next part of the flow comes from the points data.

The match number is the last four digits of the matchID column; we need it to join the two files together and reduce the points file to just the final. The Select tool just picks columns that are of use to me, such as point, server, score etc. The Filter removes the first two records from the file, where the match hasn’t started.

By joining these files together I have all the points of the final between Novak & Carlos.
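For anyone replicating this step elsewhere, here is a minimal Python sketch of the same idea. The column names (match_id, PointNumber, PointServer, P1Score, P2Score) and the rule that a server of "0" marks a pre-match record are assumptions for illustration, so check them against your own copy of the points file. The sample rows are invented purely to show the shape of the input.

```python
# Columns worth keeping for the tree (assumed names).
KEEP = ("PointNumber", "PointServer", "P1Score", "P2Score")

def points_for_match(point_rows, match_num):
    """Keep only the started points of one match, with a trimmed set of columns."""
    kept = []
    for row in point_rows:
        # The match number is the last four characters of the match ID.
        if int(row["match_id"][-4:]) != match_num:
            continue
        # Assumed marker: a server of "0" means play has not started yet.
        if row["PointServer"] == "0":
            continue
        kept.append({col: row[col] for col in KEEP})
    return kept

# Invented rows, not real match data.
sample = [
    {"match_id": "2023-wimbledon-1701", "PointNumber": "0X", "PointServer": "0", "P1Score": "0", "P2Score": "0"},
    {"match_id": "2023-wimbledon-1701", "PointNumber": "1", "PointServer": "1", "P1Score": "15", "P2Score": "0"},
    {"match_id": "2023-wimbledon-1701", "PointNumber": "2", "PointServer": "1", "P1Score": "15", "P2Score": "15"},
    {"match_id": "2023-wimbledon-1601", "PointNumber": "1", "PointServer": "2", "P1Score": "0", "P2Score": "15"},
]
final_points = points_for_match(sample, 1701)
```

The filter on the match number plays the role of the join here, since joining to a single match ID is the same as keeping only the rows for that match.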

Here is what that cleaned data looks like so far.

The next part of the flow is by far one of the most important parts. We utilise the Multi-Row Formula tool.

What the Multi-Row Formula allows us to do is find each player’s score after the next point. Notice in the data screenshot below we now have fields called P1 Next Score and P2 Next Score.

The expression for this (shown here for P2; the P1 field is built the same way) is:

[Row+1:P2Score]

Why do we need that?

Well, to create our decision tree / Game Tree, we need to know where each point goes next out of the different possibilities. It is best to start treating our points as start and end nodes.
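Outside Alteryx, the Row+1 lookup is just a "peek at the next row". A minimal Python sketch, assuming each row is a dictionary with P1Score and P2Score keys (add_next_scores is a hypothetical helper name):

```python
def add_next_scores(rows):
    """Attach the following row's scores to each row, mirroring a Row+1 lookup."""
    out = []
    for i, row in enumerate(rows):
        # The last row has no successor, so its "next" values are left empty.
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        out.append({
            **row,
            "P1 Next Score": nxt["P1Score"] if nxt else None,
            "P2 Next Score": nxt["P2Score"] if nxt else None,
        })
    return out
```

Each row then carries both ends of a line in the tree: the current score is the start node and the next score is the end node.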

Now I have that sorted, the final thing to do is to create the ‘Tree Mapping’ data.

I have created every possible outcome from the scores and allocated each one the required X & Y co-ordinates for our tree. Normally I’d be against this manual preparation, but the tree never changes shape, so our X & Y co-ordinates can stay fixed. It is important to note that origin X and Y values of 0 start at the top, but end values of 0 (i.e. when the player has won the game) are seen at the bottom of the tree.
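If you would rather generate the mapping than type it out, something like the Python sketch below could work. To be clear, this is one possible layout and not the co-ordinates used in the attached file: it assumes X is Player 1's lead in points, Y steps down by one for every point played, and a finished game shows up in the data as an end score of 0-0. The names POINTS, node_xy and edge_coords are hypothetical.

```python
# Number of points a player has won for each score label (AD counts as a fourth point).
POINTS = {"0": 0, "15": 1, "30": 2, "40": 3, "AD": 4}

def node_xy(p1, p2):
    """Assumed layout: X = Player 1's lead in points, Y = minus the points played."""
    a, b = POINTS[p1], POINTS[p2]
    return (a - b, -(a + b))

def edge_coords(origin, end):
    """Return (Origin X, Origin Y, End X, End Y) for one move between scores."""
    ox, oy = node_xy(*origin)
    if end == ("0", "0"):
        # Assumed: an end score of 0-0 means the game was won, so the line
        # extends one step down towards whoever was ahead at the origin.
        step = 1 if ox > 0 else -1
        return (ox, oy, ox + step, oy - 1)
    ex, ey = node_xy(*end)
    return (ox, oy, ex, ey)
```

With this layout the first point of a game runs from (0, 0) to either (1, -1) or (-1, -1), and the sign of X tells you which player is ahead.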

We look to join this data in, and then we can do our final summary to be able to prep our data ready for Tableau.

The most important part of this is the summarise tool because we want the thickness of our lines to represent the number of times that movement from one point to another happened.
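The counting itself is simple to reproduce elsewhere. A rough Python sketch, assuming rows with the PointServer, P1Score, P2Score, P1 Next Score and P2 Next Score fields described earlier (count_paths is a hypothetical name, and the exact grouping fields in the workflow may differ):

```python
from collections import Counter

def count_paths(rows):
    """Count how often each move from one score to the next happened, per server."""
    counts = Counter()
    for r in rows:
        key = (
            r["PointServer"],
            r["P1Score"], r["P2Score"],            # start node
            r["P1 Next Score"], r["P2 Next Score"],  # end node
        )
        counts[key] += 1
    return counts
```

The count for each key is what ends up driving the thickness of that line in the chart.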

Here is what that final example data looks like, ready for exporting to CSV. You can find a copy of this final prepped dataset in the GitHub repo.

Tableau Build

Once you’ve run your flow for your desired match, the only thing you will have to do is connect to your data and re-union it with itself, replacing the old data.

However, the steps below outline how to build it from scratch, just in case!

The first thing is to union your data file with itself; this gives us a start point and an end point for each line.

The first two calculations separate our X and Y co-ordinates.

GT. X

IF [Table Name] = 'Game_Tree.csv'
THEN [Origin X]
ELSE [End X]
END

&

GT. Y

IF [Table Name] = 'Game_Tree.csv'
THEN [Origin Y]
ELSE [End Y]
END
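If it helps to see what the union and these two calculations do to the data, here is a small Python sketch of the same logic. It is only an illustration: union_with_self is a hypothetical name, and the "Game_Tree.csv1" label for the second copy is an assumption, so check the Table Name values in your own data source.

```python
def union_with_self(rows, table_name="Game_Tree.csv"):
    """Stack the data on top of itself so every path has a start row and an end row."""
    unioned = []
    # The second copy's name is an assumption; only the first name is tested below.
    for copy_name in (table_name, table_name + "1"):
        is_origin = copy_name == table_name
        for row in rows:
            unioned.append({
                **row,
                "Table Name": copy_name,
                # First copy supplies the origin point, second copy the end point.
                "GT. X": row["Origin X"] if is_origin else row["End X"],
                "GT. Y": row["Origin Y"] if is_origin else row["End Y"],
            })
    return unioned
```

Each original row appears twice after the union, once carrying its origin co-ordinates and once carrying its end co-ordinates, which is what lets a line mark connect the two.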

We want two layers to our chart: one is our lines, the other is our circles. Let’s create them as MAKEPOINT calculations.

MP. Line

MAKEPOINT([GT. Y], [GT. X])

&

MP. Circle

MAKEPOINT([GT. Y], [GT. X])

Double-click MP. Line, and then put Point Server onto Columns as a discrete dimension.

Change the mark to a line.

Add Record ID and Point Server onto Detail as continuous dimensions.

Add Count Distinct Point Number onto Size to showcase the frequency of that tree path.

Next we want to add our MP. Circle onto the pane as a second layer.

At this point we can then turn off our map background. Map -> Background Maps -> None.

For this layer we want to make the mark a circle.

Add Origin X, Origin Y, End X and End Y onto detail. Make them dimensions.

This is the final chart in terms of how it is shaped. The final few touches help cosmetically to elevate the viz.

Create the following Label calculations and bring them onto label.

Label X

IF [Table Name] = 'Game_Tree.csv'
THEN [P1 Score]
END

Label Y

IF [Table Name] = 'Game_Tree.csv'
THEN [P2 Score]
END

Create a colour calculation to split the circles’ colours based on player.

IF [Table Name] = 'Game_Tree.csv'
THEN SIGN([Origin X])
END

Amend as appropriate.

There we have it, our Tennis Game Tree.

GOING FURTHER

• Why not run the workflow against a match of your choosing?
• Why not add some supplementary metrics to this chart and make it into a dashboard?
• Why not recreate the Alteryx flow in a transformation tool of your choosing?

Any questions, just shoot me a message.

LOGGING OFF,

CJ