Hi all,

So, first blog back since Iron Viz. That was a thing. A really fun thing too. I have so much appreciation for the support of the community, friends, the JLLDataFam, the Tableau team and all the event management. Being on stage felt electric, and those moments will last forever. Being in rehearsals for a good chunk of time meant I had the pleasure of getting to know Will, Kimly and our respective sous vizzers on a much deeper level. I can’t wait to see who we will be cheering on next year. The experience of presenting on stage is worth it alone.

This blog is really a violin follow-up to the wonderful work of Liam Holland. If you’d like to make a standard violin chart in Tableau, his blog is where I would start. I wanted to share how to take this chart one step further and show the shift between previous and current metrics on the same chart using a dual axis, along with some thoughts on using Tableau Prep for the VERY FIRST TIME.

I wasn’t sure if I was allowed to share the original dataset, so in the one shown throughout this blog I’ve masked all the country names. It is a subset of the full original data provided to us for the competition. I feel like I’ve amended it enough, as the original dataset didn’t include continent (or quadrant, in my case); that was something I prepped myself. You can find my files in the GitHub Repo at the top of the page.

We’ll break the blog into three sections.

  1. Why Violin charts

  2. How the prep flow worked

  3. Other resources for distribution charts

WHY VIOLIN CHARTS?

Well, just check out my cost benefit analysis above. The violin plot is a beautiful way of displaying the range in a dataset as well as the probability density of a value. With the violin plot, I wanted to focus on two things.

  1. The change in shape – Something that you can’t see with a box plot. The change in shape shows how African countries are accelerating, albeit at different rates. This acceleration phase looks different to the perhaps more natural progression stage of the other quadrants, where countries are reaching a ceiling value of life expectancy.

  2. The upwards shift – Comparing a previous and current value helps show the shift in life expectancy. I wanted to draw attention to where the main bulk of countries now sit (in the 60-70 range) as opposed to the 50-60 range, as well as the minimum life expectancy now sitting at 50 years old.

The box plot is a safe option and also has its own benefits.

  1. You can see two country outliers, something that I don’t necessarily account for in my equations, so you see a small bump in the violin chart. For example, the outlier country in the Americas for the previous value was struck by a natural disaster in that year.

  2. It’s much easier to compare quartiles and the median. What you would notice from the box plot is that the range in life expectancy is still quite large in Africa, even though there are many countries that sit in the same space as the other three quadrants.

  3. Box plots are something analysts are much more familiar with, and the reflection of the violin may exaggerate the true density at each age mark.

I feel like that was a somewhat fair reflection, even though I’m biased.

Anyway, onto the creation.

TABLEAU PREP

Before reading – do revisit Liam Holland’s blog, as I won’t be re-explaining the specific calculations, just the amendments needed to add two layers.

First things first. Prep. I liked using it once I got the hang of it. It’s a fairly simple, easy to interpret prepping tool. I like that I could stick multiple formula changes into one clean step. I also like that it comes with my Tableau license, so I can use it in future prep work. I can’t necessarily comment on it in a business setting, as I had never used it before Iron Viz.

We want to find the maximum year and minimum year for each country, along with the value in each. Our data had a few holes in it, so rather than using fixed years we take the years closest to each end of my date range.
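As a rough sketch of that logic outside Prep (Python here, with made-up rows and hypothetical field names rather than the real flow), picking the earliest and latest available year per country could look something like this:

```python
# Hypothetical rows: (country, year, life_expectancy). All values are made up.
rows = [
    ("Country A", 1961, 48.2),
    ("Country A", 1990, 55.0),
    ("Country A", 2019, 63.4),
    ("Country B", 1960, 66.1),
    ("Country B", 2020, 79.8),
]

def first_and_last(rows):
    """Return {country: ((min_year, value), (max_year, value))}."""
    result = {}
    for country, year, value in rows:
        earliest, latest = result.get(country, (None, None))
        # Keep the row with the smallest year seen so far.
        if earliest is None or year < earliest[0]:
            earliest = (year, value)
        # Keep the row with the largest year seen so far.
        if latest is None or year > latest[0]:
            latest = (year, value)
        result[country] = (earliest, latest)
    return result

previous_current = first_and_last(rows)
```

Each country ends up with a "previous" and a "current" row even when the years don’t line up across countries.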

Next we assign the SampleID from Liam’s blog (i.e. a Row ID). To do so I create a custom calculation, [One], that is just the value 1. Then I add a partition using the following: { PARTITION [One]: { ORDERBY [Country Name] ASC: RANK()}}, which pretty much means give each row a new number.
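For anyone who finds the partition syntax a bit opaque, here is a minimal Python sketch of the same idea, assuming country names are unique (the names below are placeholders): sort by country name and number the rows from 1.

```python
# Placeholder country names, assumed unique.
countries = ["Country C", "Country A", "Country B"]

# Sort alphabetically, then number from 1: the equivalent of ranking
# every row inside one big partition ordered by country name.
sample_ids = {name: i for i, name in enumerate(sorted(countries), start=1)}
```

If two rows shared a country name, a rank would give them the same number, so this only behaves as a row ID when the names are unique.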

Following that we revisit the blog and look to add the scaffolding element. To make our chart nice and curvy we have to add in a bunch of marks in between to help smooth out the lines.

In the same way that you would do this join in Tableau, you can do it in Prep. I ended up joining on 1=1, using a custom calculation of 1 on each side of the join, and repeated this for both our minimum section and maximum section.
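A 1=1 join is just a cross join: every sample row gets paired with every scaffold row. A small Python sketch, assuming scaffold values run from 0 to 99 (as in the calculation later in this post) and using made-up sample values:

```python
from itertools import product

# (sample_id, life_expectancy): made-up values for illustration.
samples = [(1, 52.3), (2, 61.7), (3, 70.1)]

# Assumed scaffold table: one row per value 0..99.
scaffold_values = range(100)

# Joining on 1=1 pairs every sample with every scaffold value.
scaffolded = [
    (sample_id, value, scaffold)
    for (sample_id, value), scaffold in product(samples, scaffold_values)
]
```

Three samples become 300 rows, which is why the row count balloons after this step.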

I throw in a quick aggregate to check that my calculations have worked, looking at the average life expectancy across the 4 quadrants.
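As a sketch of why an aggregate makes a handy check here: a cross join repeats every row the same number of times, so the average per quadrant should come out the same before and after. The quadrant labels and values below are made up, and the real aggregate step may have been set up differently.

```python
from itertools import product
from statistics import mean

# (quadrant, life_expectancy): made-up values for illustration.
samples = [("Africa", 60.0), ("Africa", 66.0), ("Americas", 75.0)]

# Same 1=1 join as before, against an assumed 0..99 scaffold.
scaffolded = [(q, v, s) for (q, v), s in product(samples, range(100))]

def average_by_quadrant(rows):
    """Average the second field of each row, grouped by the first."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[0], []).append(row[1])
    return {quadrant: mean(values) for quadrant, values in grouped.items()}

before = average_by_quadrant(samples)
after = average_by_quadrant(scaffolded)
```

If `before` and `after` disagree, something in the join has dropped or duplicated rows unevenly.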

The next thing to do is create three new calculations from the blog.

Sample Value – This is just our value of life expectancy relabelled.

Evenly distributed scaffold values –

So this is quite the chunky calc at the moment, so I replace as much of it as possible. The {MIN} and {MAX} values I ended up working out and hardcoding.

The scaffold factor is also replaceable. I actually built the chart multiple times over in Tableau before moving my calculations to prep so I knew a scaffold factor of 4 worked well.

Here is an example calculation stripped back.

 

IF [Scaffold Values] = 0 THEN 50.16 - 4
ELSEIF [Scaffold Values] = 99 THEN 83.94 + 4
ELSE
(50.16 - 4) +

(
ABS(
(83.94 + 4) - (50.16 - 4)
)
* ([Scaffold Values]/99)
)
END

Do note these hardcoded values will change depending on whether you are looking at the minimum or maximum dataset.
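To see what the calculation is doing, here is the same arithmetic as a small Python function. The parameter names are my own for illustration; the defaults are the hardcoded numbers from the example above (a minimum of 50.16, a maximum of 83.94, a scaffold factor of 4, and a last scaffold value of 99).

```python
def scaffold_position(scaffold_value, data_min=50.16, data_max=83.94,
                      factor=4, last=99):
    """Spread scaffold values 0..last evenly between the padded min and max."""
    low = data_min - factor    # padded lower end of the axis
    high = data_max + factor   # padded upper end of the axis
    # Linear interpolation: 0 maps to low, last maps to high.
    return low + abs(high - low) * (scaffold_value / last)

positions = [scaffold_position(s) for s in range(100)]
```

The first and last positions land on the padded minimum and maximum, so the two special cases in the IF statement give the same answer as the general formula.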

Kernel –

Again, we can replace the Bandwidth with a hardcoded value, instead of the parameter that is used in Tableau, once we know what looks good. Here is an example calculation:

(1/(155*1.52)*(1/(SQRT(2*PI()))) * EXP(-0.5 * (([Evenly distributed scaffold values] - [Sample Value])^2)/1.52))
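Here is the same expression as a Python sketch, mirroring the calculation exactly as written. I’m assuming 155 is the number of samples and 1.52 is the bandwidth; the function and parameter names are hypothetical.

```python
import math

def kernel(x, sample_value, n_samples=155, bandwidth=1.52):
    """Mirror of the Prep calculation above, term for term."""
    # 1/(155*1.52) * 1/SQRT(2*PI())
    scale = 1 / (n_samples * bandwidth) * (1 / math.sqrt(2 * math.pi))
    # EXP(-0.5 * (x - sample)^2 / 1.52)
    return scale * math.exp(-0.5 * ((x - sample_value) ** 2) / bandwidth)

peak = kernel(60.0, 60.0)   # scaffold point sits exactly on the sample
tail = kernel(60.0, 65.0)   # scaffold point is 5 years away from the sample
```

The value is largest when the scaffold point sits on the sample and falls away with distance. Note that a textbook Gaussian kernel divides the difference by the bandwidth before squaring, so this follows the calculation above rather than the textbook form.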

Amazing. So far, all we have done is replace a few Tableau parameters and move our Tableau calculations into Prep.

So where is the value in this?

Well, this bit. Currently, we have the calculations to create half a violin chart. But now we want to be able to create the reflection. You’ll see in Liam’s blog he creates the reflection using a dual axis. But we want this already done in Prep. So what we can do is create a duplicate dataset and reverse the kernel by multiplying it by -1.

We then union this data back on itself.
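In code terms the duplicate, flip and union steps are very small. A Python sketch with made-up rows of (scaffold value, position, kernel):

```python
# One half of the violin: (scaffold_value, position, kernel). Values are made up.
half = [
    (0, 46.16, 0.001),
    (1, 46.58, 0.004),
    (2, 47.00, 0.009),
]

# Duplicate the data and multiply the kernel by -1.
mirrored = [(scaffold, position, -kernel) for scaffold, position, kernel in half]

# Union the mirrored copy back onto the original.
unioned = half + mirrored
```

Every position now appears twice, once with a positive kernel and once with the same kernel negated, which is what gives the violin its two sides.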

So we have all the points, plus a mirrored copy with negative kernel values. Once we’ve done this we can plot the violin, and the negative half automatically joins up with the positive half at 0!

The final part of this section is to rename our Kernel. We want the kernel to have a different name for our previous and current values.

To finish the data prep all we need to do is bring back together our worksheets for the maximum and minimum values. Voila!

For how to finish building this visual you can now refer back to Liam’s blog. The only things to bear in mind are:

You will want to drag an extra column, ‘Table’, onto detail. This is so Tableau knows whether you are referring to the current (max) or previous (min) data set.

You will need to dual axis the kernels. One you will make a Polygon, the other a Line. Feel free to download my workbook (link at top) to see how this was done within my own viz. Once again, the prep file can be found at the top of the page.

OTHER RESOURCES

It’s only fair to list some other resources where you can learn more about distributions in an unbiased way.

Caitlin Walsh – Data School

James Driver – Data School

Anna Foard – Box and Whisker Plot

Chartio – Violin Plots

GOING FURTHER

  • Try using layers to add extra details such as the median, LQ and UQ.

  • Try applying it to your own data.

  • Try prepping the data in a tool outside of Prep.

LOGGING OFF,

CJ