Correlation matrices offer a good way of visualizing similarities between members in your dataset. Data points used in calculating pair-wise correlations could be of many different kinds. For example you could be creating a correlation matrix between different commodities and have the price for each commodity over a period of time used for calculating the correlations or you could be comparing products based on their varying qualities. In this example I will use the R sample dataset; mtcars to visualize correlations between different makes and models of cars based on their MPG, horsepower, displacement, # cylinders, weight etc. Before we move on here is what the results look like. You can click on the image below to open a live version of it posted on Tableau Public implemented using table calculations.
Easiest way to get this layout with your data is to create a cross join which will result in paired combinations of values of different variables for different cars. But Tableau Zen Master, Jonathan Drummey has a brilliant alternative which relies on domain completion. It makes the calculations slightly more complicated but by not doing a cross-join, you will cut the size of the data significantly which while not noticeable in this particular example, for large datasets will make a difference. He was kind enough to share his solution which you can find in the third sheet of the workbook provided through the link below.
Since we want to treat values for different variables as part of a series based on which we will compare two cars, it helps to use a long table i.e. put all values into one column and have another column to identify what that value is (e.g. MPG vs. horsepower) as opposed to a wide table where each variable is a column of its own. Table calculation’s addressing and partitioning settings are very important to get the right results.
Given it is not hard to do this using Tableau’s own table calculations, you may think it is a bit of an overkill to do this in R. However what I would like to demonstrate here is something that can be applied more widely. Correlation matrix happens to be the example that came in handy. If you look at the sheet that was built using R, you will see a pattern that can be useful in calculations where you need to create a matrix from a tabular data source for your analysis which I also used in my blog post here for the multi-dimensional scaling example. The pattern consists of breaking the long table into parts that you want to have as separate columns (in this case using R’s split function) and then merging them into a matrix (in this case using cbind and do.call). But for correlation matrix, it gets a bit more interesting due to the cross join.
By looking at the colors in the matrix, you can see that Maserati Bora is most similar to Ferrari Dino while Lotus Europa and Cadillac Fleetwood are very different from each other.
You can download the workbook from HERE which contains correlation matrix built both using R integration and purely in Tableau using table calculations.
If you download the workbook, you can use the correlation coefficient as a quick filter to limit what you see to highly or poorly correlated cars which dynamically resizes your matrix. Or you can use the Variable dimension on a filter to change the number of properties used in calculating the correlation. Would our matrix look different if we were just using number of carburetors, weight and mpg instead of all 11 metrics?