Correlation matrices offer a good way of visualizing similarities between members in your dataset. Data points used in calculating pair-wise correlations could be of many different kinds. For example you could be creating a correlation matrix between different commodities and have the price for each commodity over a period of time used for calculating the correlations or you could be comparing products based on their varying qualities. In this example I will use the R sample dataset; mtcars to visualize correlations between different makes and models of cars based on their MPG, horsepower, displacement, # cylinders, weight etc. Before we move on here is what the results look like. You can click on the image below to open a live version of it posted on Tableau Public implemented using table calculations.

Easiest way to get this layout with your data is to create a cross join which will result in paired combinations of values of different variables for different cars. But Tableau Zen Master, Jonathan Drummey has a brilliant alternative which relies on domain completion. It makes the calculations slightly more complicated but by not doing a cross-join, you will cut the size of the data significantly which while not noticeable in this particular example, for large datasets will make a difference. He was kind enough to share his solution which you can find in the third sheet of the workbook provided through the link below.

Since we want to treat values for different variables as part of a series based on which we will compare two cars, it helps to use a long table i.e. put all values into one column and have another column to identify what that value is (e.g. MPG vs. horsepower) as opposed to a wide table where each variable is a column of its own. Table calculation’s addressing and partitioning settings are very important to get the right results.

Given it is not hard to do this using Tableau’s own table calculations, you may think it is a bit of an overkill to do this in R. However what I would like to demonstrate here is something that can be applied more widely. Correlation matrix happens to be the example that came in handy. If you look at the sheet that was built using R, you will see a pattern that can be useful in calculations where you need to create a matrix from a tabular data source for your analysis which I also used in my blog post here for the multi-dimensional scaling example. The pattern consists of breaking the long table into parts that you want to have as separate columns (in this case using R’s split function) and then merging them into a matrix (in this case using cbind and do.call). But for correlation matrix, it gets a bit more interesting due to the cross join.

By looking at the colors in the matrix, you can see that Maserati Bora is most similar to Ferrari Dino while Lotus Europa and Cadillac Fleetwood are very different from each other.

You can download the workbook from HERE which contains correlation matrix built both using R integration and purely in Tableau using table calculations.

If you download the workbook, you can use the correlation coefficient as a quick filter to limit what you see to highly or poorly correlated cars which dynamically resizes your matrix. Or you can use the Variable dimension on a filter to change the number of properties used in calculating the correlation. Would our matrix look different if we were just using number of carburetors, weight and mpg instead of all 11 metrics?

Enjoy!

Pingback: Comparing Each Against Each Other: The No-SQL Cross Product | Drawing with Numbers

Pingback: “Show me the way to the next whiskey bar” (The Doors – Alabama Song) – Interactive Location Recommendation using Tableau | Patient 2 Earn

Is it possible to do something like this with the correlation of the variable? In other words, can correlation matrix created in R by the command cor(mtcars) be replicated in Tableau?

Matt,

With Tableau 9.0’s ability to load .RData files, building the equivalent of a cor() function in Tableau is amazingly simple now. I just posted a solution on Stack Overflow here:

http://stackoverflow.com/questions/29638660/replicate-r-cor-function-in-tableau/29639061#29639061

Cheers!

I should like to apply your method of calculating correlations and visualizing scatter plots to a data table with the following form

Country Name Series Name Year Value

Afghanistan Birth rate, crude (per 1,000 people) 2011 [YR2011] 36.556

Afghanistan Birth rate, crude (per 1,000 people) 2012 [YR2012] 35.254

Afghanistan Death rate, crude (per 1,000 people) 2011 [YR2011] 8.375

Afghanistan Death rate, crude (per 1,000 people) 2012 [YR2012] 8.095

Albania Birth rate, crude (per 1,000 people) 2011 [YR2011] 12.651

Albania Birth rate, crude (per 1,000 people) 2012 [YR2012] 12.756

Albania Death rate, crude (per 1,000 people) 2011 [YR2011] 6.71

Albania Death rate, crude (per 1,000 people) 2012 [YR2012] 6.812

Algeria Birth rate, crude (per 1,000 people) 2011 [YR2011] 24.658

Algeria Birth rate, crude (per 1,000 people) 2012 [YR2012] 24.579

Algeria Death rate, crude (per 1,000 people) 2011 [YR2011] 5.855

Algeria Death rate, crude (per 1,000 people) 2012 [YR2012] 5.895

This table originates in World Development Indicators after downloading to Excel and datashaping.

I have tried to copy your code and changing it according to my lights but RServe refuses to process it. It seems that my lights are to dim. Can you throw some light on my problem?

Do you mind posting the data (if possible even you in progress workbook as twbx) on Tableau forums and mention my name in the forum post so I can locate it, get a copy and take a look?

I shall do that.

Great post! Can you pass along the steps you used to create the matrix using the table calculation? I need to do something similar but am having a hard time re-creating what you have done.

Thanks!

Jonathan Drummey has a nice blog post describing this technique in detail. You can find it here http://drawingwithnumbers.artisart.org/comparing-each-against-each-other-the-no-sql-cross-product/

Sir,

Is it possible to replicate the SLOPE formula used in excel, in Tableau? Also I have my data in vertical column format but the slope would be calculated such that I would take values of column A from the first value till the 5th last value and for coulmn B it would start from the 5th cell (down) uptill the last cell.

Thank you for the post, it is extremely useful. Unfortunately, I have very hard times replicating your analysis. The problem is that sometimes it works, sometimes it doesn’t and I can’t figure out why that is a case. I have posted a question on Tableau Forum a month ago, with two files where the replication didn’t work. Would you please have a look or point me to a step by step instruction? Thanks a million!

Can you point me to the Tableau forum post? I see one thread from back in November that seems relevant but you said it was a month ago. I would suggest checking the addressing settings for ValueAcross and ValueDown as a starting point.

Thank you, Bora. I think we figured out what I was missing. The discussion is here http://community.tableau.com/message/330688#330688 if you want to have a look what was my struggle!

Hi, I would like to make correlation matrix for measures in the data set like with corrlplot function (corrplot library). How could I do it? May you help with this? It is one of the very first things I do when I deal with new data set in R and I would like to keep on doing it with TB 8.2 (and soon 9)

What’s shown here is pretty much the same as what you would get with

corrplot(cor(mtcars), method = “color”)

I am assuming you were expecting a different output? What did you want it to look like?

I would like to get the same result. But when I study the workbook CorrelationMatrixInTableau I have found out that in all data sources the variables are transformed into one column. Unlikely to mtcars in R – variables are in different column. So I would like to get the same result in TB but with variables in separate columns in data source. I hope I have explained it better. Adam

Any reason why you don’t want them in the same column? If the issue that you don’t want to manually transform, in Tableau 9.0 there is unpivot option. You just need to select the columns, right click, unpivot and you get a long table you can use for the correlation matrix. Long table allows you to calculate the correlation matrix, using table calculations as addressing for table calculations do not cut across measures.

The reason I do not want to have them in the same column is that I manipulate the data for regular reporting, charts and so on (also clustering). I want to build standard, transferable solution for any statistical research. So correlation matrix (and histograms) is just auxiliary method to see “what we got here”. I will check this pivot – thank you!

I see. I hope Pivot will help solve the problem. I would create a copy of the data source, pivot the copy and use it for correlation matrix and the not-pivoted version for the rest of the analysis.

Unfortunately the data get into as non-numeric. Then I do some counts, distinct counts, new calculations. Then those new measures are used for clustering. So during import I can not use Pivot/Unpivot option. It is not even as an option … But it works with simple table, that is right.

It would be great if I could use [Measure Values] as argument 1 and 2, something like this: SCRIPT_REAL(“cor(.arg1,.arg2)”,ATTR([Measure Values]),ATTR([Measure Values]))

Hi Bora,

I have huge data sets (for e.g., 25 different components & each has more than 30 years of historical monthly data) Is it possible to build a correlation table instead of the above color matrix. And also need for various standard time period (3 years, 5 years etc.,) or customized period (Starting Jan’ xx to as of end month).

Note: I am not used “R” very much.

Any leads would certainly appreciate.

Regards,

Ram

If you put the value on label instead of color, you will get something like a table instead of a heat map. If I understand the question correctly, I had built something like this on Tableau forums earlier. http://community.tableau.com/thread/149908

Of course. It is simpler to get a plain table than something formatted like this. Did you see my covariance matrix example. That does it in a table format. https://boraberan.wordpress.com/2014/07/12/creating-a-covariance-matrix-in-tableau-using-table-calculations/

Hi Adam, I am doing a project in stocks. Of which I want to show the correlation table instead of heatmap (not like above) only the number for various time periods for various ending time period.

Say.. I have historical data of 100 stocks from their inception (month-end value data), roughly goes until Jan 1930, ending as of Feb 2016.

Question.

1. I want to pick certain stocks in quick filter, only those stocks should be shown in both row and column.

2. I also need the ending time “As of …” filter, based on the correlation table changes. This is for to see the trend

3. I also need the standard time period like 3-years, 5-years, 10-years as of Ending data (from previous filter). Subject to change based on ending date

4. How to calculate the correlation if we have the historical monthly data for the stocks (not familiar with any other tools)

Is there anything that you could help or show me some guidance. I already build a sample one in excel. But trying to replicate in Tableau. Finding very challenge.

Regards,

Kristy

My covariance matrix example is very similar to what you’re trying to do

https://boraberan.wordpress.com/2014/07/12/creating-a-covariance-matrix-in-tableau-using-table-calculations/

It computes covariance between stock and input data is historical stock data. The only difference is that it uses a different formula and computes covariance but everything else in the setup of the workbook is the same.

I saw the covariance workbook and i can get the index/stocks as per my requirements. However the correlation calculation is not giving me the right numbers. Can you show some insight on the correlation formula that you builded. Also if possible, is there a way to filter the time period calculation for correlation. I have the huge history, but i want to filter like last 3 years, 5 years something like that.

Hi Bora, I have a table which has company (i.e., stock price) data goes beyond 20 years. I uploaded the data in Tableau and given the date filters (as i need, like 1 year, 2 years etc). I get the respective data in Table. I followed you procedure to create the correlation in Tableau. (I am not familiar with R). I am not getting the result as i need (previously done in excel with correlation formula).

Table format: Date / Stock / Price (now i created “Stock copy” as in the workbook). So i have four column now… (like [Date / Stock / Stock(Copy) / Price]). When i apply the formula in your workbook, i am not getting the result as i required.

Please help

Pingback: Using your R models for in-database scoring « Bora Beran

Greɑt article, totally wht ӏ wwas looқing for.