R, Visualization

Creating a correlation matrix in Tableau using R or Table Calculations

Correlation matrices offer a good way of visualizing similarities between members in your dataset. Data points used in calculating pair-wise correlations could be of many different kinds. For example you could be creating a correlation matrix between different commodities and have the price for each commodity over a period of time used for calculating the correlations or you could be comparing products based on their varying qualities. In this example I will use the R sample dataset; mtcars to visualize correlations between different makes and models of cars based on their MPG, horsepower, displacement, # cylinders, weight etc. Before we move on here is what the results look like. You can click on the image below to open a live version of it posted on Tableau Public implemented using table calculations.

Correlation matrix in Tableau

Easiest way to get this layout with your data is to create a cross join which will result in paired combinations of values of different variables for different cars. But Tableau Zen Master, Jonathan Drummey has  a brilliant alternative which relies on domain completion. It makes the calculations slightly more complicated but by not doing a cross-join, you will cut the size of the data significantly which while not noticeable in this particular example, for large datasets will make a  difference. He was kind enough to share his solution which you can find in the third sheet of the workbook provided through the link below.

Since we want to treat values for different variables as part of a series based on which we will compare two cars, it helps to use a long table i.e. put all values into one column and have another column to identify what that value is (e.g. MPG vs. horsepower) as opposed to a wide table where each variable is a column of its own. Table calculation’s addressing and partitioning settings are very important to get the right results.

Given it is not hard to do this using Tableau’s own table calculations, you may think it is a bit of an overkill to do this in R. However what I would like to demonstrate here is something that can be applied more widely. Correlation matrix happens to be the example that came in handy. If you look at the sheet that was built using R, you will see a pattern that can be useful in calculations where you need to create a matrix from a tabular data source for your analysis which I also used in my blog post here for the multi-dimensional scaling example.  The pattern consists of breaking the long table into parts that you want to have as separate columns (in this case using R’s split function) and then merging them into a matrix (in this case using cbind and do.call). But for correlation matrix, it gets a bit more interesting due to the cross join.

By looking at the colors in the matrix, you can see that Maserati Bora is most similar to Ferrari Dino while Lotus Europa and Cadillac Fleetwood are very different from each other.

You can download the workbook from HERE which contains correlation matrix built both using R integration and purely in Tableau using table calculations.

If you download the workbook, you can use the correlation coefficient as a quick filter to limit what you see to highly or poorly correlated cars which dynamically resizes your matrix. Or you can use the Variable dimension on a filter to change the number of properties used in calculating the correlation. Would our matrix look different if we were just using number of carburetors, weight and mpg instead of all 11 metrics?

Enjoy!

Standard

34 thoughts on “Creating a correlation matrix in Tableau using R or Table Calculations

  1. Pingback: Comparing Each Against Each Other: The No-SQL Cross Product | Drawing with Numbers

  2. Pingback: “Show me the way to the next whiskey bar” (The Doors – Alabama Song) – Interactive Location Recommendation using Tableau | Patient 2 Earn

  3. Matt says:

    Is it possible to do something like this with the correlation of the variable? In other words, can correlation matrix created in R by the command cor(mtcars) be replicated in Tableau?

  4. I should like to apply your method of calculating correlations and visualizing scatter plots to a data table with the following form

    Country Name Series Name Year Value
    Afghanistan Birth rate, crude (per 1,000 people) 2011 [YR2011] 36.556
    Afghanistan Birth rate, crude (per 1,000 people) 2012 [YR2012] 35.254
    Afghanistan Death rate, crude (per 1,000 people) 2011 [YR2011] 8.375
    Afghanistan Death rate, crude (per 1,000 people) 2012 [YR2012] 8.095
    Albania Birth rate, crude (per 1,000 people) 2011 [YR2011] 12.651
    Albania Birth rate, crude (per 1,000 people) 2012 [YR2012] 12.756
    Albania Death rate, crude (per 1,000 people) 2011 [YR2011] 6.71
    Albania Death rate, crude (per 1,000 people) 2012 [YR2012] 6.812
    Algeria Birth rate, crude (per 1,000 people) 2011 [YR2011] 24.658
    Algeria Birth rate, crude (per 1,000 people) 2012 [YR2012] 24.579
    Algeria Death rate, crude (per 1,000 people) 2011 [YR2011] 5.855
    Algeria Death rate, crude (per 1,000 people) 2012 [YR2012] 5.895

    This table originates in World Development Indicators after downloading to Excel and datashaping.

    I have tried to copy your code and changing it according to my lights but RServe refuses to process it. It seems that my lights are to dim. Can you throw some light on my problem?

  5. Great post! Can you pass along the steps you used to create the matrix using the table calculation? I need to do something similar but am having a hard time re-creating what you have done.
    Thanks!

  6. Tableau enthusiast says:

    Sir,

    Is it possible to replicate the SLOPE formula used in excel, in Tableau? Also I have my data in vertical column format but the slope would be calculated such that I would take values of column A from the first value till the 5th last value and for coulmn B it would start from the 5th cell (down) uptill the last cell.

  7. Artyom says:

    Thank you for the post, it is extremely useful. Unfortunately, I have very hard times replicating your analysis. The problem is that sometimes it works, sometimes it doesn’t and I can’t figure out why that is a case. I have posted a question on Tableau Forum a month ago, with two files where the replication didn’t work. Would you please have a look or point me to a step by step instruction? Thanks a million!

  8. Adam says:

    Hi, I would like to make correlation matrix for measures in the data set like with corrlplot function (corrplot library). How could I do it? May you help with this? It is one of the very first things I do when I deal with new data set in R and I would like to keep on doing it with TB 8.2 (and soon 9)

    • What’s shown here is pretty much the same as what you would get with

      corrplot(cor(mtcars), method = “color”)

      I am assuming you were expecting a different output? What did you want it to look like?

      • Adam says:

        I would like to get the same result. But when I study the workbook CorrelationMatrixInTableau I have found out that in all data sources the variables are transformed into one column. Unlikely to mtcars in R – variables are in different column. So I would like to get the same result in TB but with variables in separate columns in data source. I hope I have explained it better. Adam

      • Any reason why you don’t want them in the same column? If the issue that you don’t want to manually transform, in Tableau 9.0 there is unpivot option. You just need to select the columns, right click, unpivot and you get a long table you can use for the correlation matrix. Long table allows you to calculate the correlation matrix, using table calculations as addressing for table calculations do not cut across measures.

  9. Adam Karolewski says:

    The reason I do not want to have them in the same column is that I manipulate the data for regular reporting, charts and so on (also clustering). I want to build standard, transferable solution for any statistical research. So correlation matrix (and histograms) is just auxiliary method to see “what we got here”. I will check this pivot – thank you!

    • I see. I hope Pivot will help solve the problem. I would create a copy of the data source, pivot the copy and use it for correlation matrix and the not-pivoted version for the rest of the analysis.

  10. Adam Karolewski says:

    Unfortunately the data get into as non-numeric. Then I do some counts, distinct counts, new calculations. Then those new measures are used for clustering. So during import I can not use Pivot/Unpivot option. It is not even as an option … But it works with simple table, that is right.

  11. Adam Karolewski says:

    It would be great if I could use [Measure Values] as argument 1 and 2, something like this: SCRIPT_REAL(“cor(.arg1,.arg2)”,ATTR([Measure Values]),ATTR([Measure Values]))

  12. Ramakrishnan says:

    Hi Bora,

    I have huge data sets (for e.g., 25 different components & each has more than 30 years of historical monthly data) Is it possible to build a correlation table instead of the above color matrix. And also need for various standard time period (3 years, 5 years etc.,) or customized period (Starting Jan’ xx to as of end month).

    Note: I am not used “R” very much.

    Any leads would certainly appreciate.

    Regards,
    Ram

  13. Kristy says:

    Hi Adam, I am doing a project in stocks. Of which I want to show the correlation table instead of heatmap (not like above) only the number for various time periods for various ending time period.

    Say.. I have historical data of 100 stocks from their inception (month-end value data), roughly goes until Jan 1930, ending as of Feb 2016.

    Question.

    1. I want to pick certain stocks in quick filter, only those stocks should be shown in both row and column.
    2. I also need the ending time “As of …” filter, based on the correlation table changes. This is for to see the trend
    3. I also need the standard time period like 3-years, 5-years, 10-years as of Ending data (from previous filter). Subject to change based on ending date
    4. How to calculate the correlation if we have the historical monthly data for the stocks (not familiar with any other tools)

    Is there anything that you could help or show me some guidance. I already build a sample one in excel. But trying to replicate in Tableau. Finding very challenge.

    Regards,
    Kristy

      • Kristy says:

        I saw the covariance workbook and i can get the index/stocks as per my requirements. However the correlation calculation is not giving me the right numbers. Can you show some insight on the correlation formula that you builded. Also if possible, is there a way to filter the time period calculation for correlation. I have the huge history, but i want to filter like last 3 years, 5 years something like that.

  14. Ramky says:

    Hi Bora, I have a table which has company (i.e., stock price) data goes beyond 20 years. I uploaded the data in Tableau and given the date filters (as i need, like 1 year, 2 years etc). I get the respective data in Table. I followed you procedure to create the correlation in Tableau. (I am not familiar with R). I am not getting the result as i need (previously done in excel with correlation formula).

    Table format: Date / Stock / Price (now i created “Stock copy” as in the workbook). So i have four column now… (like [Date / Stock / Stock(Copy) / Price]). When i apply the formula in your workbook, i am not getting the result as i required.

    Please help

  15. Pingback: Using your R models for in-database scoring « Bora Beran

  16. Pingback: Basic Statistics in Tableau : Correlation | Smoak Signals | Business, Data, Etc.

  17. Patty C says:

    Hi! I would love to download the workbook to see how the Tableau table calculations were calculated. Unfortunately, the link doesn’t work anymore. Any chance you can update the link. I would be very grateful as I can’t seem to find another source to show how to make a correlation matrix with more than two values in Tableau.
    Thanks!

Leave a reply to Bora Beran Cancel reply