Quick Tip : Overlaying curves on Tableau scatter plots with R

Tableau provides a good set of trend line, reference line and band options but sometimes you want to overlay curves based on a custom equation. Logistic regression curves, sine curves, quantile regression curves…. And want these overlay curves to be smooth…

This is very easy to do by taking advantage of the technique I shared when building Coxcomb charts and radial treemaps. If you know the equation (or know how to get to it) and it can be described in Tableau’s calculation language you can do so using a table calculation. But doing the fit dynamically involves R and when you’re passing the data to R you need to do some basic NULL handling. Here are two examples showing what the results might look like. You can see that despite very few points in my dataset and large gaps between them, the curves look very smooth.

A sine curve and logistic regression curve overlay in Tableau

The key component is the bin field created on the variable that’s on the X axis. In Tableau bins can be used to trigger densification by turning on “Show Missing Values” option on the bin. Doing so adds NULL rows into the data backing up the visualization, which you can fill with values from table calculations (including SCRIPT_ functions). In your R script, you need to remove these artificially generated NULLs, in order not to confuse the curve fitting procedure you’re applying. 

I tied the bin size to a parameter so you can try different values to make the curves coarser or smoother.

If you want to take this technique it a bit further you could use one bin for each axis which will allow you to create a grid. Then you an treat each cell like a pixel in a raster and draw shaded areas such as contours.

Below you can see two examples of this. First one estimates the bivariate kernel density of the data points in the viz on-the-fly using R on the fly and draws the contours using the secondary axes of this dual (technically quadruple) axis chart.

Displaying kernel density as overlay in Tableau

The second chart uses the same data to fit a one-class SVM (support vector machine) which is often used for outlier/novelty detection with a Gaussian radial basis function then draws the decision boundaries using the secondary axes (blue ellipse). Suspected outliers are shown in red while inliers are shown in white.

Displaying the boundaries of a one-class SVM as overlay in Tableau

You can download the sample Tableau workbook from HERE

2 comments on “Quick Tip : Overlaying curves on Tableau scatter plots with R

  1. sara says:

    Hi Bora, I’d love to see how you got the ellipse on your scatterplot but Tableau says the workbook you’ve linked to is not currently available on Tableau Public.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s