Time Series Clustering in Tableau using R

Clustering is a very common data mining task with a wide variety of applications, from customer segmentation to grouping text documents. K-means clustering was one of the examples I used in my blog post introducing R integration back in Tableau 8.1. Many others in the Tableau community have written similar articles explaining how different clustering techniques can be used in Tableau via R integration.

One thing I didn’t see getting much attention was time series clustering, or the use of hierarchical clustering algorithms, so I thought it might be good to cover both in a single post.

Let’s say you have multiple time series curves (stock prices, social media activity, temperature readings…) and want to group similar curves together. Series don’t necessarily align perfectly; in fact, they could even describe events that unfold at different paces.

[Image: Time series clustering in Tableau using R]

There are, of course, many algorithms for this task, but R conveniently offers a package for Dynamic Time Warping (DTW). Below is what my calculated field in Tableau looks like.

[Image: Calculation for dynamic time warping in Tableau]
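The R calculation itself is only shown as a screenshot, so here is a minimal Python sketch of what dynamic time warping computes. It is not the dtw package's implementation. It fills a cumulative-cost table where each cell extends the cheapest of three neighboring alignments, which lets one series "wait" while the other catches up. The helper names are hypothetical, the local cost is the absolute difference, and there is no warping window; the package's defaults (step pattern, normalization) may give different numbers.

```python
import math


def dtw_distance(a, b):
    """Basic dynamic time warping distance between two 1-D sequences.

    Local cost is the absolute difference; allowed steps are match,
    insertion and deletion, each with weight 1; no warping window.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances while b repeats a point
                cost[i][j - 1],      # b advances while a repeats a point
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def euclidean_distance(a, b):
    """Point-by-point distance; assumes equal-length, aligned series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# The same spike, but series2 reaches it one step later.
series1 = [0, 0, 1, 5, 1, 0, 0, 0]
series2 = [0, 0, 0, 1, 5, 1, 0, 0]

print(dtw_distance(series1, series2))                 # 0.0
print(round(euclidean_distance(series1, series2), 3))  # 5.831
```

The two series contain the same shape shifted by one step: the point-by-point distance penalizes the shift, while DTW aligns the spikes and reports zero.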

I started by loading the dtw package, then converted my data from long format (all time series in the same column) to wide format (each series in a separate column).
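A minimal Python sketch of the long-to-wide step, assuming the rows arrive sorted by series and then by time, and that every series has the same number of points N. Under those assumptions, slicing the single column every N rows yields one list per series. In R, matrix(values, nrow = N) gives a similar result, since it fills one column at a time. The helper name long_to_wide is hypothetical.

```python
def long_to_wide(values, size_per_series):
    """Split one long column of stacked series into one list per series.

    Assumes rows are ordered by series, then by time, and that every
    series has exactly `size_per_series` points.
    """
    if size_per_series <= 0 or len(values) % size_per_series != 0:
        raise ValueError("column length must be a multiple of size_per_series")
    # Take consecutive slices of length size_per_series.
    return [values[start:start + size_per_series]
            for start in range(0, len(values), size_per_series)]


# Three curves of four points each, stacked in a single column.
long_column = [1, 2, 3, 4, 10, 20, 30, 40, 2, 3, 4, 5]
wide = long_to_wide(long_column, 4)
print(wide)  # [[1, 2, 3, 4], [10, 20, 30, 40], [2, 3, 4, 5]]
```

If the rows are not stacked series by series, the slices mix points from different curves, so the table calculation's addressing and sort order matter as much as the code.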

I then computed a distance matrix using DTW as the method and applied hierarchical clustering with average linkage. This produced a tree (a dendrogram), which I then cut to get the desired number of clusters.
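A minimal sketch of the clustering step in plain Python, assuming the pairwise DTW distances have already been collected into a symmetric matrix. It starts with every series in its own cluster and repeatedly merges the pair with the smallest average cross-cluster distance (average linkage), stopping once k clusters remain. Because average-linkage merge heights never decrease, stopping at k clusters gives the same grouping as building the full tree and cutting it into k groups, apart from ties. In R, this corresponds to hclust(d, method = "average") followed by cutree(tree, k = ...). The function name and the toy distance matrix are illustrative assumptions.

```python
def average_linkage_clusters(dist, k):
    """Agglomerative clustering with average linkage, stopped at k clusters.

    `dist` is a symmetric n x n matrix of pairwise distances (for example,
    DTW distances between series). Returns one label per series, numbered
    1..k in order of first appearance.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of series")
    clusters = [[i] for i in range(n)]  # every series starts on its own

    def avg_dist(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    # Number clusters by their lowest series index.
    labels = [0] * n
    for label, members in enumerate(sorted(clusters, key=min), start=1):
        for i in members:
            labels[i] = label
    return labels


# Toy distances: series 0 and 1 are close, as are series 2 and 3.
distances = [
    [0, 1, 8, 9],
    [1, 0, 7, 8],
    [8, 7, 0, 2],
    [9, 8, 2, 0],
]
print(average_linkage_clusters(distances, 2))  # [1, 1, 2, 2]
print(average_linkage_clusters(distances, 3))  # [1, 1, 2, 3]
```

The naive pair search is fine for a handful of curves; with many series, a library implementation such as R's hclust is the better choice.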

Finally, the data is converted back into long format before being passed back to Tableau.
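Converting back mirrors the reshape: each series' single cluster label is repeated once per data point, in the same row order the data arrived in, so that every row Tableau sent gets a value back. In R, rep(labels, each = N) does this. A minimal Python sketch follows, assuming every series has the same length N; the helper name labels_to_long is hypothetical.

```python
def labels_to_long(labels, size_per_series):
    """Expand one cluster label per series into one label per data point.

    Assumes the same row order as the long input column: all points of
    the first series, then all points of the second, and so on.
    """
    return [label for label in labels for _ in range(size_per_series)]


series_labels = [1, 1, 2]  # one label per curve from the clustering step
print(labels_to_long(series_labels, 4))
# [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2]
```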

The screenshot is from Tableau 10, but don’t worry if you’re not on the Beta program. You can download the Tableau 8.1 version of the workbook from HERE. Enjoy!

7 comments on “Time Series Clustering in Tableau using R”

  1. Hi Bora,

    What’s the “size per series” in the last line?


  2. Hi Bora,

    What’s the “size per series” in the last line?


    • Bora Beran says:

      It is a calculated field defined in the workbook that returns the length (number of data points) of each line. The table is in long format: all the data is in a single column rather than spread across separate columns. Knowing the size, it is possible to slice the data every Nth row to break it into multiple columns, the matrix format R needs for this type of analysis.

  3. Chris Leisner says:

    I’m having trouble running some script in Tableau to make predictions using a logistic model in R. Can somebody help me out?

  4. Yun Lin says:

    Hi Bora,
    This is very helpful. Appreciate it a lot. But could you give a little bit more instruction on how to create ‘curve’ and ‘clusters’? Thanks a lot.

    • Bora Beran says:

      Curve is the ID for each series. Assume you have sensors, each identified by an ID, so you know which time series came from which sensor. That is what Curve does. Clusters is the calculated field shown in the screenshot.
