Clustering is a very common data mining task with a wide variety of applications, from customer segmentation to grouping text documents. K-means clustering was one of the examples I used in my blog post introducing R integration back in Tableau 8.1. Many others in the Tableau community have written similar articles explaining how different clustering techniques can be used in Tableau via R integration.

One thing I didn’t see getting much attention was time series clustering and the use of hierarchical clustering algorithms, so I thought it might be good to cover both in a single post.

Let’s say you have multiple time series curves (stock prices, social media activity, temperature readings…) and want to group similar curves together. The series don’t necessarily align perfectly; in fact, they could even describe events that unfold at different paces.

There are of course many algorithms for this task, but R conveniently offers a package for Dynamic Time Warping. Below is what my calculated field in Tableau looks like.
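For readers new to dynamic time warping, the idea can be sketched independently of Tableau and R. The following is a minimal Python illustration, not the calculated field itself and not the `dtw` package's implementation. The function name `dtw_distance` and the sample series are assumptions made for this example, and the simple recurrence used here may differ from the step pattern the R package applies by default.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance using absolute difference as local cost.

    Illustrative sketch only: a basic recurrence with equal weights for
    horizontal, vertical and diagonal steps.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays on the same point
                cost[i][j - 1],      # b advances, a stays on the same point
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Two hypothetical curves with the same shape unfolding at different paces
slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
fast = [0, 1, 2, 1, 0]
print(dtw_distance(slow, fast))  # prints 0.0
```

Because the warping path may repeat points, the slow curve and the fast curve end up at distance zero even though they have different lengths, which is exactly the behavior wanted when events happen at different paces.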

I started by loading the dtw package, then converted my data from long table format (all time series are in the same column) to wide table format (each series is a separate column).
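The long-to-wide step can be illustrated outside R as well. This is a minimal Python sketch under two assumptions: the rows are sorted so that each series occupies a contiguous run, and every series has the same number of points. The helper name `long_to_wide` is hypothetical, and it returns one list per series rather than an R matrix with one column per series.

```python
def long_to_wide(values, size_per_series):
    """Split one long column into equally sized series.

    Assumes rows are ordered series by series and that every series
    has exactly size_per_series points.
    """
    if size_per_series <= 0 or len(values) % size_per_series != 0:
        raise ValueError("column length must be a positive multiple of the series size")
    return [
        values[start:start + size_per_series]
        for start in range(0, len(values), size_per_series)
    ]


# Hypothetical long column holding two series of three points each
column = [1, 2, 3, 4, 5, 6]
print(long_to_wide(column, 3))  # prints [[1, 2, 3], [4, 5, 6]]
```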

I then computed the distance matrix using dtw as my method and applied hierarchical clustering with average linkage. This gives a tree, which I then pruned to get the desired number of clusters.
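To make the distance matrix and average linkage steps concrete, here is a self-contained Python sketch. It is an illustration under stated assumptions, not the R code from the workbook: `dtw_distance` and `average_linkage_clusters` are hypothetical names, the DTW recurrence is a basic one, and merging stops once `k` clusters remain instead of building the full tree and cutting it. For average linkage the two approaches yield the same grouping, although the label numbers may not match what R would assign.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Basic dynamic time warping distance (absolute-difference local cost)."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[-1][-1]


def average_linkage_clusters(series, k):
    """Agglomerative clustering with average linkage, stopped at k clusters.

    Returns one 1-based cluster label per input series.
    """
    n = len(series)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of series")
    # Pairwise DTW distance matrix (symmetric, zero diagonal)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = dtw_distance(series[i], series[j])
    clusters = [[i] for i in range(n)]  # start with every series on its own
    while len(clusters) > k:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs
            total = sum(dist[p][q] for p in clusters[x] for q in clusters[y])
            avg = total / (len(clusters[x]) * len(clusters[y]))
            if best is None or avg < best[0]:
                best = (avg, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]
    labels = [0] * n
    for label, members in enumerate(clusters, start=1):
        for idx in members:
            labels[idx] = label
    return labels


# Hypothetical data: two bump-shaped curves and two nearly flat ones
curves = [[0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0], [5, 5, 5, 5, 5], [5, 5, 6, 5, 5]]
print(average_linkage_clusters(curves, 2))  # prints [1, 1, 2, 2]
```

This brute-force loop recomputes averages on every pass, which is fine for a handful of curves but would be slow for large collections.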

Finally, the data is converted back into a long table before being pulled back into Tableau.
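One plausible way to picture this last step is repeating each series' cluster label once per data point, so that the result has one value per row of the original long table. The Python sketch below is only an assumption about the shape of the output, not the workbook's code; `labels_to_long` is a hypothetical helper, and it assumes every series has the same length.

```python
def labels_to_long(labels, size_per_series):
    """Repeat each cluster label once for every row of its series."""
    return [label for label in labels for _ in range(size_per_series)]


# Three hypothetical series of two points each, assigned to clusters 1, 1 and 2
print(labels_to_long([1, 1, 2], 2))  # prints [1, 1, 1, 1, 2, 2]
```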

The screenshot is from Tableau 10, but don’t worry if you’re not on the Beta program. You can download the Tableau 8.1 version of the workbook from HERE. Enjoy!

Hi Bora,

What’s the “size per series” in the last line?

Best,

Dunk

It is a calculated field defined in the workbook that returns the length (number of data points) of each line. The table is in long format: all the data is in the same column rather than in separate columns. Knowing the size, it is possible to slice the data at every Nth row and break it into multiple columns, which is the matrix format R needs for this type of analysis.

I’m having trouble running a script in Tableau to make predictions using a logistic model in R. Can somebody help me out?

Hi Chris, did you see the blog post on logistic regression?

Hi Bora,

This is very helpful. Appreciate it a lot. But could you give a little bit more instruction on how to create ‘curve’ and ‘clusters’? Thanks a lot.

Curve is the ID for each series. Assume you have sensors, each identified by an ID, so you know which time series came from which sensor. That is what Curve does. Clusters is the calculated field shown in the screenshot.