A common question about R integration feature in Tableau 8.1 is how to use it with forecasting packages available in R. This is an interesting question for two reasons:
SCRIPT_ functions are calculated fields and calculated fields don’t add rows to the table while that’s what forecasting is all about.
Forecast packages normally return just the forecast results which is fewer rows than the input data and at the same time, doesn’t make sense to join back in.
Here is a relevant thread from Tableau forums.
Before we move on any further, I would like to emphasize that Tableau has a built in forecast functionality which makes most of what we will cover here possible with the click of a button and comes with many more convenience features. The intent of the blog post is to show an alternative approach while very complicated can be useful for some.
The example in the forums uses the fpp package. For variety in this blog post I will be using the forecast package. It is worthwhile noting that, we worked very closely with Prof. Hyndman, the author of R’s forecast package when developing the forecasting feature in Tableau 8.
Let’s start with the visualizing the fit and residuals for your in-sample forecast. You will find this in the first sheet of the workbook I provided a link for at the end of the post.
We have two time series in this chart. To reduce the network traffic, I retrieved both columns in a single call to Rserve. Let’s take a look at the calculation.
First, we are loading the library. For this to work, please make sure the library and its dependencies are installed. You can find more about how to preload libraries for better performance in my posts here and here. Then we are creating a time series object (ts) using [Earnings] column (.arg1), since it is quarterly our deltat is 1/4 and our time series starts at the beginning of 1960. You will notice the paste function is taking two columns from the resulting forecast (fit and residuals) and concatenating them using ~ as the separator which allows us to retrieve both columns at once. Then in the calculated fields named Fit and Residuals, we tokenize the results. Only one of the calculated fields is shown below for brevity.
So far we’ve been dealing with the in-sample forecast, so we haven’t yet touched the core question about how to retrieve the future predictions into Tableau. Without further ado, let’s jump to the more exciting part. I will demonstrate two approaches. The first one is domain shifting which is shown in the second sheet.
Our calculation as shown above is very similar to the one we looked at earlier. But this time we are asking for a forecast (mean) of length (h=) set by the Tableau parameter named [PeriodsToForecast] and using append function to add the predicted rows to the end of the [Earnings] column (which answers the question 2 above) while dropping just as many rows from the top. The result is shown below. Since we forecast 10 quarters, our time series now extend 2.5 year beyond 1980 but our chart doesn’t start at 1960 anymore.
We achieve this also by shifting the date axis by the same amount as shown below.
But what if you don’t want to trim rows from the beginning of your data? That’s where densification comes to our aid.
The second approach relies on the fact that Tableau can fill in the missing values given a range. What we will do is to shift the last data point to the date we want our forecast to end. This will create a gap between the last value and the previous and Tableau’s “Show Missing Values” option will automatically add rows to fill that gap.
The last date is moved out to provide the full date extent for forecasting using the following calculated field. Here [MaxDate] is created at the data source level by joining the main table with a sub-query that returns the MAX(Quarter) in the input data.
Our calculated field looks very similar to the previous calculation. Important changes are highlighted which involve removing the blank rows densification added from the data passed on to R’s forecast function as input and pulling back the shifted last data point to the correct date for the forecasting function. Then the results get merged back in using the same append function but this time without anything getting trimmed.
The result is shown below. For comparison we are forecasting 10 quarters (same as above). Now our time series extend 2.5 year beyond 1980 while chart still starts at 1960.
You can download the example workbook from HERE.