
Using R forecasting packages from Tableau

A common question about the R integration feature in Tableau 8.1 is how to use it with the forecasting packages available in R. This is an interesting question for two reasons:

  1. SCRIPT_ functions are calculated fields, and calculated fields don’t add rows to the table, while adding rows is what forecasting is all about.
  2. Forecast packages normally return just the forecast results, which have fewer rows than the input data and, at the same time, don’t make sense to join back in.

Here is a relevant thread from the Tableau forums:

http://community.tableausoftware.com/thread/137551

Before we move on any further, I would like to emphasize that Tableau has built-in forecasting functionality that makes most of what we will cover here possible with the click of a button and comes with many more convenience features. The intent of this blog post is to show an alternative approach that, while complicated, can be useful for some.

The example in the forums uses the fpp package. For variety, in this blog post I will be using the forecast package. It is worth noting that we worked very closely with Prof. Hyndman, the author of R’s forecast package, when developing the forecasting feature in Tableau 8.

Let’s start by visualizing the fit and residuals of the in-sample forecast. You will find this in the first sheet of the workbook linked at the end of the post.

Model fit and residuals calculated in R, visualized in Tableau

We have two time series in this chart. To reduce network traffic, I retrieved both columns in a single call to Rserve. Let’s take a look at the calculation.


First, we load the library. For this to work, please make sure the library and its dependencies are installed. You can find more about how to preload libraries for better performance in my posts here and here. Then we create a time series object (ts) from the [Earnings] column (.arg1); since the data is quarterly, deltat is 1/4, and the time series starts at the beginning of 1960. Notice that the paste function takes two columns from the resulting forecast (fit and residuals) and concatenates them using ~ as the separator, which allows us to retrieve both columns in one call. Then, in the calculated fields named Fit and Residuals, we tokenize the results. Only one of those calculated fields is shown below for brevity.

Tokenizing concatenated results
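The concatenate-then-split round trip described above can be sketched in plain R. This is a minimal sketch under a few assumptions: stats::arima stands in for the forecast package call, earnings is synthetic data standing in for [Earnings] (.arg1), and the strsplit step only mimics what the Fit and Residuals calculated fields do on the Tableau side. Because paste returns strings, a script like this would be called through SCRIPT_STR rather than SCRIPT_REAL.

```r
# Sketch: return fit and residuals in one call by pasting them with "~",
# then split them apart again. Assumptions: stats::arima stands in for the
# forecast package, and `earnings` is synthetic data in place of .arg1.
set.seed(42)
earnings <- 10 + as.numeric(arima.sim(list(ar = 0.5), n = 40))

# Quarterly series starting in Q1 1960 (deltat = 1/4), as in the post
x <- ts(earnings, deltat = 1/4, start = c(1960, 1))

fit <- arima(x, order = c(1, 0, 0))
res <- as.numeric(residuals(fit))
fitted_vals <- as.numeric(x) - res   # one-step-ahead fit = observed - residual

# One "fit~residual" string per row, so one Rserve round trip returns both
packed <- paste(fitted_vals, res, sep = "~")

# Mimic the Fit / Residuals calculated fields: split on "~" and convert back
parts <- strsplit(packed, "~", fixed = TRUE)
fit_back <- as.numeric(vapply(parts, `[`, character(1), 1))
res_back <- as.numeric(vapply(parts, `[`, character(1), 2))

head(packed, 2)
```

paste converts numbers to text with about 15 significant digits, so the values coming back match the originals up to rounding rather than bit for bit, which is harmless for charting.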

So far we’ve been dealing with the in-sample forecast, so we haven’t yet touched the core question of how to retrieve the future predictions into Tableau. Without further ado, let’s jump to the more exciting part. I will demonstrate two approaches. The first one is domain shifting, which is shown in the second sheet.

Retrieving the predicted data points from R

Our calculation, shown above, is very similar to the one we looked at earlier. But this time we ask for a forecast (mean) whose length (h=) is set by the Tableau parameter named [PeriodsToForecast], and we use the append function to add the predicted rows to the end of the [Earnings] column (which addresses reason 2 above) while dropping just as many rows from the top. The result is shown below. Since we forecast 10 quarters, our time series now extends 2.5 years beyond 1980, but our chart doesn’t start at 1960 anymore.

Forecast using domain shifting, note the starting year of 1963

To keep the dates aligned with the values, we also shift the date axis by the same amount, as shown below.

Shifting the date axis
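The trim-and-append logic of the domain-shifting script can be sketched in plain R. Assumptions: arg1 and arg2 are hypothetical stand-ins for .arg1 ([Earnings]) and .arg2 ([PeriodsToForecast]), the parameter arrives as a vector with one copy per row (which is why the script reads .arg2[1]), and stats::arima with predict stands in for forecast(...)$mean. The property that matters is that the output has exactly as many values as there are input rows.

```r
# Sketch of the domain-shifting logic. Assumptions: arg1/arg2 are
# hypothetical stand-ins for .arg1 ([Earnings]) and .arg2 ([PeriodsToForecast]);
# stats::arima + predict stand in for forecast(...)$mean.
set.seed(1)
n <- 40
arg1 <- 10 + as.numeric(arima.sim(list(ar = 0.5), n = n))
arg2 <- rep(10, n)   # Tableau passes the parameter once per row
h <- arg2[1]         # so only the first element is used

x <- ts(arg1, deltat = 1/4, start = c(1960, 1))
pred <- as.numeric(predict(arima(x, order = c(1, 0, 0)), n.ahead = h)$pred)

# Drop h rows from the top and put the h predictions after the n - h kept rows,
# so the result still has exactly n values (one per Tableau row)
out <- append(arg1[(h + 1):n], pred, after = n - h)
length(out)
```

Because the first h actuals are dropped, each remaining value now belongs h quarters later than the row it sits on, which is why the date axis is shifted by the same amount.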

But what if you don’t want to trim rows from the beginning of your data? That’s where densification comes to our aid.

The second approach relies on the fact that Tableau can fill in missing values given a range. We will shift the last data point to the date where we want our forecast to end. This creates a gap between the last value and the one before it, and Tableau’s “Show Missing Values” option will automatically add rows to fill that gap.

Show Missing Values option

The last date is moved out to provide the full date extent for forecasting using the following calculated field. Here [MaxDate] is created at the data source level by joining the main table with a sub-query that returns the MAX(Quarter) in the input data.

Shifting the last date in the time series
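The row layout produced by the shifted last date plus “Show Missing Values” can be illustrated in R. Tableau performs this step itself; the sketch below, with placeholder values 1 to 8 and a hypothetical horizon of 3 quarters, only reproduces the resulting shape: the original last quarter becomes blank, h rows are blank in total, and the last real value sits h quarters later.

```r
# Hypothetical illustration of the rows Tableau creates; Tableau does this itself.
quarters <- seq(as.Date("1960-01-01"), by = "quarter", length.out = 8)
values <- as.numeric(1:8)   # placeholder measure values
h <- 3                      # hypothetical number of periods to forecast

# Like the shifted-date calculated field: only the row at MaxDate moves, h quarters out
max_date <- max(quarters)
shifted <- quarters
shifted[shifted == max_date] <- seq(max_date, by = "quarter", length.out = h + 1)[h + 1]

# Like "Show Missing Values": one row per quarter from the first to the shifted last date
grid <- data.frame(quarter = seq(min(shifted), max(shifted), by = "quarter"))
dense <- merge(grid, data.frame(quarter = shifted, value = values), all.x = TRUE)
dense
```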

Our calculated field looks very similar to the previous calculation. The important changes are highlighted: they remove the blank rows that densification added from the data passed to R’s forecast function, and they pull the shifted last data point back to its correct date for the forecasting function. Then the results are merged back in using the same append function, but this time without trimming anything.

Running the R forecast code with densification
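Assuming Tableau sends n + h rows (n real values, the last one moved to the end, and h blank rows added by densification), the domain-completion script can be sketched in plain R. The steps follow the description above and the variant of the script a reader posts in the comments below; arg1 and arg2 are hypothetical stand-ins for .arg1 and .arg2, and stats::arima with predict again stands in for the forecast package.

```r
# Sketch of the domain-completion logic under the assumptions stated above.
set.seed(2)
n <- 40; h <- 10
real <- 10 + as.numeric(arima.sim(list(ar = 0.5), n = n))
arg1 <- c(real[-n], rep(NA, h), real[n])   # .arg1 after densification
arg2 <- rep(h, n + h)                       # .arg2: parameter repeated per row

l <- length(arg1)
k <- arg2[1]
u <- arg1[1:(l - k)]   # drop the last k rows; the series now ends on a blank
m <- length(u)
u[m] <- arg1[l]        # pull the shifted last point back to its real position

x <- ts(u, deltat = 1/4, start = c(1960, 1))
pred <- as.numeric(predict(arima(x, order = c(1, 0, 0)), n.ahead = k)$pred)

# Nothing is trimmed: m kept values + k forecasts == l rows
out <- append(u, pred, after = m)
length(out)
```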

The result is shown below. For comparison, we are again forecasting 10 quarters. Now our time series extends 2.5 years beyond 1980 while the chart still starts at 1960.

Forecast using domain completion, note the starting year of 1960

You can download the example workbook from HERE.


52 thoughts on “Using R forecasting packages from Tableau”

  1. Ben Fox says:

    This has been incredibly helpful in trying to get a valuable forecast. I’m not a star with R; what would you suggest for getting this to work with hourly data, as opposed to quarterly? I couldn’t change the deltat to be a very small decimal.

  2. T Rebman says:

    Really helpful. I am struggling to find a way to pull out the upper and lower values for each of the forecasted data points. I have tried simply replacing the $mean with $upper but the function expected fewer values than the script returned.

  3. rashmi says:

    Sir, I have one month of data and I have to forecast for next month. What should the deltat value be? Will it work?

    • rashmi says:

      SCRIPT_REAL("library(forecast);jjearnts <- ts(.arg1,deltat=1440,start=c(2006, 1));fcast <- forecast(jjearnts, h=.arg2[1]);
      n<-length(.arg1); append(.arg1[(.arg2[1]+1):n],fcast$mean, after = n-.arg2[1])",SUM([earning]),[Parameter])

      My starting date is 01-04-2006 and my end date is 31-01-2015, and for these dates I have daily earnings data, but in the forecast I’m getting FLAT DATA 😦 Please suggest something.

    • If you only have one month of input data, you shouldn’t try to forecast the next month based on that. You can’t find any trend or seasonality with a single data point. Do you mean monthly data?
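To illustrate why seasonality can’t be estimated from too short a history, here is a small sketch using stats::decompose, which refuses to run on fewer than two full cycles. decompose is used only as an illustration (it is not what the post’s script calls); the stl error quoted in a later comment reports the same limitation. The series are placeholder values.

```r
# Seasonal estimation needs at least two full cycles of data.
# Assumption: stats::decompose is used purely for illustration.
one_cycle  <- ts(1:12, frequency = 12, start = c(2015, 1))   # 12 months
two_cycles <- ts(1:24, frequency = 12, start = c(2015, 1))   # 24 months

# Fewer than two periods: decompose stops with an error
err <- tryCatch(decompose(one_cycle), error = function(e) conditionMessage(e))
print(err)

# Two full periods: the decomposition runs
ok <- decompose(two_cycles)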

      • rashmi says:

        Thanks a lot, awesome prediction, you’re great, very happy 🙂 My next step is to add a dependent parameter, such as: what will happen if the company’s partners increase? What will my earnings prediction be? Actually, I’m not mainly focused on the earnings forecast; I want to see how to add a dependency parameter. I have searched but only found dashboards (which look fantastic). Please suggest something for this. And thanks again, Sir; I did a forecast without any mathematical calculation. Tableau is great with R.

  4. Warun says:

    I have a query. I am getting the following error when I drag ‘Forecast – Domain Completion’ to the Rows shelf: Error in na.fail.default(as.ts(x)) : missing values in object

  5. Bart says:

    Thank you for this valuable article! My model seems quite reliable, but when I want to forecast, my historical values also change. For a particular week I have a cancel ratio of, let’s say, 60%. When I forecast, let’s say, 12 weeks, my historical data is transformed to lower or higher percentages compared to the data when I am not forecasting. What am I doing wrong?

    Thank you in advance.

    Best
    Bart

    • shimonil says:

      Hi Bora!
      Really good article, man.
      I downloaded the example and tried to use the script on my data.
      I used this script:
      SCRIPT_REAL("library(forecast);l<-length(.arg1);u<-.arg1[1:(l-.arg2[1])];
      n<-length(u);u[n]=.arg1[l];jjearnts <- ts(u, delta = 1/12,start=c(2014,1));
      fcast <- forecast(jjearnts, h=.arg2[1]);
      append(u,fcast$mean, after = n)",SUM([costAfterDiscount]),[Period To Forecast])
      But this script only changes the values of my last periods (by "period to forecast"), which I already have data for.
      I want to forecast periods that I don’t have values for, but I think the append doesn’t work well for me.
      Thank you!

  6. naveena says:

    Hi Bora,

    This is an awesome article. I am fairly new to R and to connecting to R from Tableau, and I found this article very helpful. I had a quick question regarding the forecast values: do you know if there is a way to get the predictions/forecast for the development data as well? In the above scenario, I am looking for the orange line (predicted data) for the quarters from 1960 through 1980 as well, to check how the model performed.

    Thank you for your time and expertise…

  7. Colby Moss says:

    Hi Bora,
    I keep getting “Unexpected number of results returned by SCRIPT function. Function expected 5206 values; 2268 values were returned.”

    I saw on Tableau’s site that the troubleshooting advice is: The R script result must be either a scalar or vector of length one that is replicated for all rows, or a vector of length equal to the number of rows in the Tableau result table.

    But I can’t make much sense of it. Can you help?

    Here is my code:

    SCRIPT_REAL("

    fit <- lm(.arg1 ~ .arg2 + .arg3)

    fit$fitted
    "
    ,
    SUM([Applications (2015)]),
    SUM([Applications (2014)]),
    SUM([Applications (2013)])

    )

    • ina says:

      I’m also getting the same error whenever the argument I pass to Tableau is a calculated field of first differences. How can we fix this? I would really appreciate any insights from you, Bora.

  8. Hi Bora,

    I have started learning forecasting in R and Tableau. I was reading your post and could not understand what fcast$mean is actually doing here.
    Any help is really appreciated.

    Thank you.

  9. Gabriel says:

    Excellent Article – trying to utilize method two to improve upon Tableau’s automatic forecasting capabilities. Could you explain in more detail how you created the parameter MaxDate? I’m not sure how to create a parameter at the Data source level from within Tableau desktop.

    Thanks again!

  10. Pingback: How To Forecast With Tableau And R | Data Science Riot!

  11. Uday says:

    Hi, I have 30 days of data from 04/18/2016 to 05/19/2016 and I want to predict the next 2 or 3 days. When I use the following script, I get this error:

    Error – Error in stl(x, s.window = s.window, t.window = t.window, robust = robust) : series is not periodic or has less than two periods.

    script – SCRIPT_REAL("library(forecast);jjearnts <- ts(.arg1, frequency = 365,start=c(2016,107));fcast <- forecast(jjearnts, h=.arg2[1]);
    n<-length(.arg1); append(.arg1[(.arg2[1]+1):n],fcast$mean, after = n-.arg2[1])",MAX([x]),[PeriodsToForecast])

  12. Sunny says:

    Hello Bora,

    The following code works in R but it doesn’t in Tableau.

    Error: An error occurred while communicating with the RServe service.
    Error in seq.default(l + 1, l + n) : ‘to’ must be of length 1

    l <- as.numeric(length(.arg1));
    n <- .arg2;
    LREVENUE <- log(.arg1);
    timeperiod <- seq(l);
    timeperiod <- as.numeric(timeperiod);
    fit1 = lm(LREVENUE~timeperiod);
    Intercept <- fit1$coefficients[[1]];
    Slope <- fit1$coefficients[[2]];
    timeprd = seq(l+n);
    timeprd <- as.numeric(timeprd);
    trend <- Intercept + Slope*timeprd;
    ts = data.frame(timeprd, trend);
    trnd <- fit1$fitted.values;
    remaining <- LREVENUE/trnd;
    seasonal_factor <- c();
    for (i in seq(n)) {
    v = mean(remaining[seq(i,l,n)])
    seasonal_factor[i] <- v[1]
    }
    sf <- ts$trend[seq(l+1,l+n)]*seasonal_factor;
    sf <- exp(sf);
    actualrevenue <- exp(LREVENUE);
    actualrev <- actualrevenue[seq(n+1, l)];
    rev <- append(actualrev, sf);
    rev

    • Sunny says:

      I have been trying to fix the issue for the past week and have not been able to find the reason. Could you please help me?

      Thanks in Advance!

  13. Gana says:

    Hi Bora,

    Is there any way I can display a summary table based on the forecast? The above example does a quarterly forecast; I need a yearly summary below the line chart in a dashboard view.

    Thanks in advance !

    Regards,
    Gana

      • Gana says:

        I need the table to be aggregated one level above. The forecast numbers are at the quarterly level; I need the numbers in the table at the yearly level. The moment I change the column ‘Shifted for domain completion’ to ‘Year’, it throws an error, which is expected due to the nature of the R code. I’m wondering if there are any workarounds to get a yearly summary based on the quarterly projection.

  14. Gopal says:

    Hello Bora, thanks for the quick response. Sorry, I couldn’t follow the point of your statement; can you please elaborate?
    For example, my attempt is to use R’s hist(x) function in a SCRIPT_* function to plot it in Tableau, but it is not working.

  15. Phillip Lowe says:

    Hi Bora, I really like this blog!
    It’s a great breakdown of how to integrate R with Tableau and create a usable forecast Measure in Tableau. This is a great improvement over the Forecasting in the Analytics section, as you can actually use the Measure in other calculations.
    I was wondering if you could help me get the append/densification part to work.
    I have used the same script as in your example, but in both the 1st and 2nd append scripts I receive the error ‘result returned by SCRIPT function is of unexpected type’.
    I’ve tried a few different variations of the script, but it doesn’t seem to work.

    I also think that in the code the ‘l’ used for length(.arg1) looks a lot like a 1; maybe you could rename it, e.g. len <- length(.arg1).

    Thanks, Phil

      • ina says:

        Hello, how can I make my forecast display monthly values for the next 5 years?
        I tried changing the frequency value to 1/12 but it is still displaying yearly values. How can I fix that?

  16. vinay says:

    Hello Bora,
    Need your help; please let me know where the problem is. I have been trying very hard to track down this error.

    “Error: Expected 875 values, 874 values were returned.” I keep running into this type of error when returning values from the R script and can’t figure out the issue. Please help; here is the code.

    SCRIPT_REAL("

    library(forecast);
    arg1<-na.omit(.arg1);
    time <- ts(arg1,start=c(2013,1),frequency=1);
    fcast<-forecast(time,h=.arg2[1]);
    fit <- auto.arima(time)
    f<-forecast(fit, 2)
    n<-length(arg1);
    #append(arg1[(.arg2[1]+1):n],fcast$mean,after=(.arg2[1]))

    append(arg1[(.arg2[1]+1):n],f$mean,after=(.arg2[1]))

    ",
    COUNTD([Bug ID]),[Periods to forecast])

    • Please follow the example in the workbook. Based on this script it is hard to tell what is going wrong since I can’t see the data, but it looks like you took the example script and modified parts of it.

      For example, you forecast out 2 periods instead of .arg2[1] periods. If .arg2[1] does not equal 2, this would cause an error.

      f<-forecast(fit, 2)

      Then you're supposed to append it after n-.arg2[1], but instead you're appending after .arg2[1].

      append(arg1[(.arg2[1]+1):n],f$mean,after=(.arg2[1]))

      If you just follow the example this should work without any problems.
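A minimal sketch of the fix the reply describes, with the same h used for the forecast horizon, the trim, and the append position. Assumptions: arg1 and arg2 are hypothetical stand-ins for .arg1 and .arg2, and stats::arima with predict stands in for auto.arima() and forecast(). One further point, not raised in the reply: na.omit(.arg1) drops NA rows, so any NA in the input would also leave the result one value short per dropped row, which would produce an “expected N values, N - 1 were returned” style mismatch.

```r
# Sketch: keep the horizon, the trim and the append position consistent.
# Assumptions: arg1/arg2 are hypothetical stand-ins for .arg1/.arg2, and
# stats::arima + predict stand in for auto.arima() + forecast().
set.seed(3)
arg1 <- 5 + as.numeric(arima.sim(list(ar = 0.3), n = 30))
arg2 <- rep(4, 30)

h <- arg2[1]        # one horizon, used everywhere below
n <- length(arg1)   # length of the unmodified input (no na.omit)

x <- ts(arg1, start = c(2013, 1), frequency = 1)
f <- predict(arima(x, order = c(1, 0, 0)), n.ahead = h)$pred   # h periods, not a fixed 2

# Trim h rows from the top and append the h forecasts after the n - h kept rows
out <- append(arg1[(h + 1):n], as.numeric(f), after = n - h)
length(out) == n
```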

  17. vinay says:

    Hi Bora,
    Thank you so much for your help :) It really worked out with your example. Now at least we are able to see forecast lines in Tableau. But we are facing a problem where forecasting happens only up to the last year available in the data, i.e. it shows only up to 2017. Even when we change the forecast period, the forecasting happens for previous years. E.g. with forecast period=2 it forecasts 2016 and 2017; with forecast period=3 it forecasts 2015, 2016 and 2017, but not 2018 or 2019, and we can’t see 2018/2019 in the Tableau plot. It would be of great help if you could suggest something. Once again, loads of thanks for the help.

    Thanks,
    Vinay Haritsa

  18. Vinay says:

    Hi Bora,
    Thank you for your help and response. As you mentioned, your example covers everything; by analysing the calculated field shifteddate, I found that my issue can be resolved. Your guidance helped a lot. Now we are trying to figure out why our prediction is a flat line; it’s not increasing or decreasing. We are working on it. Thank you once again.

    Thanks,
    Vinay Haritsa

  19. Mélanie says:

    Hello, thank you for your example. I’m having the same problem as Vinay: the forecasting happens only up to the last year available in my source data. I looked at your workbook and tried to understand it, but I’m not sure I do. I added a shifted date from 2012 to 2020, but the forecast looks like a flat line and begins in 2014, although I have dates until 2018; besides, the actuals show only one point. Does anyone have an idea?
    Thank you

  20. Mélanie says:

    I solved the problem by using the calculated field shifted date and replacing the third parameter of the DATEADD function with DATETRUNC('quarter',[Date]) to truncate the existing dates in my source to the quarter.

  21. Milan says:

    Hello Bora,
    I used this script in the Tableau:
    SCRIPT_REAL("library(forecast);
    time <- ts(.arg1,start=c(2013,1), frequency=24);
    fcast <- forecast(time, h=.arg2[1]);
    n<-length(.arg1);
    append(.arg1[(.arg2[1]+1):n],fcast$mean, after = n-.arg2[1])",
    SUM([Number of Records]),[Periods to Forecast])

    But I always have a problem with the current month, e.g. now with May. For my purpose it should be forecast through June. I changed frequency to 24 as I don't get any forecast with frequency=12.
    So, what should I change in the script to get an appropriate forecast?
    Or maybe the better question is: how do I exclude the current month (May) from Actuals and put it in Forecast?

    Many thanks!

    P.S. I don't know how to post a picture to show what's going on with the forecast.

    • Frequency 12 indicates monthly data with a yearly (12-month) cycle. If you enter 24, that means hourly data with a daily (24-hour) cycle. If you want to forecast 24 months out, you need to update the value of the parameter [Periods to Forecast]. Deltat is an alternative way of specifying this, which is what I used in my examples, and it yields the same result, e.g. you can use deltat=1/4 instead of frequency=4 for quarterly data.
      If you download the sample workbook, I think you will find the answer in the sheet that shows how to shift dates for domain completion. The code is slightly different, and you will need to use a date field on the opposing axis that shifts the last date you have N periods out (e.g. 24 months in your case). This way you will see the dates extending into the future while the actuals remain untouched.
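The frequency/deltat equivalence mentioned in the reply can be checked directly in R. The values below are placeholders; only the time series attributes matter.

```r
# frequency = 4 and deltat = 1/4 describe the same quarterly series
q_freq  <- ts(1:8, frequency = 4,   start = c(1960, 1))
q_delta <- ts(1:8, deltat    = 1/4, start = c(1960, 1))
identical(tsp(q_freq), tsp(q_delta))   # both: start 1960, end 1961.75, frequency 4

# Monthly data with a yearly cycle: frequency = 12 (or deltat = 1/12)
m <- ts(1:24, frequency = 12, start = c(2013, 1))

# Hourly data with a daily cycle: frequency = 24, so one time unit is one day
hr <- ts(1:48, frequency = 24, start = c(1, 1))
cycle(hr)[c(1, 24, 25)]   # hours 1, 24, then back to 1 on the second day
```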

  22. Milan says:

    I am trying to figure out these scripts and everything else regarding forecasting with R, but I am still a beginner and can’t grasp everything yet. There are still some things I don’t understand.
    I have tried to implement your whole example from https://boraberan.wordpress.com/2014/01/19/using-r-forecasting-packages-from-tableau/ in my Tableau but, again, without success.
    My date has the format “1/8/2015 5:35:00 PM”, and I suppose this could be one of the problems I am facing right now.
    I have used your script:
    SCRIPT_REAL("library(forecast);jjearnts <- ts(.arg1,deltat=1/12,start=c(2013,1));fcast <- forecast(jjearnts, h=.arg2[1]);
    n<-length(.arg1); append(.arg1[(.arg2[1]+1):n],fcast$mean, after = n-.arg2[1])",SUM([Number of Records]),[Periods to Forecast])

    and created a calculated field for the date: DATEADD('month', [Periods to Forecast], [Entered Date Time]) because I don't have quarters in my database.

    I also created a field for Pred_vs_Actuals: IF INDEX() < SIZE() - [Periods to Forecast]
    THEN 'Actual'
    ELSE 'Forecasted'
    END

    and when I put this in Tableau I got a FLAT forecast line…
    Btw, what does "jjearnts" mean in the script?

    Many thanks.

    • The sample data is quarterly earnings for Johnson & Johnson; the data is converted to an R time series, so I named it jjearnts. The name is not important. The date format is also not important, since the code doesn’t pass dates to R. It just tells R the date of the first point (e.g. the 1st month of 2013) and the distance between points (e.g. a month, 1/12), meaning the 1st point will be Jan 2013, the second point will be Feb 2013, and so on. R itself generates the time series object from this information, as opposed to getting dates from Tableau. No error but a flat line could mean that R didn’t find a well-fitting model for your data.
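The reply’s point that R builds the time index from start and deltat alone can be seen in a small sketch; the counts are placeholder values standing in for .arg1.

```r
# R derives the dates from start and deltat; only values come from Tableau.
counts <- c(5, 7, 6, 9)   # placeholder values standing in for .arg1
x <- ts(counts, deltat = 1/12, start = c(2013, 1))

start(x)              # c(2013, 1): January 2013
cycle(x)              # positions 1 to 4 within the year
month.abb[cycle(x)]   # "Jan" "Feb" "Mar" "Apr"
time(x)               # 2013 + 0/12, 1/12, 2/12, 3/12 (fractions of a year)
```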

      Did you try Tableau’s forecasting and see how it does?

  23. Milan says:

    Yes, I tried Tableau’s forecast models (Automatic, None-None, None-Multiplicative etc.) and they work very well. Maybe my time series isn’t long enough, as I only have data from 2013.
    It’s not a problem to use Tableau’s forecast, but I would like to create one best forecast model; otherwise I have to choose between Tableau’s models every time, and that takes a lot of time.

    Thank you very much Bora!

  24. Milan says:

    Yes, that’s true, but the automatic model doesn’t give me any seasonality, which I need for my purpose.
    Thank you very much Bora for help.

    • What do you expect the seasonal cycle to be in your data?

      Tableau looks for common cycles in data where it recognizes the time domain, e.g. it would look for 12-month cycles in monthly data, 24-hour cycles in hourly data, etc.

      If you convert the axis from date to integer, it will search for any possible seasonal cycle, e.g. use DATEDIFF to convert the date to months since the first date, instead of using the month of the date.

      You might want to give it a shot.
