Since version 8.0 it is very easy to generate forecasts in Tableau using exponential smoothing. But in some cases you may want to enrich your forecasts with external variables. For example you may have the government’s forecast for population growth, your own hiring plans, upcoming holidays*, planned marketing activities… which could all have varying levels of impact on your forecasts. Or sometimes you may just want to simulate different scenarios and ask “what if” questions e.g. what would my sales look like if I hired 10 more sales representatives? If interest rates went up by X percent, how would that impact my profits?

Let’s see how we can tackle both uses cases with the help of Autoregressive Integrated Moving Average with eXogenous variables (ARIMAX) models in R’s forecast package.

**In some cases seasonality may be sufficient to capture weekly cycles but not for moving events like Easter, Chinese New Year, Ramadan, Thanksgiving, Labor day etc.*

**Handling special events**

In the image below the observed/historical demand is shown in blue. At the bottom you can see the holidays as green bars with height of 1 and non-holidays as bars of height 0. This representation is often referred to as dummy encoding.

Our data contains 2 holidays that happened in the past and 1 upcoming holiday. One can clearly see that the holidays are causing noticeable spikes in demand. For a store owner who doesn’t want to miss the next holiday opportunity by running out of stock early, it is very valuable to incorporate this important piece of information in the demand forecast.

The formula for the forecast shown with the red line (which doesn’t take holidays into account) looks like the following:

SCRIPT_REAL("library(forecast); append(.arg1[1:99],forecast(ts(.arg1[1:99], frequency=12), h=21)$mean)",AVG([Demand]))

First 99 points cover the historical data while last 21 are what’s being predicted. The data follows a 12 period cycle. Second line of R code *append*s the predicted values to the reported values to generate the full series.

The formula for the forecast shown with the green line (which incorporates the holidays) looks like the following:

SCRIPT_REAL("library(forecast); data <- data.frame(demand=.arg1,holiday=.arg2); training <-data[1:99,]; model<-auto.arima(training[,1],xreg=training[,2]); append(training[,1],forecast(model,xreg=.arg2[100:120])$mean)", AVG([Demand]), ATTR([Holiday]))

In this script, exogenous regressors are passed to the function using the *xreg* argument. It can be clearly seen that both the spike due to holiday and the regular seasonality in demand are represented by this model. If there are multiple events with potentially different effects (positive vs. negative, different magnitudes etc.), they can be handled using the same method if added as separate variables.

**What-If Analysis**

Now that we covered handling events as additional regressors, let’s talk about we can apply the same methodologies to do what if analysis.

Below you can see three time series; Sales and 2 economic indicators. Assume that the two indicators contain the current best estimates of behavior for next few days but Sales for the same period is unknown however can be estimated using the two economic indicators.

What makes this visualization more interesting is that you can also adjust the value of economic indicators and the time frame these overrides apply. For example in the image below, indicator X has been increased by 15 units (dark blue peak) for the period April 13th and April 25th while indicatory Y has been reduced by 20 units (dark orange dip) for the period April 20th and May 1st. The impact can be clearly seen in the dark green portion of the line in the first chart. By adjusting the parameters in the dashboard one can perform what-if analysis and understand impact of likely future events, best/worst case scenarios etc.

Let’s take a look at the R script that is used to create this view.

SCRIPT_REAL("library(forecast); training <- ts(data.frame(sales=.arg1,indX=.arg2,indY=.arg3)[1:100,],frequency=7); model<-auto.arima(training[,1],xreg=training[,-1]); whatifdata <- data.frame(indX=.arg2,indY=.arg3)[101:120,]; append(training[,1],forecast(model,xreg=whatifdata)$mean)", AVG([Sales]),AVG([Economic Indicator X]),AVG([Economic Indicator Y]))

As you can see, the formula is very similar to earlier examples. In this demo dataset, the first 100 rows are used for model fitting while the last 20 contain the sales forecast as well as the inputs for the sales forecast that are the what-if values defined in Economic indicator X and Y fields as a function of parameter entries.

You can find all of these examples in the Tableau workbook published HERE.

In the sample workbook, I also provided a sheet that compares the ARIMAX result to multiple linear regression to give a better sense of what you’re getting out of applying this particular method.

Pingback: Using your R models for in-database scoring « Bora Beran

I opened the workbook, but I’m not able to adjust the results; I get this error message: it says an error occurred while communicating with an external service. Tableau is unable to connect with the external service. Verify the service is running and that you have access privileges.

Can you send the instructions on how to set up R with the sample workbook?

Thanks Much.

You can find detailed instructions here https://community.tableau.com/docs/DOC-9916

Hello,

Thank you for your post. I have a problem with what if analysis, it shows both predicted and historical data until a same date.

I tried many ways, but still have the same problem. Historical and predicted data ends at the same date 2017 in my case even if i have dates on 2018

Hi Bora,

Great article. I am currently trying to add another economic indicator, by adding indZ=.arg4 to training and whatifdata, but this message would always appear on Tableau, “Error in auto.arima(training[, 1], xreg = training[, -1]) : xreg is rank deficient”

I was wondering how to work around this?

Thank you!

Rank deficiency is an issue with your data e.g. not enough data points, not enough variation in the data, needs scaling (e.g. one column is many orders of magnitude larger than others)… There could be many reasons.

I know you’re no longer at tableau but I was wondering how you handle the error of “unexpected number of results returned by SCRIPT function. Function expected X values; Y values were returned”? It works fine in R, just not in tableau

I am having the same issues. I have monthly data for 4 years and I am forecasting the next 12 months based on a what if analysis using one of the variables as a regressor. But, I am receiving the error

“Unable to complete action

Unexpected number of results returned by SCRIPT function. Function expected 60 values; 72 values were returned.”

Hello Bora,

First Of All thank you for this article, I’m new to tableau and R, I have an issue when I open this example on my laptop, the “what if” worksheet an error occurs ‘Unable to complete action

An error occurred while communicating with the External Service.

Error in forecast.Arima(model, xreg = whatifdata) :

xreg should be a numeric matrix or a numeric vector’ ?

I’m having the same issue. Did you find a solution?

Hello Mouad and Bora,

Is there a solution to the error received when looking at the whatifdata. I also get the error message “Error in forecast.forecast_ARIMA(model, xreg = whatifdata) : xreg should be a numeric matrix or a numeric vector”

Mouad:

I experience the same error. It is a result of the forecast package being updated to force the xreg data to not be a data frame.

This updated code should work for you:

SCRIPT_REAL(”

library(forecast);

library(tseries);

training <- ts(data.frame(sales=.arg1,indX=.arg2,indY=.arg3)[1:100,],frequency=7);

model<-auto.arima(training[,1],xreg=training[,-1]);

# 28 May 2019, Josh Belliveau, forecast package now requires a matrix, not a data frame (as of 21 June 2018)

whatifdataframe <- data.frame(indX=.arg2,indY=.arg3)[101:120,];

whatifdata <- data.matrix(whatifdataframe)

append(training[,1],forecast(model,xreg=whatifdata)$mean)

",AVG([Sales]),AVG([Economic Indicator X]),

AVG([Economic Indicator Y]))

Josh Belliveau

Sales Consultant, Tableau