Cause & Effect : A different way to explore temporal data in Tableau with R

Whether it is forecasting your quarterly sales or comparing historical data, working with time series data is big part of business analytics. Sometimes patterns are easy to see, while in others they might be elusive especially when comparing different time series with short/frequent cyclical patterns.

A common question could be understanding the impact of a marketing campaign. Did it make a difference at all? What’s the return?

A true randomized treatment-control experiment would be the ideal case for running tests but it is not always practical, if at all possible. So let’s see how we can try to answer this question using Bayesian Structural Time Series models. For this post, I generated a dataset with 3 columns. In the image below, Sales is at the top, with 2 economic indicators below it. Let’s assume these two economic indicators are “Consumer Sentiment” and “Spending on Durable Goods” and that they have been good predictors of my sales and there are no other unaccounted for variables that impact my sales.

Time series data

I know the date I started my marketing campaign. Since time flows in one direction, if it has any impact, it would be after this date. If I can use the data from days prior to the campaign to train my model, I can forecast what would have happened if I had not run the campaign. Then by comparing it with the actual sales data, I can understand what difference it made.

Predicted vs Actual sales

While it was initially hard to visually detect this change, now that I added the “expected” forecast (red line), I can clearly see the bump marketing campaign gave me by looking at the gap between green and red lines. In the workbook, I also added to the tooltip a textual summary describing the statistical significance of the results which the CausalImpact package conveniently generates.

Statistical test results

Let’s also look at the cumulative effect of the marketing campaign.

Cumulative value of the marketing campaign over time

It looks like, we made an extra 748 thousand dollars. By comparing this with the amount we spent on the marketing campaign, we can decide whether this was a good investment.

You can download the sample workbook from HERE. The workbook uses the CausalImpact package which can be installed from GitHub. You can find more info on the installation steps in the notes section at the end of the blog post. One thing to keep in mind, CausalImpact package doesn’t work well with lags e.g. if Consumer Sentiment in January is linked to Sales in July, you’ll have to shift your dates first.

Another common type of time series analysis involves understanding dependency relationships between time series with different lags e.g. could Consumer Sentiment from 3 months ago help me make a better prediction of my Sales next month? A frequently used method for this type of analysis is Granger causality.

Granger causality is not necessarily true causality hence the term is often used as “X Granger-causes Y”. It means the patterns in X are approximately repeated in Y after some lag e.g. 4 months in the example I gave above. As a result of this, past values of X can be used to predict future values of Y.

You might be thinking if X happened 4 months before Y and there is a correlation, how is that not a causal relationship? If both X and Y are driven by a 3rd, unknown process (Z) with different lag, it might look like Granger-causality exists between X and Y while in reality manipulation of X would not change Y, only changes to Z would. The roosters crow before dawn, but they don’t cause the sun to rise.

That being said, lack of true causality does not mean one thing can’t be useful in predicting another. Humans relied on roosters’ built in timers for hundreds of years.

Let’s explore Granger causality with an example. Below you can see two time series : The University of Michigan “Consumer Sentiment Index” and “New One Family Houses Sold” from Federal Reserve Economic Data (FRED) website.

Can consumer sentiment in previous months help better predict future home sales?

Input data for Granger-causality test

There are several packages in R to do this test of course. For this post, I used lmtest package and a few methods from forecast package to estimate the number of differences required to make the sample time series data stationary. You can download the sample workbook from HERE. Based on this analysis, with 4 lags, it appears that Consumer Sentiment unidirectionally Granger-causes Home Sales.

Thurman and Fisher’s 1988 paper “Chickens, Eggs and Causality, or Which Came First?” applies Granger causality to this age-old question concluding that the egg came first. It is a fun, short read with tongue-in-cheek future research directions like potential applications of Granger causality to “He who laughs last, laughs best” and “Pride goeth before destruction and an haughty spirit before a fall.” and proof that journal papers and statistics can be fun. As for more serious reading, Judea Pearl’s “Causality: Models, Reasoning and Inference” is a great resource on the topic of causality.


Installing CausalImpact (bsts is a dependency but at the time of this post the default version you’d get from CRAN still had problems):

install.packages("https://cran.r-project.org/bin/windows/contrib/3.2/bsts_0.6.2.zip",repos = NULL, type = "local")


4 thoughts on “Cause & Effect : A different way to explore temporal data in Tableau with R

  1. This is a great post, the method could be adapted to so many situations in business.
    However, package owners warn that “… the relation between treated series and control series is assumed to be stable during the post-intervention period”.
    But doesn’t intervention itself disrupt this relation by its very nature?
    Is there a more detailed layman’s description of this assumption?

    • They look at post intervention period as a whole when they are evaluating for statistical significance. If the relationship during this period varies e.g. a high peak followed by a long period of insignificant change, the overall test result could be that the effect is not significant vs post intervention period just covering the peak.

  2. Lucie says:

    Hi Beran, thanks for this inspiring blog & workbook. You use in your Tableau iz SUM(value), while in the R code you use AVG(value). In your sample set it does not matter since you have only one value per date. But which one is the correct choice if one has several values per date?
    Thanks! Cheers, Lucie

    • In this case, as you pointed out it doesn’t matter since data is not really aggregated but they should match. I will update the workbook and make them consistent. In this case you’re looking at daily sales so, one would expect transactions to roll up to daily using SUM.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s