In 2010, Lustig et. al from IBM classified business analytics workloads into three major categories in their paper “The Analytics Journey”:

**Descriptive Analytics:** A set of technologies and processes that use data to understand and analyze business performance**Predictive Analytics:** The extensive use of data and mathematical techniques to uncover explanatory and predictive models of business performance representing the inherit relationship between data inputs and outputs/outcomes.**Prescriptive Analytics:** A set of mathematical techniques that computationally determine a set of high-value alternative actions or decisions given a complex set of objectives, requirements, and constraints, with the goal of improving business performance.

You can find the full paper here. For those who don’t have time to read the whole paper, in short the journey starts with the question “what is going on?” moves on to “what is likely to happen in the future?” and finishes with “what can we do to make a certain outcome more likely?”.

When you start talking about constraints and improving performance, it starts sounding awfully similar to mathematical/computational optimization, don’t you think? It should. Because optimization is a major part of it.

Imagine, in your business, you discovered a relationship between profit, frequency of promotions and promotion amount. Given your customer base growth, you can forecast your profit for the next year, and even run what if scenarios by changing these parameters manually and see how they impact the forecast. The more promotions, the more people buy, so you see increased profit because of the volume but probably after a certain point, you get diminishing returns. What is the sweet spot? Prescriptive analytics tools can tell you what combination of these parameters can maximize your profit.

Let’s look at an example of how to do this in Tableau. Here, I have annual stock returns for a number of companies over 20 years. We will be using a metric called Sharpe ratio which characterizes how well the return of an asset compensates the investor for the risk taken. Our goal is to find the portfolio that maximizes Sharpe ratio.

We will use R’s optim function which comes with the base R package so there is no need to install or load any libraries.

Our data is in long format, that is; we have one column for all returns, a column that associates each row with a particular stock and a column with year. Our calculation needs a matrix so we will first `split`

the long table into chunks and then add each chunk as a new column (`cbind`

) next to each other. SharpeRatio function contains the logic for calculating Sharpe Ratio and it takes the percentages of each stock in your portfolio as input. You may have notice the constant 0.05, which is my assumption as the risk-free interest rate, the theoretical rate of return of an investment that an investor would expect from an absolutely risk-free investment. For example US treasury bonds are often considered risk-free. Also some of the optimization constraints are captured in the function. Considering the fact that each stock is associated with a percentage of total portfolio value, these have to be non-negative and add up to 1. We enforce the range of each percentage being between 1 and 0 using an if condition and when condition is not met we introduce a high penalty (`1e7`

) so our optimizer avoids those. Adding up to 1 is enforced by the line ` p<-c(x,1-sum(x))`

. Function takes 1 fewer number than the total number of stocks selected by the user and the missing argument is computed by subtracting the sum of these arguments from 1 hence making sure when everything is added, it sums up to 1.

Next we call our ** optim** function with a vector of zeros as initial values. I used L-BFGS-B as the optimization method and provided an upper/lower limit for ratios once again. You can try a different method like Nelder-Mead and skip the upper/lower limit and still get very similar results. Note that by default optim tries to minimize the result of the provided function. You can pass an argument (

`control = list(fnscale = -1)`

) to have it maximize the goal or simply convert the results of your function. I did the latter. Hence my penalty for stepping out of bounds is a large positive number but my Sharpe Ratio is multiplied by –1.Below is a screenshot showing an optimized portfolio for these 4 particular stocks based on 20 years of historical data.

You can download the workbook HERE. Enjoy!

Hi Bora,

Great blog! I’ve really enjoyed your posts about R and Tableau as I’ve attempted to use the integration myself. I’ve run into a problem that you may have tackled already. I’m running a series of A/B tests, comparing every test group to the control group with a t-test. I’ve written a script to return a p-value for each test group. This works well in R, and I got it to work with the Tableau integration. The problem is that Tableau will only return the p-values; they’re not linked to the test group, despite including test group as a parameter in the script. This means that we cannot be entirely sure that we’re attaching the correct p-value to the correct test-group. It seems to return alphabetically and doesn’t respond when I change the order in which the test groups are displayed. Unfortunately, this makes the tool unusable since I cannot map the R output to the Tableau displayed data.

The data is made up of one row per user per day. Any insight you have would be greatly appreciated. Thank you in advance!

Here is the script:

SCRIPT_REAL(”

test_data<-data.frame(

Date<-.arg1,

TestGroup<-.arg2,

Client<-.arg3,

Client_version<-.arg4,

New_User<-.arg5,

Service<-.arg6,

Class<-.arg7,

First_Date<-.arg8,

Users<-.arg9,

ConvertedUsers<-.arg10)

df<-aggregate(test_data$Users,by=list(test_data$TestGroup),sum)

colnames(df)=c('TestGroup','Users')

df2<-aggregate(test_data$ConvertedUsers,by=list(test_data$TestGroup),sum)

colnames(df2)=c('TestGroup','ConvertedUsers')

dataa<-merge(df,df2,by='TestGroup')

output<-cbind(unlist(lapply(2:length(dataa$TestGroup),function(i) {

x=subset(dataa,TestGroup==TestGroup[1])

y=subset(dataa,TestGroup==TestGroup[i])

x2=as.matrix(c(x$ConvertedUsers,y$ConvertedUsers))

y2=as.matrix(c(x$Users,y$Users))

prop.test(cbind(x2,y2),alternative='two.sided',conf.level=0.95)$p.value

})))

p<-data.frame(rbind(1,output))

colnames(p)='p.value'

pvalue<-cbind(dataa,p)

pvalue$p.value",

ATTR([date_utc]),

ATTR([Testgroup]),

ATTR([client]),

ATTR([client_version]),

ATTR([new_user_flag]),

ATTR([service]),

ATTR([split_test_classifier]),

ATTR([user_client_first_date_flag]),

SUM(users),

SUM(d1r)

)

Tableau returns the last item in your script. In this case pvalue$p.value. If you want to return test groups in the same response, you can concatenate pvalues and test groups into one vector. You can find an example of this in my post about forecasting where I returned both fit and residuals in a single call to R. https://boraberan.wordpress.com/2014/01/19/using-r-forecasting-packages-from-tableau/

Hi,

can u say what does $ p.value mean exactly ??

That’s how you get the p-value for the test http://en.wikipedia.org/wiki/P-value

Test returns multiple results. $ is how you pick the one you want. Dollar sign is used to pick named elements from a list or data.frame in R.

Hi Bora

I am working on a Bayes predictor using R but want to implement it in Tableau e.g. load the model in Tableau and predict different outcomes using this model. my R script is as follows:

library(e1071)

data(iris)

classifier <-naiveBayes (Species ~ Petal.Length + Petal.Width + Sepal.Length + Sepal.Width, data = iris)

save(rpartModel, file = "C:/Users/Bayesan modeling/mymodel.rdata")

and in Tableau I am using

SCRIPT_STR('

library(e1071)

data(iris)

load(file = "C:/Users/Bayesan modeling/mymodel.rdata")

newiris <- data.frame(Sepal.Length = .arg1,

Sepal.Width = .arg2,

Petal.Length = .arg3,

Petal.Width = .arg4)

as.character(predict(classifier, newiris[,-5]))',ATTR([Sepal.Length]),ATTR([Sepal.Width]),ATTR([Petal.Length]),ATTR([Petal.Width])

The idea is to get in the end the posterior probability of the predicted dataset as well as the predicted Species i.e. the class in the iris dataset. In R the code is running but in Tableau I am getting an error:

Error in log(sapply(seq_along(attribs), function(v) { : non-numeric argument to mathematical function.

How can I implement this in Tableau? Thanks in advance.

My first guess would be that ATTR([Petal.Length]) etc. are being passed as * to R. Could you put them in a table and make sure, you see actual values in the same setup you use for your R script instead of *?

Thanks Boran indeed * was being passed and have solved the problem!

Hi Bora,

Thanks for all your posting related to R and Tableau. This is really helpful.

I am also trying to apply Naive Bayes predictor for one of my application using Tableau, but i am getting prediction more than 1 which is not correct. When i am applying the same logic using R i am getting the correct result with prediction as 1 or 0.

Below is my code snippet that i ma using in both R and Tableau. Here i am tring to predict for Prod5

R

require(XLConnect)

library(e1071)

library(klaR)

library(caret)

setwd(“C:/R Implementation/”)

wb <- loadWorkbook("Test_Mine.xlsx")

train <- readWorksheet(wb, sheet = "Sheet1", header = TRUE,startRow=1)

train <- subset(train,select = c(Prod1,Prod2,Prod3,Prod4,Prod5))

train[1:5,]

head(train)

summary(train)

model <- naiveBayes(factor(Prod5) ~ .,data = train)

wbtest <- readWorksheet(wb, sheet = "Sheet2", header = TRUE,startRow=1)

test <- subset(wbtest,select = c(Prod1,Prod2,Prod3,TruColor))

predicted<-predict(model,test)

test$prediction <-predict(model,test)

test

model$apriori/sum(model$apriori)

result<-cbind(predicted,test)

write.csv(test,"results.csv")

table(predict(model, train[,-5]), train[,5],dnn=list('predicted','actual'))

Tableau

SCRIPT_INT('

library(e1071);

df <- data.frame(Prod1=.arg1, Prod2=.arg2, Prod3=.arg3, Prod4=.arg4, Prod5s=.arg5);

model <- naiveBayes(as.factor(Prod5) ~.,data = df);

predicted <- predict(model,newdata = df, type = "class")',

sum([Prod1]),sum([Prod2]),sum([Prod3]), sum([Prod4]),sum([Prod5])

)

Can you please help where i am doing wrong and also if am using type = "raw" then it returning exactly double number of records from the source.

Thanks in advance

Killing it with ATVI being a big chunk of your portfolio!

Hey Bora,

This is the script. It works in R, but returns an error in Tableau.

SCRIPT_REAL(prop.test(‘x=c(.arg1,.arg2),n=c(14049,14069))$p.value’,2151,2148)

In R it returns a p-value:

> prop.test(x=c(2168,2141),n=c(14049,14069))$p.value

[1] 0.6303922

In Tableau it returns an error:

‘p’ must have the same length as ‘x’ and ‘n’

Note the function has an optional param p, set to NULL by default. It looks to me like tableau is trying to pass it somehow (even though it’s not defined), and converts it to NA.

Indeed in R, it works when explicitly specifying NULL, but will fial with the same error when giving NA

> prop.test(x=c(2168,2141),n=c(14049,14069),p=NULL)$p.value

[1] 0.6303922

> prop.test(x=c(2168,2141),n=c(14049,14069),p=NA)$p.value

Error in prop.test(x = c(2168, 2141), n = c(14049, 14069), p = NA) :

‘p’ must have the same length as ‘x’ and ‘n’

Can you help?

Thank you,

Elad

Tableau sends everything as a vector. In this case it looks like you’re trying to pass it as a scalar. Can you try replacing .arg1 with .arg1[1] ? Same thing with .arg2

Hi Bora,

I get the following error:

Error in optim(rep(0, .arg3[1] – 1), SharpeRatio, NULL, method = “L-BFGS-B”, : L-BFGS-B needs finite values of ‘fn’

Any ideas?

Kind regards, Martin

My guess would be that you might have NULLs in your data. You can pass your inputs to is.finite function in R which will give you a boolean vector showing FALSE for values that are not finite (including nulls).