Tableau Conference 2014

It is time for Tableau Conference again! This year over 5000 customers will be joining us in Seattle at Washington State Convention Center. I will be attending the whole event and would be very interested in hearing from you about Tableau use cases and pain points. 

I will be co-hosting the Advanced Analytics track of Industry Analyst Q&A session Tuesday afternoon and giving a talk in Conference Room  310  between 12:30 – 1:30 PM the same day. If you’re interested in working with time series data in Tableau, we have a focus group on Wednesday between 12:15-1:15 PM you may want to attend. Also on Thursday between 7:30-8:30 AM, I will be co-hosting a focus group on data preparation.

Tableau Conference talk

We have a an exciting lineup, including talks from Neil deGrasse Tyson and Hans Rosling. It will be a great conference. Looking forward to seeing you there!

Creating a Covariance Matrix in Tableau Using Table Calculations

Covariance matrices (aka Variance-Covariance matrix) are commonly used in stock portfolio, risk assessment and multivariate analysis in general. Covariance is a measure of how much two variables change together. If we take the example of stocks, if when one stock has good returns, another one also has strong returns and when it loses value, the other also does, their covariance would be positive. If their behavior is the opposite e.g. when one gains, the other one tends to lose value, covariance would be negative. While the sign of covariance shows the relationship, since it is not normalized like correlation coefficient, the magnitude of the covariance is not easily interpretable.

I had done a correlation matrix example earlier that compared cars based on their properties (mpg, hp etc.)  and received a lot of questions about how it can be applied to time series data. So I thought it would good to put together an example that calculates covariance with the same technique and using stock market data and kill two birds with one stone.

In this example I have 14 years of data for 6 stocks adjusted for splits and dividends.  You can click the image below to view the visualization on Tableau Public and download it if you like so you can take a closer look at how it is built.

Covariance Matrix in Tableau

Each cell displays the covariance of stocks that corresponds to its column and row name e.g. For Apple Inc. and JC Penney covariance is –90.4. Enjoy!

Dynamic Network Graph Layouts in Tableau using R

While history of graph theory goes back to Leonhard Euler’s paper on “Seven Bridges of Königsberg” published in 1736, until the rise social networks, analyzing graphs and use of graphs in visualization has been a niche area mostly used in academic circles.

Well..Now that you have access to all this data, how do you visualize it in Tableau?

In Tableau it is relatively easy to generate a network graph using dual axis charts with line and circle/shape mark types once you know the layout. But when you have a graph probably two important questions are :

  • How to do the layout
  • How to calculate relevant metrics from the graph (e.g. betweenness, closeness centrality, authority scores…)

Especially if your graph is changing over time which is very likely, you would like to do these on the fly. Using the R integration feature, this is a fairly easy task. Below is an animation showing the layout changes as new vertices get added to the graph done using the workbook you can find in this blog post.

Layout changes as new vertices get added

In this example we will be using igraph package which offers a variety of layout options to choose from as well as a handful of functions to calculate some interesting metrics.

To use the mark type Line in Tableau, you need to provide a list of points that defines a path. In this case since we are connecting vertices using straight lines, we only need two points (from/to) shown with order 1 and 2. Since our data is in the structure User : Person A, Retweeted by : Person B, the names will alternate e.g. first point of the line is associated with Person A while the other end Person B. You can see this structure being built in Custom SQL.

Once the structure is in place the calculated field Graph Nodes contains the R code to retrieve X – Y coordinates to display in the chart as well as the metric shown in the tooltip (in this case I used betweenness centrality). Everything is retrieved in a single call from R then decomposed into three parts locally.

However R is expecting the data in a different format than it is in Tableau. Tableau has 2 rows for each edge (from/to). R library on the other hand is expecting the edge list as two columns so we filter out the extra rows as we’re passing them to R first in the section of the script shown below.

mydf <-mydf[(mydf$Order=='1')

Also from R we are getting a list of X – Y coordinates for each vertex (which is the union of the two vectors we are passing to R) while each vertex is linked to multiple other vertices so appear as many more rows in Tableau. So first we merge the coordinates with the unique list of vertex names :

cbind(coords, data.frame(users=V(mygraph)$name));

and then at the end do a join using these names in R to replicate the rows so Tableau gets back two rows to define each edge. The join happens in the following part of the script

c<-join(allusers, c, by = 'users');

I tied the layout algorithm choice to a Tableau parameter to make it easier to try different options. Even though the R script passes weights (number of times Tweeted), not all layout algorithms take this into account. If you want force directed layout and see how these weights/forces affect the results, Fruchterman-Reingold would be the option to use. 

You can download the example workbook from HERE. Enjoy!

What’s new in Tableau 8.2?

As of June 19th we have a new member in Tableau family! Tableau 8.2 brings many new features and is the first Tableau version that also runs natively on Mac.

The new in-house map service means better performance/responsiveness and quality so you can enjoy the new and beautiful map tiles designed in collaboration with Stamen on that retina display ;).

Story Points feature allows you to organize your visualizations into a compelling narrative and unleash the data journalist within.

The new REST API  (not to be confused with the JavaScript API or TDE API) allows you to manage Tableau server through an HTTP interface e.g. retrieve a list of workbooks or data sources, query data sources, retrieve workbook thumbnails and can come in handy if your organization wants to provide its own customized interface over Tableau server. The API also offers common server management tasks which previously were only available through command line.

There are also a number of Tableau server features that I will not get into here such as importing/exporting sites and easy log file access.

Now to my favorite features -I must admit, I may have a slight bias since I worked on these ;)  – and the reason I haven’t blogged in almost 3 months.

The New Connection Experience

Of course one of the first things you will notice when you launch Tableau 8.2 is the new Connection experience. The visual join diagram, data preview window, easy access to data source and extract filters so you can quickly build your data source, preview your data without having to deal with several layers of pop up windows. If the data source you’re connected to is fast enough, you will see the effects of your changes in real time in the preview grid.

You may ask “but where did Initial SQL go?” or “How can I convert my join diagram into Custom SQL?” You can find them conveniently located in the Data menu in the toolbar as well as the referential integrity (join culling) setting.  In 8.2 you can even convert tables in the join diagram into Custom SQL or write SQL yourself then drag more tables into the diagram and join with your custom SQL script!!

What if you don’t like drag-drop or you don’t even need to join anything? Maybe you just have one table and want to use your keyboard to quickly create your connection and move on? If you prefer keyboard, all you need to do is type into the search box, locate your result and hit enter. Tableau will help you navigate from selecting database to selecting a table and finally adding it to your data source. This is especially handy if you have really long table names that has a standard prefix since default is a “contains” search.

You will notice everything in the join diagram appear as pills. You can interact with the, pills to alias the table or access its properties which vary by the data source and type of pill. In the screenshot below we are looking at a text file so clicking the gear icon brought up the menu that allows us to adjust the delimiter, character set etc. For Excel workbooks the list would be much shorter and  for SAP HANA it looks completely different, instead you will find variables/input parameters here. Note that these settings can make a big difference e.g. your locale defines thousand and decimal separators.

Custom Properties dialog

Data grid gives you a preview of the results and also allows you to rename or hide columns using the context menu as well as setting the data type and geographic role for them. You can also do multi-select –> hide to quickly get rid of columns that you don’t need for your analysis.

You will notice that the data grid sorts the columns by table and within each table using the data source order unlike the rest of Tableau data windows so you can see your tables the way they were intended by the person who created them.

Multi-select and hide      Changing data type and geographic role from data grid

One of the hidden features in the new connection experience is automatic data modeling. You will notice that Tableau automatically makes the column names more human readable (e.g. NET_INCOME will become Net Income) and recognize numeric ID columns as discrete dimensions instead of measures.

New Excel and Text Connectors

Up until 8.2 Tableau relied on Microsoft Jet to connect to Excel and Text which has been enjoyed by some Tableau customers but not by many others. If you were at the Tableau Customer Conference last year you probably remember the positive reaction from the crowd when new connectors were announced. Of course one undeniable disadvantage of Jet is the fact that it is PC only. In 8.2 we developed new connectors for text and Excel so we can support these common connection types on Mac as well while addressing the limitations that frustrated many.

You can see the exhaustive list of differences HERE but to name a few major changes, with the new connectors there is no limit on table width (Jet only allowed 255 columns and silently dropped the rest) or the size of the column name (Jet allowed only 64 characters)and type detection is much more accurate  (Jet detects data type based on the first 8 rows of an Excel sheet, new connectors use 10000 rows). The down-side of new connectors is that using them with custom SQL is currently not supported. If you really need to use custom SQL, you can still use Jet on your Windows PC. You will see an “Open with Legacy Connection” option in File Open dialog.

Choosing whether to open Excel/Text files with Jet or new connectors

Microsoft not supporting Jet on the Mac also means Access is not an available connection option in Tableau 8.2 on the Mac.

Note that since Jet is not available on Mac, if you create a legacy connection on Windows and share your workbook with a colleague using Mac, Tableau will automatically attempt to upgrade the connection on the Mac so your colleague can make use of your workbook. However if your legacy Excel/text connection relies on Custom SQL, upgrade will fail.

Reconnecting to Data Sources

You probably ran into the case where you received a workbook from someone that had multiple remote database connections in it and upon opening the workbook, you were bombarded with one username/password dialog after another. With 8.2 this is no longer the case. You will see a widget embedded into the sheets that rely on the sources that have trouble connecting, using which if you want to use that specific worksheet, you can provide the necessary info to connect. If you don’t need that sheet, no pressure.

Non-blocking broken connection experience

SAP HANA Variables and Input Parameters

For SAP HANA, variables and input parameters serve as filters defined in the data source by the admin to limit the access to the data. For example the database admin may want users to first pick a Cost Center before they query for data. In Tableau 8.2 you will be able to take advantage of such sources by providing answers to questions imposed by the admin through the connection experience.

HANA variables and input parameters

Google Analytics and Big Query Improvements

Tableau 8.2 adds support for 62 new Google Analytics dimensions and metrics in addition to custom metrics and custom dimensions and the ability to retrieve data from Mobile properties.

We also introduced a native Big Query driver that is much faster, works on Mac and doesn’t require a separate installer anymore. Since now Tableau directly connects to Big Query service instead of going through a 3rd party ODBC driver, we are able to take advantage of more capabilities offered by Big Query. It also gives us the ability to adapt more quickly to changes and improvements in the API.

I hope you enjoy Tableau 8.2 (Stingray), I am looking forward to your feedback here or on Tableau forums!

Creating Coxcomb Charts in Tableau

Nightingale’s rose is probably one of the most influential visualizations of all time as the tool Florence Nightingale used to convince Queen Victoria about improving hygiene at military hospitals hence saving lives of many thousands of soldiers.

Commonly referred to as coxcomb or polar area charts, roses resemble both pie and bar charts in some respects. The layout resembles a pie chart but each slice has the same angle in coxcombs unlike pies. While radius of the slice is related to quantity being displayed (like height of the bar in bar charts), coxcomb uses the areas of slices.

Coxcombs help make seasonal patterns visible and deemphasize small differences while providing a nicer image, all of which worked well for Nightingale’s use case given her data and the target audience for the charts.  This may or may not apply to every situation so often a bar chart is probably the safer route to go.

I was curious whether anyone had built a coxcomb chart with Tableau before but I didn’t come across any so I decided to make one myself. Following chart shows monthly sales for different departments. You can click on the image to bring up the interactive Tableau Public viz and also download to take a closer look on how it is set up.

Nightingale's rose in Tableau

Since each of the slices represent sales for that particular month and department, you will notice that the entire slice gets highlighted when you make a selection instead of just the band visible in the chart as you’d expect.

Nightingale's rose in Tableau

Each slice is a polygon comprised of 102 points (seemed to give a good, smooth curve) which are not present in the data but generated by taking advantage of densification in Tableau. Enjoy!

Recency, Frequency, Monetary analysis in Tableau

RFM is commonly used in marketing, retail and professional services industries to assess customer value. The general idea behind the analysis can be summarized as

Recency : People who have purchased recently from you are much more likely to respond to a new offer than someone who you haven’t sold to in a long time.

Frequency : People who shop frequently at your store are more likely to respond to new offers than less frequent buyers.

Monetary : People who spend more money at your store are more likely to show interest in new offers.

in descending order of importance. There are a few different ways to calculate this metric but I will use the method outlined HERE with the sample Superstore database that comes with Tableau.

Quintiles needed for the analysis can be calculated using Tableau’s percentile rank function. For recency the formula would look like the following:

Percentile rank for recency

For frequency we can use Number of Records since each purchase is a record in the Superstore dataset or count OrderIDs.

Percentile rank for frequency

And monetary is the simplest of all

Percentile rank for monetary

Now let’s convert them to quintiles. I will just show the calculation for recency here but they’re the same for all of the above.

Converting to quintiles

And finally combine the results into a single score to get the RFM metric :

Adding up to RFM

Attack of the mesh plots

Although it was the first thing it made me think of, this is not MathWorks’ new marketing campaign. No, our planet is not under attack from giant jellyfish from outer space either.

The “3D mesh plots” hovering in the sky over Sydney, Amsterdam , Tampa, Denver, Vancouver… are the artwork of Janet Echelman made from fishing nets.

3D mesh plots in the night sky