Using Custom Shapes and Polygons to draw anything from Directed Graphs to Infographics in Tableau

Custom shapes are great for simplifying the association between marks and what they represent so the viewers can enjoy the visualization without having to constantly check the legend to remind themselves what they’re looking at.  But you can see them being used for various purposes in the wild such as navigation/filter controls in dashboards. Custom polygons provide another great way to draw custom shapes in Tableau although they are almost exclusively used in maps by most.

In this blog post, I will be sharing some rather unusual example use cases for both of these mark types about which I got questions about from many Tableau users via the blog or in person conversations at this year’s Tableau conference.

Before we go on any further, please keep in mind that large and/or asymmetrical shapes can negatively affect the accurate readability of your visualizations since their center (which is where the value on the axis corresponds) may be difficult to pinpoint AND most of the data below (while they may look prettier this way) can be more accurately depicted with basic chart types like the good old bar chart. For most of these examples I relied on the Sample – Superstore dataset even though the titles of the visualizations may say otherwise.  All of these visualizations are published on Tableau public for you to download and explore. You can also find information as comments inside the calculated fields that I created.

Let’s start with using marks to create the customized bar chart you see below.

Bar chart filled with custom marks

In simple bar chart form this would look like the following:

Basic bar chart

Now, how do you make this imaginary Sales data look like a chart that shows # births annually? By making the bars look like little kids holding hands, of course. : )

The core piece is the calculation that generates the evenly spaced marks. In this case I divided SUM(Sales) for each region in Superstore dataset into chunks of 5000 until we span the total amount.

Creating evenly spaced marks

This gets you most of the way and you can actually stop at this point if you don’t care about any more precision than increments of 5000. But what happens if you want to be a bit more precise? By using another calculation to show different marks for fractions of 5000…

Working with fractional icons

In this case, my dataset contained enough rows to accommodate all the marks I needed for my visualization. But what if it didn’t? The next example answers this question, as well as showing a different chart type.

Stacked icons are commonly used when author of the visualization isn’t too worried about exactness conveyed by the chart. For example our next visualization shows tanks in Bacteria and Osterlich armies. I can clearly see Bacteria has the superior army. The difference is large enough that even a coarse visualization like this gets the point across. If I cared to count the tanks in the viz and looked up the population, I could have even found out that it corresponds to 0.46 tanks per capita! : ) 

Napaloni, de grosse peanut, de cheesy ravioli!

In this case since my dataset had fewer rows than the marks shown on the viz, I had to generate rows. To achieve this I used one of the densification tricks I showed earlier in my Coxcomb chart blog post.  You can see the custom SQL query below which adds a new row for each existing row with the value for Sales set to 0.

Custom SQL prep for binning

After doing this, I can right click on my Sales column and select “Create Bins”. I need to set the bin size to the quantity my each icon will represent. In the sample workbook, I created bins of 10,000 (icon says 10^6 tanks but underlying data is still aggregated Superstore). Now if I look at my data, this is what I see (0 rows are highlighted).

Underlying data

There are really two values in my dataset for each Market. But if I drag Sales (bin) into rows or columns shelf, now I can select “Show Missing Values” from pill context menu to tell Tableau also to show me the buckets that contain no data. Bins are normally used to generate histograms and this setting is intended to show/hide empty bins in the histogram. To show me the empty bins, Tableau creates new rows for me. This way my 2 rows for Bacteria become 27 rows which means now I can show 27 marks.

Same method can be used with multiple dimensions as well. The following example demonstrates this using Region and Product Type dimensions.

Stacked icons with multiple dimensions

In this case, the custom SQL query looks slightly different but rest of the process is the same.

Custom SQL prep for binning (two dimensions)

You can do a lot more with custom marks if you are adventurous and prefer prioritizing visually interesting charts at the expense of readability and accuracy. For example you can use a custom mark like a stencil as shown below.

Using stencil/cookie cutter images

In this visualization, marks are overlaid on top of a stacked bar chart. Custom mark (the beer glass) has a hollow center (transparent PNG) and is on the secondary axis where the bottom of the glass is fixed at 0 to achieve the effect.  White, yellow color pattern is used in stacked bar chart to resemble beer and foam. This chart is visually interesting but especially considering lack of axis and the fact that it is unclear whether the quantity is associated with the height or area, it is not easy to interpret.

You can take it even further and use images themselves to mimic bar charts as shown in the following example.

Keyser Söze

I hope that’s enough for the infographics. How about something more exciting, such as directed graphs? You could create a graph in Tableau using a dual axis chart combining lines and circles. But how do you indicate direction?

Directed graph in Tableau using custom marks

The first tab in the Tableau public visualization linked via the image above shows how to achieve this using custom marks. This is done by creating 360 custom marks to cover every 1 degree rotation of arrow and first vertex. A calculated field determines the angle for each edge which is used to decide which rotated arrow to display. Note that since the graph can be dynamic, the domain of the field on marks card (angles) may also change. This may lead to changes in what mark is used for what angle. To avoid this, if you look at the dataset associated with this example you will notice that the complete domain (of all possible angles) is added to provide padding but then “pseudo-filtered out” using Pages shelf so these rows don’t have any effect on the visualization itself.

An even more interesting way to do this to use the polygon mark type and custom render the arrows. This example uses the binning technique we discussed earlier to generate rows to accommodate each arrow (each arrow is made up of 8 points).

Directed graph using custom polygons

XFirstStep and YFirstStep fields draw the arrow and scale it based on the distance between points it will connect. Then rotatedX and rotatedY fields take the arrow and rotate it based on the angle between the two points. Since the arrows are rendered as polygons, they could be modified on the fly. I took it one step further and even parameterized the arrow so you can design your custom arrow! Just use the “Design your arrow” tab : )  Below is an example of different arrows you can create…

Custom arrows

Enjoy!

Tableau Conference 2014

It is time for Tableau Conference again! This year over 5000 customers will be joining us in Seattle at Washington State Convention Center. I will be attending the whole event and would be very interested in hearing from you about Tableau use cases and pain points. 

I will be co-hosting the Advanced Analytics track of Industry Analyst Q&A session Tuesday afternoon and giving a talk in Conference Room  310  between 12:30 – 1:30 PM the same day. If you’re interested in working with time series data in Tableau, we have a focus group on Wednesday between 12:15-1:15 PM you may want to attend. Also on Thursday between 7:30-8:30 AM, I will be co-hosting a focus group on data preparation.

Tableau Conference talk

We have a an exciting lineup, including talks from Neil deGrasse Tyson and Hans Rosling. It will be a great conference. Looking forward to seeing you there!

Creating a Covariance Matrix in Tableau Using Table Calculations

Covariance matrices (aka Variance-Covariance matrix) are commonly used in stock portfolio, risk assessment and multivariate analysis in general. Covariance is a measure of how much two variables change together. If we take the example of stocks, if when one stock has good returns, another one also has strong returns and when it loses value, the other also does, their covariance would be positive. If their behavior is the opposite e.g. when one gains, the other one tends to lose value, covariance would be negative. While the sign of covariance shows the relationship, since it is not normalized like correlation coefficient, the magnitude of the covariance is not easily interpretable.

I had done a correlation matrix example earlier that compared cars based on their properties (mpg, hp etc.)  and received a lot of questions about how it can be applied to time series data. So I thought it would good to put together an example that calculates covariance with the same technique and using stock market data and kill two birds with one stone.

In this example I have 14 years of data for 6 stocks adjusted for splits and dividends.  You can click the image below to view the visualization on Tableau Public and download it if you like so you can take a closer look at how it is built.

Covariance Matrix in Tableau

Each cell displays the covariance of stocks that corresponds to its column and row name e.g. For Apple Inc. and JC Penney covariance is –90.4. Enjoy!

Dynamic Network Graph Layouts in Tableau using R

While history of graph theory goes back to Leonhard Euler’s paper on “Seven Bridges of Königsberg” published in 1736, until the rise social networks, analyzing graphs and use of graphs in visualization has been a niche area mostly used in academic circles.

Well..Now that you have access to all this data, how do you visualize it in Tableau?

In Tableau it is relatively easy to generate a network graph using dual axis charts with line and circle/shape mark types once you know the layout. But when you have a graph probably two important questions are :

  • How to do the layout
  • How to calculate relevant metrics from the graph (e.g. betweenness, closeness centrality, authority scores…)

Especially if your graph is changing over time which is very likely, you would like to do these on the fly. Using the R integration feature, this is a fairly easy task. Below is an animation showing the layout changes as new vertices get added to the graph done using the workbook you can find in this blog post.

Layout changes as new vertices get added

In this example we will be using igraph package which offers a variety of layout options to choose from as well as a handful of functions to calculate some interesting metrics.

To use the mark type Line in Tableau, you need to provide a list of points that defines a path. In this case since we are connecting vertices using straight lines, we only need two points (from/to) shown with order 1 and 2. Since our data is in the structure User : Person A, Retweeted by : Person B, the names will alternate e.g. first point of the line is associated with Person A while the other end Person B. You can see this structure being built in Custom SQL.

Once the structure is in place the calculated field Graph Nodes contains the R code to retrieve X – Y coordinates to display in the chart as well as the metric shown in the tooltip (in this case I used betweenness centrality). Everything is retrieved in a single call from R then decomposed into three parts locally.

However R is expecting the data in a different format than it is in Tableau. Tableau has 2 rows for each edge (from/to). R library on the other hand is expecting the edge list as two columns so we filter out the extra rows as we’re passing them to R first in the section of the script shown below.

mydf <-mydf[(mydf$Order=='1')

Also from R we are getting a list of X – Y coordinates for each vertex (which is the union of the two vectors we are passing to R) while each vertex is linked to multiple other vertices so appear as many more rows in Tableau. So first we merge the coordinates with the unique list of vertex names :

cbind(coords, data.frame(users=V(mygraph)$name));

and then at the end do a join using these names in R to replicate the rows so Tableau gets back two rows to define each edge. The join happens in the following part of the script

c<-join(allusers, c, by = 'users');

I tied the layout algorithm choice to a Tableau parameter to make it easier to try different options. Even though the R script passes weights (number of times Tweeted), not all layout algorithms take this into account. If you want force directed layout and see how these weights/forces affect the results, Fruchterman-Reingold would be the option to use. 

You can download the example workbook from HERE. Enjoy!

What’s new in Tableau 8.2?

As of June 19th we have a new member in Tableau family! Tableau 8.2 brings many new features and is the first Tableau version that also runs natively on Mac.

The new in-house map service means better performance/responsiveness and quality so you can enjoy the new and beautiful map tiles designed in collaboration with Stamen on that retina display ;).

Story Points feature allows you to organize your visualizations into a compelling narrative and unleash the data journalist within.

The new REST API  (not to be confused with the JavaScript API or TDE API) allows you to manage Tableau server through an HTTP interface e.g. retrieve a list of workbooks or data sources, query data sources, retrieve workbook thumbnails and can come in handy if your organization wants to provide its own customized interface over Tableau server. The API also offers common server management tasks which previously were only available through command line.

There are also a number of Tableau server features that I will not get into here such as importing/exporting sites and easy log file access.

Now to my favorite features -I must admit, I may have a slight bias since I worked on these ;)  – and the reason I haven’t blogged in almost 3 months.

The New Connection Experience

Of course one of the first things you will notice when you launch Tableau 8.2 is the new Connection experience. The visual join diagram, data preview window, easy access to data source and extract filters so you can quickly build your data source, preview your data without having to deal with several layers of pop up windows. If the data source you’re connected to is fast enough, you will see the effects of your changes in real time in the preview grid.

You may ask “but where did Initial SQL go?” or “How can I convert my join diagram into Custom SQL?” You can find them conveniently located in the Data menu in the toolbar as well as the referential integrity (join culling) setting.  In 8.2 you can even convert tables in the join diagram into Custom SQL or write SQL yourself then drag more tables into the diagram and join with your custom SQL script!!

What if you don’t like drag-drop or you don’t even need to join anything? Maybe you just have one table and want to use your keyboard to quickly create your connection and move on? If you prefer keyboard, all you need to do is type into the search box, locate your result and hit enter. Tableau will help you navigate from selecting database to selecting a table and finally adding it to your data source. This is especially handy if you have really long table names that has a standard prefix since default is a “contains” search.

You will notice everything in the join diagram appear as pills. You can interact with the, pills to alias the table or access its properties which vary by the data source and type of pill. In the screenshot below we are looking at a text file so clicking the gear icon brought up the menu that allows us to adjust the delimiter, character set etc. For Excel workbooks the list would be much shorter and  for SAP HANA it looks completely different, instead you will find variables/input parameters here. Note that these settings can make a big difference e.g. your locale defines thousand and decimal separators.

Custom Properties dialog

Data grid gives you a preview of the results and also allows you to rename or hide columns using the context menu as well as setting the data type and geographic role for them. You can also do multi-select –> hide to quickly get rid of columns that you don’t need for your analysis.

You will notice that the data grid sorts the columns by table and within each table using the data source order unlike the rest of Tableau data windows so you can see your tables the way they were intended by the person who created them.

Multi-select and hide      Changing data type and geographic role from data grid

One of the hidden features in the new connection experience is automatic data modeling. You will notice that Tableau automatically makes the column names more human readable (e.g. NET_INCOME will become Net Income) and recognize numeric ID columns as discrete dimensions instead of measures.

New Excel and Text Connectors

Up until 8.2 Tableau relied on Microsoft Jet to connect to Excel and Text which has been enjoyed by some Tableau customers but not by many others. If you were at the Tableau Customer Conference last year you probably remember the positive reaction from the crowd when new connectors were announced. Of course one undeniable disadvantage of Jet is the fact that it is PC only. In 8.2 we developed new connectors for text and Excel so we can support these common connection types on Mac as well while addressing the limitations that frustrated many.

You can see the exhaustive list of differences HERE but to name a few major changes, with the new connectors there is no limit on table width (Jet only allowed 255 columns and silently dropped the rest) or the size of the column name (Jet allowed only 64 characters)and type detection is much more accurate  (Jet detects data type based on the first 8 rows of an Excel sheet, new connectors use 10000 rows). The down-side of new connectors is that using them with custom SQL is currently not supported. If you really need to use custom SQL, you can still use Jet on your Windows PC. You will see an “Open with Legacy Connection” option in File Open dialog.

Choosing whether to open Excel/Text files with Jet or new connectors

Microsoft not supporting Jet on the Mac also means Access is not an available connection option in Tableau 8.2 on the Mac.

Note that since Jet is not available on Mac, if you create a legacy connection on Windows and share your workbook with a colleague using Mac, Tableau will automatically attempt to upgrade the connection on the Mac so your colleague can make use of your workbook. However if your legacy Excel/text connection relies on Custom SQL, upgrade will fail.

Reconnecting to Data Sources

You probably ran into the case where you received a workbook from someone that had multiple remote database connections in it and upon opening the workbook, you were bombarded with one username/password dialog after another. With 8.2 this is no longer the case. You will see a widget embedded into the sheets that rely on the sources that have trouble connecting, using which if you want to use that specific worksheet, you can provide the necessary info to connect. If you don’t need that sheet, no pressure.

Non-blocking broken connection experience

SAP HANA Variables and Input Parameters

For SAP HANA, variables and input parameters serve as filters defined in the data source by the admin to limit the access to the data. For example the database admin may want users to first pick a Cost Center before they query for data. In Tableau 8.2 you will be able to take advantage of such sources by providing answers to questions imposed by the admin through the connection experience.

HANA variables and input parameters

Google Analytics and Big Query Improvements

Tableau 8.2 adds support for 62 new Google Analytics dimensions and metrics in addition to custom metrics and custom dimensions and the ability to retrieve data from Mobile properties.

We also introduced a native Big Query driver that is much faster, works on Mac and doesn’t require a separate installer anymore. Since now Tableau directly connects to Big Query service instead of going through a 3rd party ODBC driver, we are able to take advantage of more capabilities offered by Big Query. It also gives us the ability to adapt more quickly to changes and improvements in the API.

I hope you enjoy Tableau 8.2 (Stingray), I am looking forward to your feedback here or on Tableau forums!

Creating Coxcomb Charts in Tableau

Nightingale’s rose is probably one of the most influential visualizations of all time as the tool Florence Nightingale used to convince Queen Victoria about improving hygiene at military hospitals hence saving lives of many thousands of soldiers.

Commonly referred to as coxcomb or polar area charts, roses resemble both pie and bar charts in some respects. The layout resembles a pie chart but each slice has the same angle in coxcombs unlike pies. While radius of the slice is related to quantity being displayed (like height of the bar in bar charts), coxcomb uses the areas of slices.

Coxcombs help make seasonal patterns visible and deemphasize small differences while providing a nicer image, all of which worked well for Nightingale’s use case given her data and the target audience for the charts.  This may or may not apply to every situation so often a bar chart is probably the safer route to go.

I was curious whether anyone had built a coxcomb chart with Tableau before but I didn’t come across any so I decided to make one myself. Following chart shows monthly sales for different departments. You can click on the image to bring up the interactive Tableau Public viz and also download to take a closer look on how it is set up.

Nightingale's rose in Tableau

Since each of the slices represent sales for that particular month and department, you will notice that the entire slice gets highlighted when you make a selection instead of just the band visible in the chart as you’d expect.

Nightingale's rose in Tableau

Each slice is a polygon comprised of 102 points (seemed to give a good, smooth curve) which are not present in the data but generated by taking advantage of densification in Tableau. Enjoy!

Recency, Frequency, Monetary analysis in Tableau

RFM is commonly used in marketing, retail and professional services industries to assess customer value. The general idea behind the analysis can be summarized as

Recency : People who have purchased recently from you are much more likely to respond to a new offer than someone who you haven’t sold to in a long time.

Frequency : People who shop frequently at your store are more likely to respond to new offers than less frequent buyers.

Monetary : People who spend more money at your store are more likely to show interest in new offers.

in descending order of importance. There are a few different ways to calculate this metric but I will use the method outlined HERE with the sample Superstore database that comes with Tableau.

Quintiles needed for the analysis can be calculated using Tableau’s percentile rank function. For recency the formula would look like the following:

Percentile rank for recency

For frequency we can use Number of Records since each purchase is a record in the Superstore dataset or count OrderIDs.

Percentile rank for frequency

And monetary is the simplest of all

Percentile rank for monetary

Now let’s convert them to quintiles. I will just show the calculation for recency here but they’re the same for all of the above.

Converting to quintiles

And finally combine the results into a single score to get the RFM metric :

Adding up to RFM