What’s new in Tableau 9.0? Part 2 – Level of Detail Expressions

Level of detail expressions extend Tableau's calculation language by introducing the capability to define at what level aggregations should happen. It is a small syntax change, but the implications are HUGE!

I will focus on the technical details in this post, with a few examples to clarify the concepts. Stay tuned for a series of posts on the Tableau website from our Product Consultants, which will cover a range of business questions you can answer using level of detail expressions, with lots of example workbooks, perfect for some inspiration.

What is level of detail anyways?

In Tableau, the results for a calculation such as SUM([Sales]) depend on the context. The context is defined by the filters and level of detail. Dimensions on rows, columns, color, size, label, detail or path shelves define the level of detail for the sheet.

For example, if the [State] dimension is on the Rows shelf, SUM([Sales]) will give the sum of all transactions for each [State]. If [Product Type] is also on one of the shelves I mentioned above, then SUM([Sales]) will give the sum of all transactions within each [State] for each [Product Type]. The more dimensions in your sheet, or the more unique members each dimension contains, the more granular your results will be. Since each result is drawn as a mark in your visualization, the finer the level of detail for your sheet, the more marks you will have.

Below is an example using the Coffee Chain dataset. The image on the left has Product Type as the only dimension in the sheet while the one on the right also has Product, giving it finer granularity hence more marks.

Product type vs Product, coarser and finer level of detail respectively

Filters, on the other hand, change the context by reducing the number of data points used in the calculation, e.g. removing a state by name, states where profit is below zero, or states that are not in the top 5. Table calculations used as filters are an exception to this, however, since they just hide marks without changing the data over which the calculations operate.

Level of Detail Expressions

Level of detail expressions allow you to specify the context for your calculations. For example the following expression

{fixed [State] : SUM([Sales])}

will always calculate the sum of sales per state regardless of the sheet's level of detail. It also ignores all filters in effect except context filters, data source filters and extract filters.

Tableau has had mechanisms for calculating at a coarser level of detail than the sheet, or for performing multiple levels of aggregation, for many releases. Reference lines, totals and table calculations have been the commonly used methods for these tasks, and they work very well in some cases but not in others. What level of detail expressions bring to the table is the ability to mix the results with row-level values, push the mixed and multi-level aggregate calculations to the database (as opposed to computing them locally as the other methods do), and use the results as dimensions, for table calculation addressing/partitioning, in actions, or to bin them.

Level of detail expressions have the following structure

{keyword [dimension1],[dimension2]… : Aggregate expression}

The portion between the keyword and the colon is the dimensionality expression, which consists of 0 or more dimensions. If there is more than one dimension, they must be separated with commas. The aggregate expression does not have to be a simple aggregation such as SUM([Sales]); you can insert other calculations here, and even wrap an IF block inside a level of detail expression.
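
For example, the following is a valid level of detail expression; the field names are just illustrative, and the IF block inside the aggregate returns sales only for profitable transactions:

{fixed [Region], [Category] : SUM(IF [Profit] > 0 THEN [Sales] END)}   // sum of sales from profitable transactions per Region and Category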

There are 3 different keywords

Fixed : Limits the level of detail of the aggregate expression to only the dimensions listed in the dimensionality expression. I will cover the interaction with filters towards the end of this post.
Include : Defines the level of detail for the aggregate expression by combining the dimensions listed in the dimensionality expression with the dimensions in the sheet. It is great when you want to calculate at a fine level of detail in the database, then re-aggregate and show at a coarser level of detail in your visualization, and/or when you want your calculation's level of detail to change as you drill down or add more dimensions to your Sheet.
Exclude : Defines the level of detail for the aggregate expression by removing the dimensions listed in the dimensionality expression from the dimensions in the sheet. It is great for ‘percent of total’ or ‘difference from overall average’ scenarios, and can be considered the textual equivalent of features like Totals and Reference Lines. (Illustrative expressions for include and exclude follow this list.)
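
Here are a couple of illustrative expressions for these two keywords; the field names are just examples, not tied to a particular dataset:

{include [Customer ID] : SUM([Sales])}   // total sales per customer, combined with whatever dimensions are in the sheet

AVG([Sales]) - ATTR({exclude [Category] : AVG([Sales])})   // with [Category] on the sheet: each Category's average sale minus the overall average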

Specifying no dimensions using the fixed keyword, e.g. {fixed : MIN([Order Date])}, gives a single result for the entire table. The shorthand for {fixed : MIN([Order Date])} is {MIN([Order Date])}. This gives you the date your store made its first sale, when you earned that dollar bill you got framed :)

Since the level of detail of the sheet determines the number of marks drawn in the visualization, when your calculation has a different level of detail, something needs to be done to reconcile the difference.

Here is what Tableau will do automatically for you.

If your calculation has a coarser level of detail than the sheet, Tableau will replicate the results as needed such that there is a value for each mark. In the example below

{fixed [Segment] : SUM([Sales])} is getting replicated for each Category, Segment.

Replication behavior when calculation has coarser level of detail than the sheet

This allows you to write the following to calculate the difference between a Transaction Amount and the AVG Transaction Amount for that customer.

[Transaction Amount] - {fixed [Customer ID] : AVG([Transaction Amount])}

Note that this calculation mixes an aggregate with a row level value but doesn’t give you the error you’re used to seeing when you have aggregates and non-aggregates in the same calculation. Yet another benefit of using level of detail expressions!

If your calculation has a finer level of detail, Tableau will aggregate the results as needed such that there is only one value for each mark. In the example below

{fixed [Segment], [Category] : SUM([Sales])} is getting AVG’ed up to one value per segment (assume that the pill in the sheet has AVG as aggregation).

Aggregation behavior when calculation has finer level of detail than the sheet

With the include keyword, the calculation will always have either the same or a finer level of detail than the sheet, which requires aggregation to match the sheet's level of detail. With fixed it can be coarser or finer depending on the calculation and the dimensions in the sheet. Since whether aggregation is needed depends on which dimensions are in the sheet, it can change at any time. Having an aggregation defined is safe, since if and when aggregation is needed, you have already told Tableau how to do it. Because of all this, calculations with a custom level of detail are always wrapped in an aggregate when placed in the sheet, unless they're used as dimensions. If no aggregation is happening (e.g. the calculation has the same level of detail as the sheet, or is coarser and hence gets replicated instead), then changing the aggregation will not make any difference in the results. You will notice that when you drag a pill that is an exclude calculation, Tableau will default to ATTR (as opposed to SUM or AVG as you're used to seeing with measures) to hint that what's happening isn't aggregation.
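
For example, if you drop the include calculation below onto a sheet that has [Category] as its only dimension and set the wrapping aggregation to AVG, you get the average customer spend within each Category (field names are illustrative):

AVG({include [Customer ID] : SUM([Sales])})   // per-customer totals averaged up to one value per Category mark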

How does it work?

Level of detail expressions rely on database queries to calculate the results. If you look at your Tableau logs, you will notice them as sub-queries with inner joins and sometimes with cross joins. If your database doesn’t have a CROSS JOIN operator, Tableau will join without an ON/WHERE clause or use one that always returns true to get the same effect as long as your database supports it.

Cross-joins happen when there is no common dimension to join on. For example {MIN([Order Date])} is a single value repeated for every row hence there is no need for a join key.  It is a cross-join between the main table (that defines the sheet) and a one-row table (resulting from the calculation).

Of course this is not the only case. For example, you can have a calculation such as {exclude [State] : AVG([Sales])} inside a Sheet where [State] is the only dimension. This also results in "blank" dimensionality, hence a cross-join query will be generated.

Level of Detail Expressions and Filters

There are several different kinds of filters in Tableau and they get executed in the following order from top to bottom.

Filters in Tableau

I colored Extract Filters and Table Calculation filters differently since the former is only relevant if you're creating a Tableau Extract from a data source, and the latter applies after the calculations are executed, hence it hides marks without filtering out the underlying data used in calculations.

If you’re familiar with SQL, you can think of measure filters as HAVING clause in a query and dimension filters as the WHERE clause.

Fixed calculations happen before dimension filters; as a result, unless you promote the pills on your Filters shelf to "Context Filters", they will be ignored. There are many benefits to this. Imagine writing the following calculation

SUM([Sales]) / ATTR({SUM([Sales])})

and having the dimension [State] in your Sheet. With this setup, this calculation will give you the ratio of a State's sales to total sales. You can add a filter to your Sheet to hide some of the States, but the filter will only affect the numerator. Since the denominator is fixed, it will still give the sum for all States. When you also consider taking advantage of this with filter actions in dashboards, you can imagine how powerful it can be.

Include and Exclude calculations happen after dimension filters, just like any other measure in Tableau (except fixed). So if you want filters to apply to your level of detail expression but don't want to use Context Filters, you can rewrite your calculation using the Exclude or Include keywords.
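
For example, the ratio calculation above could be rewritten as follows so that a dimension filter on [State] affects both the numerator and the denominator (a sketch assuming [State] is the dimension in the sheet):

SUM([Sales]) / ATTR({exclude [State] : SUM([Sales])})   // denominator now respects dimension filters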

Level of Detail Expressions and Totals

There were several questions about this on Tableau forums and Beta e-mail alias so I decided to add a section to the post to cover this topic as well.

First, let's talk about how Tableau computes Totals. In Tableau there are two kinds of totals: Totals (aka single-pass totals) and Visual Totals (aka two-pass totals). The former applies the current aggregation at a coarser level of detail than the Sheet and is the default setting when you turn on Totals. This option appears in the Total Using menu as "Automatic".

Below is an example of single-pass Totals where you can see that subtotals in the first image on the left match AVG(Sales) per Category  (upper right), while Grand Total matches the AVG(Sales) for the entire table (in the lower right) i.e. when no dimensions are in the Sheet.

 Totals (aka single-pass totals)

If you wanted the Total rows to show the AVG of the rows above them e.g. the average of the values displayed for Bookcases, Chairs, Furnishings and Tables then you would need to use Visual (aka two-pass) totals. You can switch to visual totals by changing “Total Using” setting from “Automatic” to the desired aggregation to be used in second pass.
 
Note that values for single pass and two pass will be the same when what you’re doing is sum (“Total Using” setting) of sum (pill’s aggregation), max of max or min of min.
 
In the example below, Total Using is set to AVG. So the subtotals show the AVG of the SUM of sales per Sub-category within each Category. The green box shows the result, while the green line indicates the values that were averaged to get that result. Similarly, the Grand Total line at the bottom (blue box) shows the AVG of all the rows marked with the blue line. I renamed the total cells in this case so they are more descriptive of what's happening.

Visual (aka two-pass) totals

Now that we clarified the basics, let’s move onto the main question of how totals and level of detail expressions play together.

Even though a single-pass grand total applies the aggregation at table granularity, i.e. as if no dimensions were in the table, this does not affect what happens inside the curly braces.

For example if your visualization contains the dimension [Product Category] and your calculation is
AVG({include [Customer Name] : SUM([Sales])})

The LOD expression will still calculate at the [Customer Name], [Product Category] level for the grand total. The total row only changes how the outer aggregate (AVG in this case) behaves, so the results of the LOD calculation get averaged up to [Product Category] for each mark in the Sheet, but up to table level (a single value) for the grand total.

Single-pass Totals are not affected by the replication that is triggered by LOD expressions coarser than the Sheet. However, please keep in mind that Visual (aka two-pass) totals are.

Understanding Nesting

You can nest several layers of level of detail calculations to answer even more elaborate business questions. One thing to keep in mind, though, is that the context for a nested calculation is defined by its parent(s), as opposed to the sheet. For example, the expression

{fixed [State] : AVG({include [Customer] : SUM([Sales])})}

will have the same effect as writing

{fixed : AVG({fixed [State], [Customer] : SUM([Sales])})}

since the nested calculation inherits the dimensionality from the outer calculation in the first case, resulting in [State] + [Customer] = [State], [Customer]. Also, since the outer calculation is fixed, the nested calculation will not be affected by the filters in the sheet.

And that's not all. The aggregation/replication behavior I explained earlier in this post also applies to nested calculations. Let's walk through an example…

Assume you are trying to calculate the average customer spending in each State, but you realize that customers travelling and spending small amounts in multiple states are skewing the results. Instead of filtering those customers out, you decide to calculate the total spending for each Customer and use that value towards the State average. You could do this by writing the following calculation

{fixed [State], [Customer] : AVG({exclude [State] : SUM([Spending])})}

And set the pill’s aggregation to AVG when you drag it into your sheet.

In this example, the nested calculation will give the SUM of [Spending] per [Customer]. However, the outer aggregation has the dimensionality of State, Customer, which is finer. This will result in replication of the total spending per Customer for each State in which that customer ever made a purchase.

Anatomy of a nested level of detail calculation

In this case, since what happens between the two levels of calculation is replication, the aggregation you use here (the AVG in AVG({exclude [State] : ….)) doesn't matter. The final step that averages customer totals up to the State level happens through the pill's aggregation setting and the fact that State is the only dimension in your Sheet.

While this is a good example for explaining how nesting works, there is an even simpler way of writing this calculation:

{fixed [Customer] : SUM([Spending])}

You just need to drag it into your sheet with the dimension State and set the pill’s aggregation to AVG.

This is because when Tableau generates this INNER JOIN, it does so between the dimensionality of the level of detail (LOD) expression and the dimensionality that is the union of the sheet's and the LOD expression's dimensionalities. In this case, the union of dimensionalities is Customer (from the calculation) + State (from the Sheet), which results in the same replication behavior as in the nested case.

Working with disjoint LODs

When you think of all that's happening to calculate the result, the syntax to do it is amazingly simple.

Limitations

  • Calculations that involve level of detail expressions won’t be materialized in Tableau Extracts in  9.0.
  • Sets, combined fields (use the dimensions that make up the combined field individually, separating them with commas, instead), parameters and table calculations cannot be used in the dimensionality expression in 9.0
  • Only field names can be used in the dimensionality expression; an expression like MAX([Order Date]) or YEAR([Order Date]) or an IF statement cannot appear there. Note that you can create a calculated field with the formula YEAR([Order Date]) and use that field instead (a short sketch follows this list). If you use {exclude [Order Date] : SUM([Sales])} and have [Order Date] in your sheet, Tableau will recognize the date hierarchy and automatically exclude the date parts from the calculation's level of detail as you drill down.
  • Table calculations and ATTR are not allowed in aggregate expression e.g. {fixed [County] : ATTR([State])} or {fixed [County] : INDEX()} are not supported
  • Dimensionality from one data source cannot be mixed with a measure from another (blended) data source in a level of detail calculation
  • All dimensions in the dimensionality expression should come from the same data source
  • Level of detail expressions that use exclude or include keywords can not be used as dimensions. They also can not be binned. This is because they have relative dimensionality. While this may appear like an arbitrary limitation, it makes them much more understandable. Consider the case where there are two include calculations in the Sheet that are used as dimensions. Calculation1 is {include x : min(i)}, Calculation2 is {include y : min(k)}. If there are no other dimensions in the Sheet, Calculation1 will behave like {fixed x, Calculation2 : min(i)} and Calculation2 will behave like {fixed y, Calculation1 : min(k)}. Circular reference! This is a solvable problem but at the expense of understandability. We chose ease of use and simplicity instead.
  • Level of detail calculations from secondary sources can not be used via data blending 2 in 9.0
  • In 9.0 results from Level of detail calculations are cached as a whole (not broken down per sub-query) so generating same sub-queries in different sheets/calculations won’t result in cache hits.
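
As a quick sketch of the calculated-field workaround mentioned in the dimensionality-expression bullet above (field names follow the usual Superstore-style examples):

YEAR([Order Date])   // the formula of a new calculated field, e.g. named [Order Year]

{fixed [Order Year] : SUM([Sales])}   // now valid, since the dimensionality expression contains only a field name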

Data source support

In Tableau 9.0, level of detail expressions are not supported for the following data sources: Cubes, Big Query, DataStax, Informatica Data Services, Microsoft Jet, Splunk, Actian Vectorwise.

The following data sources have higher version requirements for level of detail calculations than the minimum supported versions listed in Tableau's technical specifications:

Teradata 12+, Netezza 7+, HIVE 0.13+, Impala 1.2.2+

The rest of the data sources support level of detail calculations with the same minimum version requirements as listed in the technical specs. The list also includes new connections not yet listed in the technical specs, such as Apache Spark, IBM BigInsights, and SPSS, R and SAS data files, which adds up to a total of 36 connection types! Note that the new Excel and Text connectors are not Jet based, so they have full level of detail calculation support as well.

Generic ODBC has limited support as cross-join syntax varies from database to database. What Tableau issues through generic ODBC may not be what's supported by the database, or the database may not support cross-joins at all, so some scenarios may not work. It is advised that you use dedicated connection types where available for the best experience.

If you’re connecting through Tableau Data Server, it also has to be Tableau 9.0 or newer.

Performance

Level of detail expressions rely on database queries to calculate the results. Performance heavily depends on your database and the size of your data, whether the necessary indexes exist in the database to perform joins rapidly, and the complexity and level of nesting in your calculation. If your dataset is large and your database is fast, level of detail expressions can give you vast performance improvements over Table Calculations, since you can push finer detail calculations directly into the database instead of moving a large dataset over the wire into Tableau on your desktop.

You can make your calculations run more efficiently on the database by avoiding unnecessary cross-joins. Also, if you have large tables, you can get performance gains by using MIN instead of ATTR when you know replication will happen. For ATTR, Tableau queries for both MIN and MAX to see if it needs to show * or not. If you know your calculation will replicate, you already know MIN will be the same as MAX. This way you can get the same results and shave off a sub-query.
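
For example, assuming the fixed expression below is coarser than the sheet (so its result gets replicated), both of these return the same value, but the MIN version skips the extra sub-query ATTR issues to compare MIN and MAX:

MIN({fixed [Customer ID] : SUM([Sales])})
ATTR({fixed [Customer ID] : SUM([Sales])})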

Other Notes

When using level of detail calculations as dimensions, be careful if your aggregation returns floats, since floating point arithmetic does not give results that are reliable for equality comparisons (JOINs check for equality), and this may lead to unexpected results.

Conclusion

Congratulations! You made it to the end of the post! I agree it was a long post, but when you think about it, it is pretty cool that it was possible to fit most of the technical details into a single blog post. In the old world of business intelligence, this would probably be a 3-volume hardcover book, and the calculations would only work on one platform. Being able to do this with 30+ different databases, with such little visible change in the calculation language, is very exciting.

Do you know what’s more impressive and excites me even more? All the cool things you will do with it… All the insights you will discover… Enjoy!

What’s new in Tableau 9.0? Part 1 – Data Preparation

Tableau 9.0 is loaded with features that will make it much easier to get your data ready for analysis. In my first post of “What’s New?” series for Tableau 9.0, I will start with Data Preparation.

Reading Loosely Structured Excel files

If you're used to dealing with messy Excel spreadsheets, the first thing you'll notice is the "data interpreter". The data interpreter is a nifty tool. If your spreadsheets are perfectly clean, you probably won't even notice it is there. It will do a first pass through the data as it loads to populate the preview grid and run a light-weight version of the interpreter to see if the spreadsheet contains anything it can fix. If all looks good, it won't make a peep.

If your data types and column lengths seem to not make sense or you seem to have multi-level headers as in the example below,

A loosely structured Excel spreadsheet

it will offer you help without getting in the way. Look for a gray bar at the bottom of the join diagram that spans its entire width.

Tableau's Excel structure analyzer

Once you turn it on, it starts looking for areas of the sheet that can be converted into well-structured tables. Many properties of the sheet are used in this process, from formatting elements such as color and borders to whether cells are empty or not and, if not, what their data types are. Based on this information, it will automatically discard header/footer text, flatten multi-level table headers, remove inline comments… For the example above, you'd get something like this.

The structure table Tableau was able to extract from the Excel spreadsheet

For comparison this is what it would look like with the option turned off. Big improvement!

Results if Excel workbook were to be read as is into Tableau

Of course, since it is trying to detect a pattern, there is a chance it will not get everything right, even though it was trained on 6000+ Excel workbooks we gathered from public reference data repositories such as Data.gov, the UK Office for National Statistics, OECD, WHO, UNESCO, the World Bank, etc. If any non-text values are dropped, you will promptly see a warning so you don't miss anything you intended to keep. You can always preview the results in Excel (or other MS Office alternatives you might have on your machine that can open Excel files) by clicking on the preview button. The preview also includes markup indicating which cells were kept (and whether they were classified as header or data) and which were removed.

Lost data warning

In 9.0 this functionality is only supported for Excel files opened through the new Excel connectors (not supported on legacy Jet connectors).

Working with crosstabs

Now that your data looks like a table, another common problem is that most reference data come as crosstabs since they’re intended to be consumed as reports as opposed to input for further analysis. With 9.0 fear not! Now you can pivot them.

Unpivoting in Tableau

If your data changes after you create the workbook and you realize you have a new column with one more year of data, you can simply add it to the existing pivot.

Adding new columns to an existing unpivot transform

To avoid potential confusion, it is worth mentioning that even though in the data analysis world this operation is often referred to as pivot, in database terminology it is called unpivot.

In 9.0 pivot is only supported when using the new Excel and Text connectors (not with Jet), calculated fields can not be pivoted and pivoted fields can not be used for joining tables (but they can be used when blending data sources).

Splitting delimited text

I am sure you've had to deal with composite strings in the past, such as e-mails, phone numbers and URLs. Extracting the relevant parts of a string could be a hassle, but it doesn't have to be if you have Tableau 9.0. Smart split analyzes a sample of your data to determine if it can be broken into representative parts. Some well-known patterns such as e-mails and URLs are recognized with their types, and each part is named appropriately. For example, an e-mail will be broken into username and domain.

Splitting E-mail addresses using the categorical field headers in the sheet view

If the data cannot be associated with a known type, Tableau will still break the string into its parts, but each part gets a generic name based on its token number. You can initiate this action in many different places in Tableau.

Splitting an unrecognized domain using the preview grid in the Data tab

As you can see, this is not just a split operation. Tableau actually removes what it determines to be constants (the repeating characters) from the string, so when you apply smart split to a string like an RGB color code in the form (33,66,99), you not only get 3 numbers but also get rid of the parentheses.

If you have an irregular pattern and smart split doesn’t succeed you can use “Custom Split” to specify a delimiter, whether to split from left or right and how many times to split. Both split options automatically create calculated fields for you that rely heavily on a new function in the calculation language called SPLIT.

The SPLIT function takes 3 arguments: the input string, the delimiter and the token number. For example, if you have customer names in the form John C. Doe, you can write the following calculation to get the first name

SPLIT([Customer Name], ' ', 1)

which gives you the first token from the left after splitting on space.

Assuming some people have middle names and some don't, if you'd like to get the last name you can write the following

SPLIT([Customer Name], ' ', -1)

which gives you the first token from the right after splitting on space.

Tableau Data Engine (extracts, new Excel and Text connectors, Salesforce, Google Analytics, OData, Azure DataMarket, SPSS, SAS and R data files), MySQL 4.0+, PostgreSQL 9.0+ and Microsoft SQL Server 2008+ support splitting from both left and right. HP Vertica, Oracle, Teradata, Aster nCluster, Google BigQuery, Amazon Redshift and Hadoop Hive (Cloudera, Hortonworks, MapR) support only splitting from the left. If a database has a built-in split function, Tableau takes advantage of it; if no split function is available but regex functions are, Tableau uses those. If neither is available, Tableau will rely on simple string manipulation functions in the database to achieve the same result. As the number of splits grows, however, the SQL queries can get large and perform badly. To avoid this, the number of splits is capped at 10, except for Tableau Data Engine sources.

Regular Expressions

Tableau 9.0 also expands on the regex support. REGEXP_EXTRACT, REGEXP_EXTRACT_NTH, REGEXP_MATCH and REGEXP_REPLACE are now supported on PostgreSQL, Oracle (10g+), Hadoop HIVE & Impala (no _MATCH), Google Big Query and the Tableau Data Engine. The Data Engine supports regex expressions in conformance with Unicode Technical Standard #18, Unicode Regular Expressions, level 1, and in addition includes Default Word boundaries and Name Properties from level 2. Since the new Excel/Text connectors rely on extracts behind the scenes, they get this for free. The same applies to other extract-based sources such as Salesforce, Google Analytics, OData, Azure DataMarket, SPSS, SAS and R data files. And of course any database you explicitly create an extract from will have it.
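
For example, here is a quick sketch of pulling the numeric portion out of a hypothetical [Product Code] field using the first capturing group in the pattern:

REGEXP_EXTRACT([Product Code], '([0-9]+)')   // e.g. returns '4071' from a value like 'TEC-4071-A'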

There are many other improvements in the data preparation experience in Tableau 9.0, such as the metadata grid, which allows you to quickly review your table metadata, and the newly designed connection experience, which allows searching the ever-growing list of connectivity options and shows your recently used data sources.

The new connection experience

Check-out the Tableau blog for a high-level overview of the new Data Preparation features in 9.0 and improvements in other feature areas.

Two new great blog posts from Tableau community

Two of my favorite new Tableau blogs are "Data * Science + R" and "Data + Science". The former is a great resource if you want to learn how to leverage R inside Tableau by walking through sample use cases. The examples are interesting and very thoroughly explained. This year's first post was about change-point analysis in Tableau. This was a topic I had heard about from several of our customers and had been planning to blog about, but hadn't had the chance.

Change-point analysis in Tableau

“Data + Science” is another awesome blog. Jeff has been doing some great work with various types of undirected graphs. Yesterday he published a blog post about tree diagrams as a follow up to his beautiful Sankey diagram post from a few months earlier.

Tree diagram and Sankey diagram in Tableau

What I love about Tableau is the expressive power and flexibility of the product, which, combined with such a great user community, leads to something new and beautiful almost every day, not to mention countless learning opportunities and cross-pollination. Take the graph below for example. Olivier Catherin took Jeff's Sankey diagram idea as an inspiration and further expanded upon it to create the following graph.

Tree Diagram in Tableau

Radial treemaps & bar charts in Tableau

In this year’s last post, I want to put a little spin on the bar charts and tree maps : )

While in a lot of cases pie charts aren't great, they nicely convey the part-whole relationship, work reasonably well as long as the number of slices is small, and are used very commonly, including by Tableau users. I recently came across two blog posts that took it a step further and built donut charts (1, 2) out of pie charts. Both of these posts used overlay methods to hide the center of the pie chart, which makes them easy to replicate (and that is what's great about them); however, they either have misalignment issues, or when you interact with the donut, e.g. by clicking on it or hovering, the z-order changes and your pie chart floats above the overlay so it is no longer a donut chart.

I wanted to do something that doesn’t have any of these issues and while doing that also enable stacking in various ways so it can also be used to generate radial bar charts and radial tree maps (also known as sunburst charts).

I also wanted to add some customizability, such as the thickness of the bands and the spacing between them. Below is the resulting radial tree map. You can click HERE or on the image to play with the interactive version published on Tableau Public and download the Tableau workbook to take a closer look at how it works and apply it to your own data if you like.

Radial tree map in Tableau (click to view in Tableau Public)

I created some imaginary sales data broken into categories: Furniture, Electronics and Food and then further into sub-categories e.g. Security Systems, Home Theater, Baked Goods etc. In the image above, the 3 major categories are represented by the inner-most ring. Each outer ring represents an immediate subcategory of the adjacent inner ring hence creating a hierarchical tree-view when read from the center outward.

The chart is created using the polygon mark type. Each point that makes up a polygon is computed on the fly using 9 table calculations that are nested inside each other.

Input data looks like the following :

Input data for Radial TreeMap

This table could be further simplified by counting the occurrences of > in the Product Category column using a calculated field instead of having a Level field, but I chose not to, in case I wire this up to a live SQL query at some point and a category name contains a > that's not properly escaped.

I used the same trick I showed in my Coxcomb chart and pictogram posts (leveraging custom SQL, binning and densification) to generate the extra rows that hold the calculated coordinates, since each arc is approximated using ~200 points to get a smooth curve while there is only 1 row in the input data for each category. Please take a look at the previous posts if you're curious how this works. The rest is trigonometry.
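
In case you're wondering what that trigonometry boils down to, each polygon point is essentially a polar-to-Cartesian conversion; a simplified sketch with placeholder field names (the actual workbook derives the radius and angle through the nested table calculations):

[Radius] * COS([Angle])   // X coordinate of a point on the arc (angle in radians)
[Radius] * SIN([Angle])   // Y coordinate of a point on the arc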

If you put this inside a dashboard and add an action you can also display the sum of values for selected slices at the center of this chart. As I said, since the chart is parameterized, you can change the spacing etc. between layers which is what I did below to get a slightly different look.

Radial TreeMap inside a Tableau dashboard (click to view in Tableau Public)

I am not going to get into the details of how exactly each calculation works. The best way to recreate these charts with your data is to convert your data into the format used by the workbook (as shown in the Excel spreadsheet above) and replace the data source. This way, you won't have to worry about calculation settings and it will just work.

Now to get a donut chart, all you need is data that doesn't have multiple levels, and the same calculations will give you this result.

Donut chart in Tableau (click on the image to view in Tableau Public)

With slight modifications to 3 of the 9 calculations, it is also possible to generate radial bar charts using the same template. In the example below I also changed the sort order such that I get bars in ascending order from the center outward. The outer-most ring represents all of Furniture sales while inner rings represent sub-categories of furniture.

Radial bar chart in Tableau (click to view in Tableau Public)

And there you have it. One template to build 3 common radial charts in Tableau. I hope you find this useful.

Happy new year!

Impressions from IEEE VIS 2014

Last week I was at the IEEE VIS conference in Paris. VIS consists of 3 conferences running in parallel: InfoVis (Information Visualization), VAST (Visual Analytics Science and Technology) and SciVis (Scientific Visualization).

Overall the sessions can be broken down into 3 broad categories: new tools/visualizations, cognitive research on understanding of visualizations and potential improvements for exploratory data analysis, and applications.
 
I think the highlight of the conference was data mining. A large chunk of papers used various classification methods to create meaningful abstractions/generalizations over visualizations to improve their readability and reduce the time to insight. It was also interesting to see how widely Amazon Mechanical Turk was being used by researchers.
 
Below are some of the talks/papers I found interesting.

Visualization tools and approaches

UpSet : Visualization of intersecting sets
Probably one of the highlights for everyone who attended the conference
https://www.youtube.com/watch?v=-IfF2wGw7Qk
 
Onset : A visualization technique for large-scale binary set data
Probably another highlight for everyone who attended the conference
http://vimeo.com/102606152
 
DimpVis: Exploring Time-varying Information Visualizations by Direct Manipulation
https://www.youtube.com/watch?v=DrLaYHzS9k8
Like pages shelf in Tableau but via interaction with marks. Works well with scatter plot but not so much with other viz types.
 
Domino: Extracting, Comparing, and Manipulating Subsets across Multiple Tabular Datasets
Linked axis charts, Sankey diagrams… to highlight subset relationships between different data sources
https://www.youtube.com/watch?v=bm59Y8QYbAQ
 
Stenomaps: A new visual encoding for thematic maps
The stenomap comprises a series of smoothly curving linear glyphs that each represent both the boundary and the area of a polygon.
 
Visualizing Statistical Mix Effects and Simpson’s Paradox
Simpson's paradox can cause standard charts to give misleading impressions about the nature of changes. This paper introduces the comet chart, which shows the magnitude of change (e.g. unemployment) on one axis and the change in group size (e.g. highest degree earned) on the other, with one mark for each of the two points in time being compared, connected via a tapered line that resembles a comet.
http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/42901.pdf
 
Visual Adjacency Lists for Dynamic Graphs
A matrix like layout alternative to graph visualization to highlight adjacency relationships
http://vimeo.com/103884936
 
IvisDesigner – Expressive Interactive Design of Information Visualizations
https://donghaoren.org/ivisdesigner/
 
Revisiting Bertin matrices: New Interactions for Crafting Tabular Visualizations
Web-based version of Bertin’s reorderable matrix, following the black-and-white design of the original. Better to play with it than describe.
http://www.bertifier.com/
 
Progressive Visual Analytics (incremental results from long running tasks)
This was an interesting topic. There were two papers. The first one focused on cataloging existing algorithms in various languages (e.g. R, Python) that can already be used in this fashion, where the visual analytics tool makes repeated calls to the algorithm, in addition to providing guidelines for future algorithm development to support updates during progress (e.g. at the end of each iteration as it converges) so the user can terminate the long-running task or change its parameters.
 
Opening the Black Box: Strategies for Increased User Involvement in Existing Algorithm Implementations
 
The second one showed a working system that does this. They rewrote their algorithm (Sequential PAttern Mining), converting it from depth-first to breadth-first search, and allowed users to see in-progress results and steer the algorithm by pruning patterns that are not interesting.
 
Progressive Visual Analytics: User-Driven Visual Exploration  of In-Progress Analytics
 

High Dimensional Data

Unsurprisingly this group involved heavy reliance on glyphs
 
INFUSE: Interactive Feature Selection for Predictive Modelling of High Dimensional Data
Great but with 500 glyphs in front of you at once as presented in the conference, not very practical since you can hardly recognize the shapes.
https://www.youtube.com/watch?v=3x4P5vAGaZI#t=117
 
The Influence of Contour on Similarity Perception of Star Glyphs
They tested the effect of outlines in star glyphs, and found that the glyph works better without it, just showing the spokes which is surprising. But similarity is an ambiguous term. Researchers were looking for data similarity using a visual encoding that’s good for shape similarity and I doubt that subjects understood what they were supposed to evaluate. More on this later in the blog post…
 

Networks

Multivariate Network Exploration and Presentation: From Detail to Overview via Selections and  Aggregations
http://www.computer.org/csdl/trans/tg/preprint/06875972.pdf
System shows multivariate graphs and allows the concurrent display of the network and the multivariate data in the nodes. They allow the user to make selections to aggregate the graph to see the higher-level structure.
 
GraphDiaries: Animated Transitions and Temporal Navigation for Dynamic Networks
Imagine pages shelf  in Tableau but for networks
http://vimeo.com/102526646
 
How to Display Group Information on Node-Link Diagrams: an Evaluation
Different visual groupings to improve network readability
http://vimeo.com/102526993

Perception & Exploratory Analysis

Perceptual Kernels for Visualization Design
Create similarity matrices for different mark shapes, colors and their combinations based on user study and use this information to automatically pick mark types/colors for a viz to show similar items using similar marks and distinct items with visually distinctive marks.
http://hci.stanford.edu/publications/2014/perceptualkernels/perceptualkernels-infovis2014.pdf
 
Ranking Visualization of Correlation Using Weber’s Law
Weber's law states that the just-noticeable difference between two stimuli is proportional to the magnitude of the stimuli. The authors noticed the model also fits well when the just-noticeable difference is the existence of correlation between two variables. They used it to evaluate different chart types, from scatter plots to parallel coordinates to donut charts, and how good they are at visualizing correlation in different scenarios, e.g. finding out that parallel coordinates are great as long as there is negative correlation.
http://www.cs.tufts.edu/~remco/publications/2014/InfoVis2014-JND.pdf
 
Error Bars Considered Harmful: Exploring Alternate Encodings for Mean and Error
Interesting user study on how people perceive error bars, and alternative encodings to reduce this confusion.  Talk focused on error bars on bar charts (which I agree can be somewhat misleading to some people) but then generalized the argument to other charts. Alternative encodings proposed were 1) a violin chart variant that shows normal distribution on either side with lines covering full vertical extent of the chart 2) using transparency. Both approaches have their drawbacks, especially with transparency it is not possible to tell how far the bars extend. If you don’t have 20-20 vision, you get very narrow error bars :)  It was funny to see all the other presenters in the same session following this talk using error bars when sharing the results of their studies.
https://graphics.cs.wisc.edu/Papers/2014/CG14/Preprint.pdf
 
A Principled Way of Assessing Visualization Literacy
As I said earlier, Amazon Mechanical Turk was probably used for testing in every research study presented. It was interesting that the last talk of the day about perception and design asked the question "how do you know the people you're testing with are suitable for these tasks?" This paper tries to come up with a standardized test to assess visualization literacy.
https://hal.inria.fr/file/index/docid/1027582/filename/vLiteracy.pdf

Industry Presentations

For many, probably the most interesting thing was something not even related to information visualization. AutoDesk was one of the main sponsors, and among many other things they talked about the fact that they were able to design a virus, 3D print it using a DNA printer, and boot it up. Oh, and that they grew Van Gogh's missing ear in the lab :) and how they are printing self-assembling objects.

If you stumbled upon this blog by accident and have no interest in data visualization but like travelling, I have something for you, too. If you fly IcelandAir through Reykjavik, try to take advantage of their free stopover offer. You can spend up to a week at no extra charge, and the Blue Lagoon is about 15 miles from the airport.

Using Custom Shapes and Polygons to draw anything from Directed Graphs to Infographics in Tableau

Custom shapes are great for simplifying the association between marks and what they represent, so viewers can enjoy the visualization without having to constantly check the legend to remind themselves what they're looking at. But you can also see them being used for various purposes in the wild, such as navigation/filter controls in dashboards. Custom polygons provide another great way to draw custom shapes in Tableau, although most people use them almost exclusively in maps.

In this blog post, I will be sharing some rather unusual example use cases for both of these mark types, which I got questions about from many Tableau users via the blog or in in-person conversations at this year's Tableau Conference.

Before we go on any further, please keep in mind that large and/or asymmetrical shapes can negatively affect the accurate readability of your visualizations, since their center (which is what corresponds to the value on the axis) may be difficult to pinpoint, AND most of the data below (while it may look prettier this way) can be more accurately depicted with basic chart types like the good old bar chart. For most of these examples I relied on the Sample – Superstore dataset, even though the titles of the visualizations may say otherwise. All of these visualizations are published on Tableau Public for you to download and explore. You can also find information as comments inside the calculated fields that I created.

Let’s start with using marks to create the customized bar chart you see below.

Bar chart filled with custom marks

In simple bar chart form this would look like the following:

Basic bar chart

Now, how do you make this imaginary Sales data look like a chart that shows # births annually? By making the bars look like little kids holding hands, of course. : )

The core piece is the calculation that generates the evenly spaced marks. In this case I divided SUM(Sales) for each region in the Superstore dataset into chunks of 5000 until the total amount is covered.

Creating evenly spaced marks

This gets you most of the way, and you can actually stop at this point if you don't care about any more precision than increments of 5000. But what if you want to be a bit more precise? You can use another calculation to show different marks for fractions of 5000…

Working with fractional icons

In this case, my dataset contained enough rows to accommodate all the marks I needed for my visualization. But what if it didn’t? The next example answers this question, as well as showing a different chart type.

Stacked icons are commonly used when the author of the visualization isn't too worried about the exactness conveyed by the chart. For example, our next pictogram shows tanks in the Bacteria and Osterlich armies. I can clearly see Bacteria has the superior army. The difference is large enough that even a coarse visualization like this gets the point across. If I cared to count the tanks in the viz and looked up the population, I could have even found out that it corresponds to 0.46 tanks per capita! : )

Napaloni, de grosse peanut, de cheesy ravioli!

In this case since my dataset had fewer rows than the marks shown on the viz, I had to generate rows. To achieve this I used one of the densification tricks I showed earlier in my Coxcomb chart blog post.  You can see the custom SQL query below which adds a new row for each existing row with the value for Sales set to 0.

Custom SQL prep for binning

After doing this, I can right click on my Sales column and select "Create Bins". I need to set the bin size to the quantity each icon will represent. In the sample workbook, I created bins of 10,000 (the icon says 10^6 tanks but the underlying data is still aggregated Superstore). Now if I look at my data, this is what I see (0 rows are highlighted).

Underlying data

There are really two values in my dataset for each Market. But if I drag Sales (bin) onto the Rows or Columns shelf, I can now select "Show Missing Values" from the pill's context menu to tell Tableau to also show me the buckets that contain no data. Bins are normally used to generate histograms, and this setting is intended to show/hide empty bins in the histogram. To show the empty bins, Tableau creates new rows for me. This way my 2 rows for Bacteria become 27 rows, which means I can now show 27 marks.

The same method can be used with multiple dimensions as well. The following example demonstrates this using the Region and Product Type dimensions.

Stacked icons with multiple dimensions

In this case, the custom SQL query looks slightly different but the rest of the process is the same.

Custom SQL prep for binning (two dimensions)

You can do a lot more with custom marks if you are adventurous and prefer prioritizing visually interesting charts at the expense of readability and accuracy. For example you can use a custom mark like a stencil as shown below.

Using stencil/cookie cutter images

In this visualization, marks are overlaid on top of a stacked bar chart. The custom mark (the beer glass) has a hollow center (transparent PNG) and is on the secondary axis, where the bottom of the glass is fixed at 0 to achieve the effect. A white and yellow color pattern is used in the stacked bar chart to resemble beer and foam. This chart is visually interesting, but it is not easy to interpret, especially considering the lack of an axis and the fact that it is unclear whether the quantity is associated with the height or the area.

You can take it even further and use images themselves to mimic bar charts as shown in the following example.

Keyser Söze

I hope that’s enough for the infographics. How about something more exciting, such as directed graphs? You could create a graph in Tableau using a dual axis chart combining lines and circles. But how do you indicate direction?

Directed graph in Tableau using custom marks

The first tab in the Tableau Public visualization linked via the image above shows how to achieve this using custom marks. This is done by creating 360 custom marks to cover every 1-degree rotation of the arrow and first vertex. A calculated field determines the angle for each edge, which is used to decide which rotated arrow to display. Note that since the graph can be dynamic, the domain of the field on the Marks card (angles) may also change, which may lead to changes in which mark is used for which angle. To avoid this, if you look at the dataset associated with this example, you will notice that the complete domain (of all possible angles) is added to provide padding but then "pseudo-filtered out" using the Pages shelf so these rows don't have any effect on the visualization itself.
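
As a hedged sketch, such an angle field could be computed from the edge's endpoint coordinates; the field names below are placeholders, not the exact fields in the workbook:

DEGREES(ATAN2([Y2] - [Y1], [X2] - [X1]))   // angle of the edge, used to pick the matching pre-rotated arrow shape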

An even more interesting way to do this is to use the polygon mark type and custom render the arrows. This example uses the binning technique we discussed earlier to generate rows to accommodate each arrow (each arrow is made up of 8 points).

Directed graph using custom polygons

The XFirstStep and YFirstStep fields draw the arrow and scale it based on the distance between the points it will connect. Then the rotatedX and rotatedY fields take the arrow and rotate it based on the angle between the two points. Since the arrows are rendered as polygons, they can be modified on the fly. I took it one step further and even parameterized the arrow so you can design your own custom arrow! Just use the "Design your arrow" tab : ) Below is an example of different arrows you can create…
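
For reference, the rotation step is just the standard 2D rotation formulas; a simplified sketch with placeholder names (the actual workbook fields may differ, and [Angle] is assumed to be in radians):

[XFirstStep] * COS([Angle]) - [YFirstStep] * SIN([Angle])   // rotatedX
[XFirstStep] * SIN([Angle]) + [YFirstStep] * COS([Angle])   // rotatedY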

Custom arrows

Enjoy!

Tableau Conference 2014

It is time for Tableau Conference again! This year over 5000 customers will be joining us in Seattle at the Washington State Convention Center. I will be attending the whole event and would be very interested in hearing from you about Tableau use cases and pain points.

I will be co-hosting the Advanced Analytics track of the Industry Analyst Q&A session on Tuesday afternoon and giving a talk in Conference Room 310 between 12:30 and 1:30 PM the same day. If you're interested in working with time series data in Tableau, we have a focus group on Wednesday between 12:15 and 1:15 PM you may want to attend. Also on Thursday between 7:30 and 8:30 AM, I will be co-hosting a focus group on data preparation.

Tableau Conference talk

We have an exciting lineup, including talks from Neil deGrasse Tyson and Hans Rosling. It will be a great conference. Looking forward to seeing you there!