What’s new in Tableau 9.0? Part 1 – Data Preparation

Tableau 9.0 is loaded with features that will make it much easier to get your data ready for analysis. In my first post of “What’s New?” series for Tableau 9.0, I will start with Data Preparation.

Reading Loosely Structured Excel files

If you’re used to dealing with messy Excel spreadsheets, first thing you’ll notice is the “data interpreter”. Data interpreter is a nifty tool. If your spreadsheets are perfectly clean, you probably won’t even notice it is there. It will do a first pass through the data as it loads to populate the preview grid and run a light-weight version of the interpreter to see if spreadsheet contains anything that it can fix. If all looks good, it won’t make a peep.

If your data types and column lengths seem to not make sense or you seem to have multi-level headers as in the example below,

A loosely structured Excel spreadsheet

it will offer you help without getting in the way. Look for a gray bar at the bottom of the join diagram that spans its entire width.

Tableau's Excel structure analyzer

Once you turn it on, it starts looking for areas of the sheet that can be converted into well-structured tables.  Many properties of the sheet are used in this process from formatting elements such as color and borders to whether cells are empty or not and if not what their data types are. Based on this information, it will automatically discard the header/footer text and flatten multi-level table headers, remove inline comments… For the example above, you’d get something like this.

The structure table Tableau was able to extract from the Excel spreadsheet

For comparison this is what it would look like with the option turned off. Big improvement!

Results if Excel workbook were to be read as is into Tableau

Of course since it is trying to detect a pattern, there is a chance it will not get everything right even though it was trained on 6000+ Excel workbooks we gathered from public reference data repositories such as Data.gov and UK Office for National Statistics, OECD, WHO, UNESCO, World Bank etc. If any non-text values are dropped, you will promptly see a warning so you don’t miss anything you intended to keep. You can always preview the results in Excel (or other MS Office alternatives you might have on your machine that can open Excel files) by clicking on the preview button. Preview also includes markup indicating what cells were kept (and classified as header vs. data) and what was removed.

Lost data warning

In 9.0 this functionality is only supported for Excel files opened through the new Excel connectors (not supported on legacy Jet connectors).

Working with crosstabs

Now that your data looks like a table, another common problem is that most reference data come as crosstabs since they’re intended to be consumed as reports as opposed to input for further analysis. With 9.0 fear not! Now you can pivot them.

Unpivoting in Tableau

If your data changes and after creating the workbook and you realize you have a new column with one more year of data, you can simply add it to the existing pivot.

Adding new columns to an existing unpivot transform

To avoid potential confusion, it is worth mentioning that even though in the data analysis world this operation is often referred to as pivot, in database terminology it is called unpivot.

In 9.0 pivot is only supported when using the new Excel and Text connectors (not with Jet), calculated fields can not be pivoted and pivoted fields can not be used for joining tables (but they can be used when blending data sources).

Splitting delimited text

I am sure you had to deal with composite strings in the past such as e-mails, phone numbers, URLs. Extracting relevant parts of a string could be a hassle but it doesn’t have to be if you have Tableau 9.0. Smart split analyzes a sample of your data to determine if it can be broken into representative parts. Some well-known patterns such as e-mails and URLs are recognized with their types and each part is named appropriately. For example e-mail will be broken into username and domain.

Splitting E-mail addresses using the categorical field headers in the sheet view

If data can not be associated with a known type, Tableau will still break the string into its parts but each part gets a generic name after its token number. You can initiate this action in many different places in Tableau.

Splitting an unrecognized domain using the preview grid in the Data tab

As you can see, this is not just a split operation.  Tableau actually removes what it determines to be constants (the repeating characters) from the string hence when you apply smart Split on a string like RGB color codes in the form (33,66,99) you not only get 3 numbers but you also get rid of the parentheses.

If you have an irregular pattern and smart split doesn’t succeed you can use “Custom Split” to specify a delimiter, whether to split from left or right and how many times to split. Both split options automatically create calculated fields for you that rely heavily on a new function in the calculation language called SPLIT.

SPLIT function takes 3 arguments. Input string, delimiter and token number. For example if you have customer names in the form John. C. Doe. You can write the following calculation to get the first name

SPLIT([Customer Name], ‘ ‘, 1)

which gives you the first token from the left after splitting on space.

Assuming some people will have middle names and some don’t, if you’d like to get the last name you can write the following

SPLIT([Customer Name], ‘ ‘, -1)

which gives you the first token from the right after splitting on space.

Tableau Data Engine (extracts, new Excel and Text connectors, Salesforce, Google Analytics, OData, Azure DataMarket, SPSS, SAS and R data files), MySQL 4.0+, PostgreSQL 9.0+, Microsoft SQL Server 2008+ support splitting from both left and right. HP Vertica, Oracle, Teradata, Aster nCluster, Google BigQuery, Amazon Redshift, Hadoop Hive (Cloudera, Hortonworks, MapR) support only splitting from left. If a database has a built-in split function, Tableau takes advantage of it, if no split function is available but regex functions are, Tableau uses those. If neither are available, Tableau will rely on simple string manipulation functions in the database to achieve the same result. As the number of splits grow however SQL queries can get large and perform badly. To avoid this except Tableau data engine sources, number of splits are capped at 10.

Regular Expressions

Tableau 9.0 also expands on the Regex support. REGEXP_EXTRACT, REGEXP_EXTRACT_NTH,  REGEXP_MATCH and REGEXP_REPLACE are now supported on PostgreSQL, Oracle (10g+),  Hadoop HIVE & Impala (no _MATCH), Google Big Query and Tableau Data Engine. Data Engine supports RegEx expressions in conformance with  Unicode Technical Standard #18 , Unicode Regular Expressions level 1, and in addition include Default Word boundaries and Name Properties from level 2. Since new Excel/Text connectors rely on extracts behind the scenes they get it for free. Same applies to other extract based sources such as Salesforce, Google Analytics, OData, Azure DataMarket, SPSS, SAS and R data files. And of course any database you explicitly create an extract from will have it.

There are many other improvements in the data preparation experience in Tableau 9.0 such as the metadata grid, which allows you to quickly review your table metadata, the newly designed connection experience, which allows searching the ever growing list of connectivity options and contains the list of recently used data sources.

The new connection experience

Check-out the Tableau blog for a high-level overview of the new Data Preparation features in 9.0 and improvements in other feature areas.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s