Tableau 9.0 is loaded with features that will make it much easier to get your data ready for analysis. In my first post of “What’s New?” series for Tableau 9.0, I will start with Data Preparation.
Reading Loosely Structured Excel files
If you’re used to dealing with messy Excel spreadsheets, first thing you’ll notice is the “data interpreter”. Data interpreter is a nifty tool. If your spreadsheets are perfectly clean, you probably won’t even notice it is there. It will do a first pass through the data as it loads to populate the preview grid and run a light-weight version of the interpreter to see if spreadsheet contains anything that it can fix. If all looks good, it won’t make a peep.
If your data types and column lengths seem to not make sense or you seem to have multi-level headers as in the example below,
it will offer you help without getting in the way. Look for a gray bar at the bottom of the join diagram that spans its entire width.
Once you turn it on, it starts looking for areas of the sheet that can be converted into well-structured tables. Many properties of the sheet are used in this process from formatting elements such as color and borders to whether cells are empty or not and if not what their data types are. Based on this information, it will automatically discard the header/footer text and flatten multi-level table headers, remove inline comments… For the example above, you’d get something like this.
For comparison this is what it would look like with the option turned off. Big improvement!
Of course since it is trying to detect a pattern, there is a chance it will not get everything right even though it was trained on 6000+ Excel workbooks we gathered from public reference data repositories such as Data.gov and UK Office for National Statistics, OECD, WHO, UNESCO, World Bank etc. If any non-text values are dropped, you will promptly see a warning so you don’t miss anything you intended to keep. You can always preview the results in Excel (or other MS Office alternatives you might have on your machine that can open Excel files) by clicking on the preview button. Preview also includes markup indicating what cells were kept (and classified as header vs. data) and what was removed.
In 9.0 this functionality is only supported for Excel files opened through the new Excel connectors (not supported on legacy Jet connectors).
Working with crosstabs
Now that your data looks like a table, another common problem is that most reference data come as crosstabs since they’re intended to be consumed as reports as opposed to input for further analysis. With 9.0 fear not! Now you can pivot them.
If your data changes and after creating the workbook and you realize you have a new column with one more year of data, you can simply add it to the existing pivot.
To avoid potential confusion, it is worth mentioning that even though in the data analysis world this operation is often referred to as pivot, in database terminology it is called unpivot.
In 9.0 pivot is only supported when using the new Excel and Text connectors (not with Jet), calculated fields can not be pivoted and pivoted fields can not be used for joining tables (but they can be used when blending data sources).
Splitting delimited text
I am sure you had to deal with composite strings in the past such as e-mails, phone numbers, URLs. Extracting relevant parts of a string could be a hassle but it doesn’t have to be if you have Tableau 9.0. Smart split analyzes a sample of your data to determine if it can be broken into representative parts. Some well-known patterns such as e-mails and URLs are recognized with their types and each part is named appropriately. For example e-mail will be broken into username and domain.
If data can not be associated with a known type, Tableau will still break the string into its parts but each part gets a generic name after its token number. You can initiate this action in many different places in Tableau.
As you can see, this is not just a split operation. Tableau actually removes what it determines to be constants (the repeating characters) from the string hence when you apply smart Split on a string like RGB color codes in the form (33,66,99) you not only get 3 numbers but you also get rid of the parentheses.
If you have an irregular pattern and smart split doesn’t succeed you can use “Custom Split” to specify a delimiter, whether to split from left or right and how many times to split. Both split options automatically create calculated fields for you that rely heavily on a new function in the calculation language called SPLIT.
SPLIT function takes 3 arguments. Input string, delimiter and token number. For example if you have customer names in the form John. C. Doe. You can write the following calculation to get the first name
SPLIT([Customer Name], ‘ ‘, 1)
which gives you the first token from the left after splitting on space.
Assuming some people will have middle names and some don’t, if you’d like to get the last name you can write the following
SPLIT([Customer Name], ‘ ‘, -1)
which gives you the first token from the right after splitting on space.
Tableau Data Engine (extracts, new Excel and Text connectors, Salesforce, Google Analytics, OData, Azure DataMarket, SPSS, SAS and R data files), MySQL 4.0+, PostgreSQL 9.0+, Microsoft SQL Server 2008+ support splitting from both left and right. HP Vertica, Oracle, Teradata, Aster nCluster, Google BigQuery, Amazon Redshift, Hadoop Hive (Cloudera, Hortonworks, MapR) support only splitting from left. If a database has a built-in split function, Tableau takes advantage of it, if no split function is available but regex functions are, Tableau uses those. If neither are available, Tableau will rely on simple string manipulation functions in the database to achieve the same result. As the number of splits grow however SQL queries can get large and perform badly. To avoid this except Tableau data engine sources, number of splits are capped at 10.
Tableau 9.0 also expands on the Regex support. REGEXP_EXTRACT, REGEXP_EXTRACT_NTH, REGEXP_MATCH and REGEXP_REPLACE are now supported on PostgreSQL, Oracle (10g+), Hadoop HIVE & Impala (no _MATCH), Google Big Query and Tableau Data Engine. Data Engine supports RegEx expressions in conformance with Unicode Technical Standard #18 , Unicode Regular Expressions level 1, and in addition include Default Word boundaries and Name Properties from level 2. Since new Excel/Text connectors rely on extracts behind the scenes they get it for free. Same applies to other extract based sources such as Salesforce, Google Analytics, OData, Azure DataMarket, SPSS, SAS and R data files. And of course any database you explicitly create an extract from will have it.
There are many other improvements in the data preparation experience in Tableau 9.0 such as the metadata grid, which allows you to quickly review your table metadata, the newly designed connection experience, which allows searching the ever growing list of connectivity options and contains the list of recently used data sources.
Check-out the Tableau blog for a high-level overview of the new Data Preparation features in 9.0 and improvements in other feature areas.