While the history of graph theory goes back to Leonhard Euler’s paper on the “Seven Bridges of Königsberg”, published in 1736, analyzing graphs and using them in visualization remained a niche area, mostly confined to academic circles, until the rise of social networks.
Well... now that you have access to all this data, how do you visualize it in Tableau?
In Tableau it is relatively easy to generate a network graph using dual axis charts with line and circle/shape mark types once you know the layout. But when you have a graph, two important questions come up:
How to do the layout
How to calculate relevant metrics from the graph (e.g. betweenness, closeness centrality, authority scores…)
Especially if your graph is changing over time, which is very likely, you will want to do both on the fly. Using the R integration feature, this is a fairly easy task. Below is an animation showing how the layout changes as new vertices get added to the graph, made using the workbook you can find in this blog post.
In this example we will be using the igraph package, which offers a variety of layout options to choose from as well as a handful of functions to calculate some interesting metrics.
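To make the two questions concrete, here is a minimal sketch of what igraph gives you for a tiny graph: a set of X-Y coordinates and a betweenness score per vertex. The edge list and the column names are made up for illustration, and the function names are the ones used by current igraph releases (older releases use names such as layout.fruchterman.reingold).

```r
library(igraph)

# Hypothetical retweet edge list: 'user' was retweeted by 'retweeted_by'.
edges <- data.frame(
  user         = c("A", "A", "B", "C"),
  retweeted_by = c("B", "C", "C", "D"),
  stringsAsFactors = FALSE
)

# Build an undirected graph from the two columns.
g <- graph_from_data_frame(edges, directed = FALSE)

# Layout: a matrix with one row of X-Y coordinates per vertex.
# The seed only makes the randomized layout repeatable.
set.seed(42)
xy <- layout_with_fr(g)

# Metric: one betweenness value per vertex, in the same order as V(g).
btw <- betweenness(g, directed = FALSE)

# Combine everything into one row per vertex.
nodes <- data.frame(
  users       = V(g)$name,
  x           = xy[, 1],
  y           = xy[, 2],
  betweenness = btw,
  stringsAsFactors = FALSE
)
print(nodes)
```

Other metrics, such as closeness(), follow the same pattern and return one value per vertex.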
To use the mark type Line in Tableau, you need to provide a list of points that define a path. In this case, since we are connecting vertices using straight lines, we only need two points per edge (from/to), marked with order 1 and 2. Since our data has the structure User: Person A, Retweeted by: Person B, the names alternate: the first point of the line is associated with Person A and the other end with Person B. You can see this structure being built in Custom SQL.
Once the structure is in place, the calculated field Graph Nodes contains the R code that retrieves the X-Y coordinates to display in the chart as well as the metric shown in the tooltip (in this case I used betweenness centrality). Everything is retrieved from R in a single call and then decomposed into three parts locally.
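One way to return three values per row in a single call is to pack them into one delimited string and split it afterwards. The sketch below only illustrates that idea. The values, the variable names and the "~" delimiter are assumptions, not taken from the workbook, and the splitting is shown in R purely for demonstration since the post describes that step as happening locally in Tableau.

```r
# Hypothetical per-row results (one entry per Tableau row).
x   <- c(0.12, 1.5, -0.3)
y   <- c(2.0, -1.25, 0.75)
btw <- c(0, 2, 0.5)

# Pack the three values into one string per row.
# "~" is an arbitrary delimiter chosen for this sketch.
packed <- paste(x, y, btw, sep = "~")

# Split each string back into its three parts.
parts    <- do.call(rbind, strsplit(packed, "~", fixed = TRUE))
x_back   <- as.numeric(parts[, 1])
y_back   <- as.numeric(parts[, 2])
btw_back <- as.numeric(parts[, 3])
```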
However, R expects the data in a different format than Tableau holds it. Tableau has two rows for each edge (from/to). The R library, on the other hand, expects the edge list as two columns, so the first section of the script filters out the extra rows as they are passed to R.
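The reshaping described above could look roughly like this in base R. This is a sketch, not the workbook's actual script: the column names edge_id, point_order and name are hypothetical stand-ins for whatever Tableau passes in.

```r
# Hypothetical data as Tableau holds it: two rows per edge.
# point_order 1 is the user, point_order 2 is the person who retweeted.
rows <- data.frame(
  edge_id     = c(1, 1, 2, 2, 3, 3),
  point_order = c(1, 2, 1, 2, 1, 2),
  name        = c("A", "B", "A", "C", "B", "C"),
  stringsAsFactors = FALSE
)

# Keep one row per edge for each end of the line.
from_rows <- rows[rows$point_order == 1, ]
to_rows   <- rows[rows$point_order == 2, ]

# Sort both halves by edge so the two ends line up row by row.
from_rows <- from_rows[order(from_rows$edge_id), ]
to_rows   <- to_rows[order(to_rows$edge_id), ]

# Two-column edge list, the shape igraph expects.
edge_list <- data.frame(
  from = from_rows$name,
  to   = to_rows$name,
  stringsAsFactors = FALSE
)
```

The result has one row per edge, half as many rows as the Tableau-side data.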
Also, from R we get one set of X-Y coordinates per vertex (the vertices being the union of the two vectors we pass to R), while each vertex is linked to multiple other vertices and so appears in many more rows in Tableau. So we first merge the coordinates with the unique list of vertex names, and then at the end do a join on these names in R to replicate the rows, so that Tableau gets back two rows to define each edge. The join happens in the following part of the script:
c<-join(allusers, c, by = 'users');
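To show what that join does, here is a small self-contained sketch using join from the plyr package. The data is made up, and the coordinate table is named coords here for readability (the script's own variable is c). The point is that join keeps the row order of its first argument, so each Tableau row gets the coordinates of the vertex it mentions.

```r
library(plyr)

# Hypothetical per-row vertex names, in the order Tableau sent them
# (two rows per edge, so names repeat).
allusers <- data.frame(
  users = c("A", "B", "A", "C", "B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical layout result: one row per unique vertex.
coords <- data.frame(
  users = c("A", "B", "C"),
  x     = c(0.0, 1.0, 0.5),
  y     = c(0.0, 0.0, 0.9),
  stringsAsFactors = FALSE
)

# Left join: copies each vertex's coordinates onto every row that
# mentions it, keeping the row order of 'allusers'.
result <- join(allusers, coords, by = "users")
```

Keeping the order matters because Tableau matches the values it gets back from R to its own rows by position.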
I tied the choice of layout algorithm to a Tableau parameter to make it easier to try different options. Even though the R script passes weights (the number of times Tweeted), not all layout algorithms take them into account. If you want a force-directed layout and want to see how these weights/forces affect the results, Fruchterman-Reingold is the option to use.
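Switching layouts from a parameter can be sketched as below. The helper compute_layout, the short layout names and the sample data are all hypothetical. In the workbook the name would come from the Tableau parameter, and the set of layouts offered there may differ.

```r
library(igraph)

# Hypothetical helper: 'layout_name' stands in for the Tableau parameter value.
compute_layout <- function(g, layout_name) {
  switch(layout_name,
    fr     = layout_with_fr(g, weights = E(g)$weight),  # force-directed, uses the weights
    kk     = layout_with_kk(g),
    circle = layout_in_circle(g),                       # purely positional, ignores weights
    stop("Unknown layout: ", layout_name)
  )
}

# Hypothetical weighted edge list; 'weight' becomes an edge attribute.
edges <- data.frame(
  from   = c("A", "A", "B", "C"),
  to     = c("B", "C", "C", "D"),
  weight = c(3, 1, 2, 1),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = FALSE)

set.seed(1)
xy_fr     <- compute_layout(g, "fr")
xy_circle <- compute_layout(g, "circle")
```

Each call returns a matrix with one row of X-Y coordinates per vertex, so the rest of the script does not need to know which layout was chosen.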
You can download the example workbook from HERE. Enjoy!