Tableau 8.1 and R

By Bora Beran October 7, 2013
In this 2nd in a series of posts about the upcoming 8.1 release, I get to introduce one of the most exciting new features in Tableau 8.1: R integration. R has been a very popular language among statisticians but in the past few years, it has become the language of choice for a much broader group of data enthusiasts.

There are many reasons for this:

  • The cost: While commercial distributions exist, open-source R is free.
  • The rich features: R has an estimated user community of 2 million, which includes thousands of contributors from different domains expanding the language’s capabilities through new libraries.
  • The quality: R libraries are enhanced by domain experts and field-tested by the large user base including other experts with real datasets in real analysis scenarios.
  • The learning resources: Thanks to the active user community, plenty of tutorials and sample code are readily available.

When we were working on building a bridge between Tableau and R, we wanted to enable three core scenarios and types of users. Specifically, we wanted to:

  • Give Tableau users access to a rich, ever-expanding collection of statistical analysis and data mining libraries to help them gain deeper insights from their data.
  • Bring Tableau’s fluid data exploration experience and broad connectivity options to R users.
  • Enable consumers of Tableau worksheets and dashboards take advantage of R, simply by interacting with the visualization or widgets without the need to have any knowledge of the language.

With these goals in mind, Tableau 8.1 offers four new functions in the calculated field list: SCRIPT_REAL, SCRIPT_INT, SCRIPT_STR and SCRIPT_BOOL for handling results of different data types from R. (To skip directly to information about setting up R integration, jump to the end of this post)

Analyze your data source of choice at the speed of thought

Some of the most commonly asked questions on R discussion forums are about basic data shaping and management such as connecting to databases or how to filter, group or aggregate data inside an R data frame. It is very easy to do these when you couple R with Tableau, allowing you to explore and try new things on a whim, moving fluidly from one view of your data to another. Tableau also has many connectors tailored specifically for each database and employs a number of optimizations that are not available if you were to connect to a database directly from R using a package like rodbc.

In the video above, all we have is a single calculated field for identifying outliers in our data:

IF SCRIPT_REAL("library(mvoutlier);sign2(cbind(.arg1))$wfinal01", AVG([Flow CFS])) == 0 THEN "Outlier" ELSE "OK" END

The rest is the same Tableau experience you know and love!

Meet your advanced analytics toolbox

R provides a wide variety of analysis techniques, including statistical tests, linear and nonlinear modeling, time-series analysis, classification and clustering.

I would like to start with an example that involves clustering since it is a common exploratory data analysis task used in many fields from marketing to social sciences and biology. Fisher’s Iris dataset is the classic example used in almost every data-mining textbook. It consists of petal and sepal dimensions of a mixed sample from 3 species of Irises.

To identify the natural clusters in this dataset we will be using a technique called “k-mean clustering." This could easily be done in a calculated field with a single line of R code.

SCRIPT_INT(‘kmeans(data.frame(.arg1,.arg2,.arg3,.arg4),3)$cluster;', SUM([Petal length]), SUM([Petal width]),SUM([Sepal length]),SUM([Sepal width]))

Here .argN corresponds to the measures/dimensions you are passing from Tableau to R. For example .arg3 passed the values of SUM([Sepal length)] to R. Note that since SCRIPT functions are table calculations, these are all passed as aggregates. If you have one or more aggregated measures or dimensions, you can also pass Tableau parameters to R.

Here you can find the Tableau workbook that contains a working example, as well as the actual species information for each data point to see how accurate our classification was.

As long as the SCRIPT function returns one result for each partition or the same number of records as in the partition, Tableau will have no problems aligning the results with your input data. However if you sent 10 rows to R but got 3 rows back, Tableau will need your help figuring out how those 3 rows need to be associated with the 10 input rows. You can find the answer to this question in the k-means clustering example workbook by looking at the ClusterCenters calculated field.

Another good use case for R and Tableau is to transform complex datasets into forms suitable for visual exploration. Take roll call data for example. The dataset contains votes (yay, nay, no vote) for each member of the 111th House, a total of 1602 columns and 445 rows. Because there will be a range of topics being voted on, for the analysis to yield meaningful results we cannot simply create one aggregate for state or representative; we need to do comparisons within each voting session. Given the number of columns and the fact that parties often don’t vote in blocks, we can’t do this visually by using a heat map or scatter plot. The example workbook here shows how to transform this data using R into something consumable in a scatter plot or a map (as shown below) using a method called multidimensional scaling.

Empower your audience

R is great, but programming in R is not for everyone. The learning curve is steep if you are not an experienced programmer. Even if you are, certain aspects of the language can take some getting used to. But as you saw in all our examples, once you create the calculated fields and set up your visualization, the result feels just like any other Tableau dashboard. So you can publish your workbook on Tableau Server and even viewers who have never heard of R or written a single line of code in their lives will be able to take advantage of R simply by interacting with your visualization.

Our last example shows how you can user Tableau parameters and R calculated fields to create a dashboard that allows your users to perform a simple "What-If?" analysis.

Setting Up Tableau Desktop with R

Getting started with R on Tableau is easy (and you'll need to set up the connection before you can use any of the attached examples in this post, along with the 8.1 beta). Tableau communicates with R using a package called Rserve. If you do not already have R, you will want to install R and Rserve on computer running Tableau Desktop. You can get R from from http://cran.us.r-project.org/. You can install and initiate Rserve from R command line by entering:

install.packages("Rserve"); library(Rserve); Rserve()

Alternatively, you can have Tableau run R scripts on a remote server running Rserve.

Next, you will want to configure your connection. You can do this using a new item in the Help menu: “Manage R connection…”.

R on Tableau Server

Configuring Rserve connection on Tableau Server is done through tabadmin. Once you provide the Rserve information, all the workbooks published on that particular Tableau Server will use that Rserve instance which could be the on the same machine as Tableau Server or on a separate machine. However for using Rserve in shared environments, we recommend installing RServe on Linux. This is because in Windows all communications with Rserve share a single session, meaning that users will be able to access R objects created by other users and can overwrite each other’s objects. On Linux, this is not an issue since each call to Rserve is treated independently.

I hope you enjoyed this quick start on Tableau and R. There are many other new features in 8.1. Stay tuned for more blog posts on what’s new and exciting.

Comments

Submitted by Ben Sullins (not verified) on

Beautiful!

Submitted by Ben P. on

THANKS!!!!!!!! This is great! I've been digging into R in a Nutshell and this helps close the gap!

Submitted by Eric (not verified) on

One of the things that makes Tableau Desktop so pleasant to use is the way it allows you to connect to anything without any hassles. No system administration, it just works. The R integration is not so pleasant. I downloaded R and ran the command:

install.packages("Rserve"); library(Rserve); Rserve()

Not surprisingly, it didn't work as expected. It asked for a CRAN mirror. What's that? Soon another window pops up and I selected the MD location near me. Nothing happened, it hung. Several tries later I figured it must be a proxy problem and after messing with that for a while I still had no luck. But I happened to have RStudio installed so I tried running the same install command from there. It worked! Except that the workbook still would not run because I hadn't installed the 'plyr' package. Oh yeah, how could I be so stupid. How could I not know that I needed the plyr package. But how do I do that? No instructions. So I took a guess and ran this in RStudio:

install.packages("plyr"); library(plyr); Rserve()

Then my workbook would work. But nothing is easy because it seems that to run Tableau with R I must run RStudio and run the install commands every time I run Tableau. Alternately, I could choose to never exit RStudio. But I have to reboot someday. And by then I will have forgotten most of this so I'll have to go through the faulty instructions all over again.

It would be much nicer if Tableau would install R at the same time it installs Tableau. And the R server should already be available if you need it.

Submitted by Jimmy (not verified) on

Great tutorial if you're somewhat familiar with R. It covers all you need to know about how to make your scripts work with Tableau.

Submitted by Harley Norrgren (not verified) on

My least favourite thing about tableau has been turned into my new favourite thing!
I can't wait to see what the community does with the new functionality.

Submitted by nvandamm (not verified) on

Next step: integration with matlab ?

Submitted by . T. on

Great article and example. This works really well, my mind is brimming with thoughts of the possibilities.

I will experiment with this a lot.

Trevor

Submitted by Vladimir M. on

Hi,

Great development!
Rserve process is persistent. Can Tableau access a same R object from different scripts/calculated fields, rather than repeating calculation? For example in your iris k-mean example , all the R calulated fileds do the same k-mean calulation:
set.seed(42);result <- kmeans(data.frame(.arg1,.arg2,.arg3,.arg4), 3);

Can this calculation be done once, and other fields just access "result" object?

Vlad

Submitted by Adam McCann (not verified) on

wow you're a whiner. this is beta. this literally hasn't come out yet and you're complaining it should be better. chill out and maybe give them a version or 2 to make it better.

Submitted by Adam McCann (not verified) on

my previous was comment to Eric's comment

Submitted by Bora B. on

Hi Vlad,
Yes, that's true, if you are running it on Windows. So technically you could do result <- kmeans(data.frame(.arg1,.arg2,.arg3,.arg4), 3); in one calculated field, and result$totss in another calculated field because result object persists in your session. In a shared environment though (assume published dashboard on Tableau server) sharing sessions would be a nightmare. For example somebody else can generate an object with the same name and overwrite yours. Our somebody else could be reading from your result object etc. That's why we recommend running RServe on Linux in shared environments. On Linux state is not maintained and each call to Rserve is treated separately. As a result if you relied on result$totss in your local workbook, the moment you use it against an Rserve instance on Linux (this could be the result of just publishing to Tableau server which is pointing to a Linux machine hosting Rserve), it will stop working. Because of all this while it is possible to rely on session state working locally on a Windows machine, we don't recommend it.

Submitted by Mick Arundell (not verified) on

If you have proxy problems with R use this to start it
"C:\Program Files\R\R-3.0.1\bin\i386\Rgui.exe" --internet2

I have that in a desktop shortcut.

Mick

Submitted by steve miller (not verified) on

i'm using an lm object stored in an R workspace to "score" data in Tableau. It works fine as long as all Tableau variables are numeric, but if my R model has "factors", the R script barfs. Any workarounds for this? My models always involve category or factor variables.

Thanks.

Submitted by Bora B. on

Here is an example. I used WorldPhones {datasets} as the input data.

SCRIPT_REAL('phones <- data.frame(cbind(Year=.arg1, NumPhones=.arg2, Continent=factor(.arg3))); fit <- lm(NumPhones ~ Year + Continent, data=phones);fit$coefficients[1]',ATTR([Year]),SUM([N Phones]),ATTR([Continent]))

In this case I am retrieving the intercept into Tableau but if I understand correctly your problem was about passing data from Tableau to R so that part does not really matter.

Submitted by Elisa A. on

I could open the kmeans workbook without any problems and everything worked beautifully.

However, when I tried to replicate the worksheet from scratch, I kept getting an error that says:

Error in sample.int(m, k) :
cannot take a sample larger than the population when 'replace = FALSE'

From the calculated field script, I do not see where is it specifying the sample size.
I get the exact same error trying to use kmeans in tableau on my own data set.

Can anyone please help me?

Submitted by Tom Scott (not verified) on

This normally happens when k is a large number compared to the number of data points you have. E.g. you are asking for 11 clusters but have only 10 rows in the data.

Submitted by Elisa A. on

Thank you for your reply. My friendly Tableau guy has told me to turn off aggregation under the Analytics menu. Which worked! So I thought I'd share this pointer in here as well.

Submitted by Charles T. on

Could you include the data for the first set?

Submitted by madhusudhan r. on

Could any one please tell the path for downloading the tableau 8.1 beta version.

Submitted by Brian R. on

Great examples!!! Can't wait to get stated!

Had some trouble downloading sample workbooks in Firefox.

Chrome and IE work fine.

Submitted by Brian R. on

Are you talking about the iris example?

The data is already packaged in R.

Go to your R command line and type:

data(iris)
write.csv(iris,"IrisData.csv")

This will output the data in csv format to your working directory. =)

Submitted by sid s. on

Hi,
I do want to know if it is possible to import .rds (R file) into Tableau??

Submitted by Bora B. on

You can use any valid R script from Tableau. So you can do readRDS(...). But depending on how dynamic the content of the RDS file will be, you may want to do this in your Rserve configuration file instead. Rserv.cfg has two options.
1) eval
2) source

You can use eval to evaluate an R script. E.g. if you add the following line to your Rserv.cfg

eval x<-readRDS("enter path and name to your RDS file here"); y = 10;

and restart Rserve

From Tableau you can refer to x in your SCRIPT functions and read the values from the object that was serialized into the RDS file. Since in this example we are also setting y to 10 if you did SCRIPT_REAL('y+.arg1', SUM([Sales]) in Tableau, you will see that 10 is added to your SUM of Sales.

Using source configuration option you can execute R scripts in R source files in Rserve and have access to the objects created within. For example

source C:/MyFolder/sample.R

In this case if sample.R was restoring some R objects using readRDS, you would have access to all of them inside Tableau.

I hope this helps.

Submitted by Edwin (not verified) on

I have tried a couple of different things (regression, splines, PCA) on different example data sets, like the Iris data set. Simultaneously I ran the analysis in R to see if I obtained the same results. Although I managed to get everyting working the results in Tableau differed from the (correct) results in R. I made sure the data were exactly similar (all data sets were exported from R) and that the implemented methods were the same. Maybe this difference is due to European settings of my computer (where dots and commas are used the otherway around from the anglosaksian use, e.g. the American number 1,287.98 is written as 1.287,98 ). I hope you can help me with this.

Submitted by Bora B. on

Hi Edwin,
Workbook locale would not cause that. Sounds more like an addressing/partitioning issue. Tableau will aggregate based on what you set for addressing and aggregation method and evaluate each partition as a separate call to R. So it may be the same input data, but as a result of these settings, what Tableau is sending to R could be aggregated or broken into separate calls while you wanted them to be aggregated in a different way or not aggregated at all and sent as a single vector as opposed to partitioned. Is there any way you can share your workbook (maybe post it as a question on the forum)?

Submitted by Edwin T. on

Thanks Bora, I am glad that the workbook locale has no effect! I was aware that we have to send the vector and not the aggregate to R, thats why I always switch off aggregate measures. I will share my workbook on the forum so you can have a look.

Submitted by Alicia Linde (not verified) on

Any thought about integrating this with SAS?

Submitted by Brian N. on

Having problems with the install:
> install.packages("Rserve"); library(Rserve); Rserve()
Warning in install.packages("Rserve") :
'lib = "C:/Program Files/R/R-3.0.2/library"' is not writable
Error in install.packages("Rserve") : unable to install packages

Any more instructions I am missing?

Submitted by Charles T. on

Change the Directory under File > Change Dir

This should resolve your issue. Also it sounds like you are in some corporate setting in which case you are most likely behind some proxy. If you run into further issues try running the command setInternet2()

Submitted by Elias F. on

Thanks, Elisa! That was the essential pointer for me. In my data set, I had 81 rows and I was asking for 3 clusters. Ensuring that my measures were set as such on the rows/columns shelves AND having Aggregate Measures deselected in the Analysis menu did the trick.

Submitted by Nick Lang (not verified) on

I'm having an issue when running the Clusters workbook with an Rserve connection that is not my local machine. When I run the workbook and use my local host for my Rserve connection, the workbook runs as planned. However, when I attempt to connect to a separate server that I have Rserve installed on, I get the following error:

Error in data.frame(.arg1, .arg2, .arg3, .arg4) :
object '.arg2' not found

Unsure as to how to troubleshoot this issue, hoping for some advice. Thanks!

Submitted by Bora B. on

Hi Nick,
Can you try downloading the workbook again and running it one more time? I suspect this is happening because of how Linux handles sessions differently from Windows which is only briefly mentioned in the article. On your local machine (assuming Windows) your R session contains all necessary objects because they are created by other calculations in the workbook. But when you switch to the other machine (assuming Linux/Unix/Mac) each function call is treated independently so there are no prior objects. I updated the workbook to recreate those objects in each function call so it works in both environments. Please give the updated workbook a shot. If it is still not working, let us know.

Thank you,

~ Bora

Submitted by Nick Lang (not verified) on

Thanks Bora, worked as expected this time!

non-humans click here