Hadoop

If you’ve got Hadoop data, it is probably huge, unstructured, nested or all three. The Hadoop Distributed File System (HDFS) and the MapReduce programming model support parallel processing across massive data sets. This lets you work with data that traditional databases find extremely difficult to process, including unstructured data and XML data.

Tableau connects to multiple flavors of Hadoop. Once connected, you can bring data in-memory and do fast ad hoc visualization. See patterns and outliers in all the data stored in your Hadoop cluster. You can’t get value from your data unless you can see what’s inside it.

“Tableau’s solution for Hadoop is one of the most elegant solutions I've seen, and performant,” said Ravi Bandaru, Product Manager of Advanced Analytics & Data Visualization at Nokia. “This obviates any need for us to move huge log data into Relational store before analyzing it with Tableau.”

The bottom line is that your Hadoop reporting is faster, easier and more efficient with Tableau.

Fast Hadoop Analysis
The power of Hadoop without the latency

Hadoop’s best-known drawback is its high latency. To get the benefit of ad hoc visualization at interactive speeds, you need queries that respond quickly. When you work with Hadoop and Tableau, you can connect live to your Hadoop cluster and then extract the data into Tableau’s fast in-memory data engine.

Tableau lets you bring your data into its fast, in-memory analytical engine. With this approach you can query an extract of data without waiting for MapReduce queries to complete. Click to refresh the extract or schedule automatic refreshes.

Native Connection
Native connectors to Cloudera Impala and Cloudera Hadoop, DataStax Enterprise, Hortonworks, and MapR Hadoop Distribution for Hadoop reporting and analysis

Unlike other Hadoop analysis software, getting Hadoop data to work with Tableau is easy: just point at your cluster! You do need Hive, a common component that provides a SQL interface to Hadoop, installed on your cluster. Beyond that, neither Tableau nor Hadoop needs any special configuration.

Cloudera Impala, Cloudera Hadoop, DataStax Enterprise, Hortonworks, and MapR Hadoop distributions are simply another data source in Tableau. You can connect with no programming and drag & drop to visualize your data.

Here we have weather data from a set of XML objects, now stored in a Hadoop cluster. Tableau’s powerful visualization capabilities let you create maps, charts and dashboards easily.

XML Support for Hadoop Data
Work with a variety of data, including XML

An important application of Hadoop and Hive together is working with a variety of data, such as XML files. This often means that you need to unpack nested data, perform data transformations and process URLs. Tableau supports a number of new string functions when working with Hive and Hadoop, including URL processing, regular expressions, and hex/binary numeric operators.
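To give a feel for what URL processing and regular-expression functions do to a raw string field, here is a minimal Python sketch. It is only an analogy using Python's standard library, not Tableau's or Hive's implementation, and the URL and the names `host`, `city` and `section` are invented for illustration.

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical URL of the kind found in web logs (invented for illustration)
url = "http://example.com/weather/report?city=Seattle&units=metric"

# URL processing: split the string into its components
parts = urlparse(url)
host = parts.netloc                      # host portion of the URL
city = parse_qs(parts.query)["city"][0]  # value of the "city" query parameter

# Regular expression: capture the first segment of the path
match = re.search(r"^/([^/]+)/", parts.path)
section = match.group(1) if match else None

print(host, city, section)
```

Each extracted value behaves like a new column derived from the original string, which is the same idea as building a calculated field from a URL or pattern match.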

This weather data was stored as a series of XML files that were loaded into Hadoop and unpacked on the fly by the Tableau custom SQL connection – this is true flexibility, and almost like on-the-fly ETL.

Here we’re using the “XPATH” function to create a City field so that we can represent this data in a more traditional, relational way. XML functions are exposed in the Tableau calculations window when you’re working with Hive/Hadoop data, so you don’t need to do custom programming to work with XML objects.
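For readers new to XPath, the idea of turning a nested XML record into a flat City field can be sketched in a few lines of Python. This is an illustration only: the record layout, the element names and the helper `xpath_text` are assumptions made up for the example, and Python's ElementTree supports just a subset of XPath, so it does not reflect how Tableau or Hive evaluate the expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical weather record; the element names are invented for illustration
record = """
<observation>
  <location>
    <city>Seattle</city>
    <state>WA</state>
  </location>
  <temp_f>54.0</temp_f>
</observation>
""".strip()

def xpath_text(xml_text, path):
    """Return the text of the first element matching path, or None."""
    root = ET.fromstring(xml_text)
    node = root.find(path)
    return node.text if node is not None else None

# Pull nested values out into flat, column-like fields
city = xpath_text(record, "./location/city")
temp_f = float(xpath_text(record, "./temp_f"))

print(city, temp_f)
```

Applied to every record, a path expression like this yields one value per row, which is what lets nested XML be treated like an ordinary relational column.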