Breakthrough after Breakthrough
In 2003 Tableau spun out of Stanford University with VizQL™, a technology that completely changes working with data by allowing simple drag and drop functions to create sophisticated visualizations. The fundamental innovation is a patented query language that translates your actions into a database query and then expresses the response graphically.
The next breakthrough was the ability to do ad-hoc analysis of millions of rows of data in seconds with Tableau's Data Engine. The Data Engine is a high-performing analytics database on your PC. It has the speed benefits of traditional in-memory solutions without the limitation that your data must fit in memory. And in Tableau's tradition of making powerful tools accessible to all, there’s no custom scripting needed to use the Data Engine.
Natively visual–and therefore faster.
At the heart of Tableau is a proprietary technology that makes interactive data visualization an integral part of understanding data. A traditional analysis tool forces you to analyze data in rows and columns, choose a subset of your data to present, organize that data into a table, then create a chart from that table. VizQL skips those steps and creates a visual representation of your data right away, giving you visual feedback as you analyze. As a result you get a much deeper understanding of your data and can work much faster than with conventional methods–up to 100 times faster.
A new language for data means you can say more.
This fundamentally new architecture does for data interactions in visual form what SQL did for data interactions in text form. VizQL statements describe an infinite class of sophisticated multi-dimensional visualizations. With VizQL, people have a single analysis interface and database visualization tool to produce a broad range of graphical summaries.
Tableau can create a shockingly broad range of visualizations, from bar and line charts to maps and sophisticated linked views. This flexibility allows you to understand data in an entirely new way. It allows you to find insights that would be lost if you had to shoehorn your data into rigid charting templates.
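To make the idea of translating drag-and-drop actions into a database query more concrete, here is a minimal sketch in Python. It is not VizQL and does not reflect Tableau's actual implementation; the `ViewSpec` class, the `to_sql` function, and the `orders` table are hypothetical names invented for illustration. It only shows how a description of which fields were dropped where could be turned into one aggregate query.

```python
from dataclasses import dataclass, field


@dataclass
class ViewSpec:
    """Hypothetical stand-in for the fields a user has dropped onto a view."""
    table: str
    dimensions: list[str] = field(default_factory=list)     # fields to group by
    measures: dict[str, str] = field(default_factory=dict)  # field name -> aggregate


def to_sql(spec: ViewSpec) -> str:
    """Build a single aggregate query from the view description."""
    aggregates = [
        f"{agg.upper()}({name}) AS {agg.lower()}_{name}"
        for name, agg in spec.measures.items()
    ]
    select_list = ", ".join(spec.dimensions + aggregates)
    sql = f"SELECT {select_list} FROM {spec.table}"
    if spec.dimensions:
        # Each dimension on the view becomes a grouping column.
        sql += " GROUP BY " + ", ".join(spec.dimensions)
    return sql


# Dropping "region" and SUM("sales") onto a view of the "orders" table.
spec = ViewSpec(table="orders", dimensions=["region"], measures={"sales": "sum"})
print(to_sql(spec))
# SELECT region, SUM(sales) AS sum_sales FROM orders GROUP BY region
```

In a real system the rows returned by such a query would then be mapped to marks (bars, lines, points on a map); that rendering step is left out of this sketch.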
Supports natural patterns of thought.
Thinking is naturally a pattern of questioning and answering, incrementally making progress and taking new information into account. It’s rare that you know exactly where you’re going when you begin an analysis. Yet that’s what traditional BI tools require.
There’s an alternative: VizQL allows you to explore your data visually and find the best representation of it. You learn as you go, add more data if needed, and ultimately get deeper insights. We call this the cycle of visual analysis. When you’ve gone through this cycle you can communicate a much better story about your data.
It doesn’t exist anywhere else in the world.
Because of VizQL, fast analytics and visualization are a reality. People with little or no training can see and understand data faster than ever and in entirely new ways. And that’s the biggest difference of all.
The Data Engine
Designed to overcome the limitations
The Data Engine is a breakthrough analytics database designed to overcome the limitations of existing databases and data silos and to truly support the process of visual analysis. It is designed to reflect the capabilities of the latest hardware and the complete memory hierarchy from disk to L1 cache.
Tableau’s Data Engine shifts the curve between big data and fast analysis.
The evolution of large data
Databases have evolved substantially over the last several years. Legacy databases are focused on disk-resident data and pre-computation. While that allowed for more computation power than before, it had the disadvantages of being slow and of requiring users to know which questions they would want to answer (their query workload) before building the database.
More recent databases have found performance benefits by just using the top-levels of the memory hierarchy and requiring all data to be memory resident. These “in-memory” solutions made computation much faster, but at the expense of limiting the data size to the size of the available memory.
Goals of the Tableau Data Engine
We designed the Data Engine to:
- Fully utilize current generation hardware to achieve instant query response on hundreds of millions of rows of data on commodity hardware such as a corporate laptop
- Support true ad hoc query by having predictable and consistent query performance for all queries and no requirement for known query workloads or precomputation of aggregates or summaries
- Integrate seamlessly with existing corporate data warehouses and infrastructure
- Not be limited by a requirement that an entire data set be memory resident to achieve its performance goals
- Provide very quick load and connections to data sources.
The core Data Engine structure is a column-based representation using compression that supports execution of queries without decompression. Leveraging novel approaches from computer graphics, the algorithms were carefully designed to make full use of modern processors, with near-optimal usage of the L1 and L2 caches and minimal intermediate results. Breakthrough techniques for streaming data from disk avoid loss of throughput, which lets the Data Engine avoid the common limitation of requiring data sets to be completely loaded into memory before analysis can begin.
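As an illustration of what executing queries without decompression can mean for a column store, the following Python sketch uses run-length encoding, one common columnar compression scheme. This is an assumption chosen for clarity: the text does not say which compression the Data Engine uses, and the function names here are hypothetical. The point is only that an aggregate or a filter can be computed once per run of repeated values rather than once per row.

```python
from itertools import groupby


def rle_encode(values):
    """Compress a column into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_sum(runs):
    """Sum the column by visiting each run once instead of each row."""
    return sum(value * length for value, length in runs)


def rle_count_where(runs, predicate):
    """Count matching rows; the predicate is evaluated once per run."""
    return sum(length for value, length in runs if predicate(value))


column = [5, 5, 5, 5, 2, 2, 9, 9, 9]
runs = rle_encode(column)

print(runs)                                      # [(5, 4), (2, 2), (9, 3)]
print(rle_sum(runs))                             # 51
print(rle_count_where(runs, lambda v: v >= 5))   # 7
```

Nine rows are answered with three steps here; on sorted or low-cardinality columns the savings grow with the length of the runs.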
Data Engine to live connection—and back
The Data Engine is designed to integrate directly with Tableau’s existing “live connection” technology. Users can toggle with a single click between a direct connection to the corporate database (issuing highly tuned platform-specific SQL queries) and querying an extract of that data loaded into the Data Engine, with careful matching of calculation and collation semantics. This integration allows companies to analyze samples of data (GBs), then redirect the same analysis to a massively parallel warehouse such as Teradata to run the final analysis (or reports) on petabytes of data.
True ad-hoc queries
The Data Engine has a query language and query optimizer designed to support the queries typical of on-the-fly business analytics. When working with data at the speed of thought, it is common to need to run complex queries such as very large multi-dimensional filters or complex co-occurrence queries. Existing databases generally perform poorly on these types of queries, whereas the Data Engine processes them instantly.
Flexible data model
One of the key differences between the Data Engine and other in-memory solutions is that it can operate on the data directly as it is represented in the database on disk. So there's no required data modeling and no scripting needed to use the Data Engine.
One of the most powerful features of the Data Engine is that, just as with any other relational database, you can define new calculated columns at any time. You might think of it as a form of ad hoc data modeling.
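The idea of defining calculated columns at any time can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the Data Engine works internally: the `columns` store and the `define` and `get` helpers are hypothetical. A calculated column is kept as an expression over other columns and evaluated when it is requested, so new ones can be added without rebuilding the stored data.

```python
# Stored (physical) columns, held column by column.
columns = {
    "sales": [100.0, 250.0, 80.0],
    "cost": [60.0, 200.0, 90.0],
}

# Calculated columns: name -> function that derives values from other columns.
calculated = {}


def define(name, expression):
    """Register a calculated column without touching the stored data."""
    calculated[name] = expression


def get(name):
    """Return a column's values, evaluating calculated columns on demand."""
    if name in calculated:
        return calculated[name](get)
    return columns[name]


# Defined after the data was loaded; "margin" builds on "profit".
define("profit", lambda col: [s - c for s, c in zip(col("sales"), col("cost"))])
define("margin", lambda col: [p / s for p, s in zip(col("profit"), col("sales"))])

print(get("profit"))  # [40.0, 50.0, -10.0]
print(get("margin"))
```

Because `margin` refers to `profit` by name, changing the definition of `profit` later changes `margin` as well, with no reload of the stored columns.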
Instant load and connection time
The Data Engine is unique in that, once your data is loaded into it, start-up is very fast. It only needs to read the portion of the data that your queries actually touch. You might have a lot of data in the database that’s not relevant to a particular analysis; you will never wait for the Data Engine to read that data.
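Reading only the data a query touches can be illustrated with a small Python sketch. The layout is an assumption made for the example (one JSON file per column in a temporary directory) and the `LazyExtract` class is hypothetical; the Data Engine's actual file format is not described here. Opening the extract reads nothing, and a query on one column leaves the others on disk.

```python
import json
import tempfile
from pathlib import Path


class LazyExtract:
    """Hypothetical extract: one file per column, read on first use only."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.loaded = {}  # column name -> values already read from disk

    def column(self, name):
        if name not in self.loaded:
            path = self.directory / f"{name}.json"
            self.loaded[name] = json.loads(path.read_text())
        return self.loaded[name]

    def total(self, name):
        return sum(self.column(name))


with tempfile.TemporaryDirectory() as tmp:
    # Write three columns to disk, one file each.
    data = {"sales": [10, 20, 30], "cost": [4, 8, 12], "notes": ["a", "b", "c"]}
    for name, values in data.items():
        (Path(tmp) / f"{name}.json").write_text(json.dumps(values))

    extract = LazyExtract(tmp)            # opening the extract reads no data
    opened_with = sorted(extract.loaded)
    sales_total = extract.total("sales")  # reads only sales.json
    touched = sorted(extract.loaded)

print(opened_with, sales_total, touched)  # [] 60 ['sales']
```

The `cost` and `notes` columns are never read, so their size has no effect on how long this query takes.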