Fast Analytics: The Next Step for the Database Success Story
By Jock D. Mackinlay
Introduction
Databases are one of the central success stories of the computer revolution. We have reworked most of our key institutions (commercial, medical, educational, governmental, non-profit, and more) to use databases for operational tasks. As databases increase in size and speed, our institutions are also using databases for more “cognitive” tasks such as monitoring and planning. However, effective cognition requires new ways for people to work with their databases. I believe the next step in the success story of databases depends on “fast analytics,” software that takes advantage of the speed and size of modern databases to help people see and understand their data at the speed of their thoughts.
The Database Success Story
Let’s begin by summarizing the success story of databases:
1960s: The early days of databases involved various specialized designs that were targeted at niche applications. Each had a unique way to store data, and each had a unique user interface. This diversity limited the impact of database technology.
1970s: The most significant event in the database success story occurred in 1970 when Edgar Codd formalized the relational algebra. Organizing data into normalized tables allowed large amounts of data to be stored in databases. The relational algebra also supported the development of high-level query languages such as SQL, which created a uniform way to access databases. Finally, relational databases were able to support parallel, transactional queries that allowed databases to be used in operational situations.
1980s: The introduction of commercial relational databases in the 1980s started the unprecedented transformation of our institutions that is continuing today. Data has been moving from paper records to electronic databases. Online Transaction Processing (OLTP) supports the capture, storage, and delivery of operational data in electronic form, which improves the efficiency of our institutions.
1990s: When institutions moved their data online for operations, they realized that they could also use the data for high-level activities such as monitoring and planning. However, operational databases were not designed for these higher-level activities. In the 1990s, the success story of databases continued with the development of data warehousing technologies that enabled analytic reporting. Dimensional modeling and star schemas allowed databases to hold the large volumes of data, including historical data, that are useful during analysis. Multidimensional cubes and query languages such as MDX allowed people to look across their operational databases and ask questions that would have been almost impossible to answer before the data was in electronic form.
2000s: The success story continues into the new millennium with the development of technologies that support ad hoc queries, including data warehouse applications, column stores, and in-memory databases. For example, Netezza has built massively parallel appliances that make queries 10x or 100x faster. Vertica has rethought how databases are structured, recognizing that the traditional layout is almost exactly the opposite of what analytic queries need; flip it around, and things suddenly become much faster. However, the success of this kind of technology requires a user experience that lets people take advantage of ad hoc queries in their work.
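The idea of flipping a database's layout around for analytic queries can be illustrated with a toy sketch. The Python below is only an illustration under stated assumptions: the table, the column names, and the values are invented, and real column stores add compression, sorting, and parallelism that are not modeled here. It shows why a query that sums a single column touches less data when values are stored column by column rather than row by row.

```python
# Toy comparison of row-oriented and column-oriented storage.
# All names and values are invented for illustration.

rows = [
    {"region": "East", "product": "Caffe Mocha", "sales": 120, "profit": -15},
    {"region": "East", "product": "Green Tea", "sales": 80, "profit": 20},
    {"region": "West", "product": "Caffe Mocha", "sales": 150, "profit": 40},
]

def to_columns(rows):
    """Store the same table as one list per column."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns

def sum_row_store(rows, column):
    """Row layout: every full record is visited to read one field."""
    fields_touched = 0
    total = 0
    for row in rows:
        fields_touched += len(row)  # the whole record is loaded
        total += row[column]
    return total, fields_touched

def sum_column_store(columns, column):
    """Column layout: only the requested column is scanned."""
    values = columns[column]
    return sum(values), len(values)

columns = to_columns(rows)
row_total, row_touched = sum_row_store(rows, "sales")
col_total, col_touched = sum_column_store(columns, "sales")
print("row store:   ", row_total, "fields touched:", row_touched)
print("column store:", col_total, "fields touched:", col_touched)
```

Both layouts return the same total, but the row layout reads every field of every record while the column layout reads only the values it needs. The gap widens as tables gain more columns, which is typical of the wide tables used for analysis.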
Agility: Using Data to Think
People want to think with their data because data can make their thinking more effective. Data can identify correct and incorrect thinking. Data can reinvigorate cognition about familiar topics and start cognition about new topics. However, people can only think effectively with their data when the data matches the agility of their thoughts. In particular, databases fail to support effective cognition when thoughts are interrupted, slowed, or outsourced:
Thoughts Interrupted: Data Queries
Although SQL was central to the success story of databases, most people find it hard to write queries. Having to write a query interrupts the flow of thought with a challenging activity. In fact, many people avoid writing queries altogether and ask their IT department to write queries for them, waiting for the results to be delivered either in spreadsheets or data reports. Unfortunately, IT departments are often overwhelmed by the number of requests. Waiting for a report can interrupt thoughts for a long time.
Thoughts Slowed: Data Tables
Analytical tasks often involve comparative and summary questions. When people ask such questions of a table of numbers, their thoughts are slowed because mental math is hard for people. It can also be slow to use calculators or spreadsheets to answer such questions.
Thoughts Outsourced: Data Reports
Many institutions have had tremendous success delivering data reports to their employees. People skilled at writing data queries can make sure the data reports contain correct data. Furthermore, the data reports can contain graphical views of data that avoid the mental math that is required when doing analysis with data tables and spreadsheets. However, some of the thinking has been outsourced to the report authors. It can be difficult to anticipate the analytical questions that need to be answered by a data report. When a person looks at a data report, it can raise unanticipated questions that cannot be answered with a static report of data even if it includes graphical views. Human thought is agile. It can move quickly in unexpected directions. Agile thinking can be very valuable to an institution. What people need is new ways to work with their databases that support the agility of their thoughts.
Fast Analytics: Databases for Everyone
Databases now hold a tremendous amount of valuable information and they continue to grow larger and faster. Modern databases have the capacity and bandwidth to support a large number of users. At this point, the bottleneck is how to empower everyone to use this data to make our lives better. If databases are to continue to grow and succeed, we need compelling ways for people to work and think with their data.
One idea, which we are exploring at Tableau Software, is to marry fast databases with high-quality computer graphics to make it much easier for everyone to see and understand their data. The result is fast analytics. The database provides the speed, and the computer graphics provide an agile user experience that lets people think with their data. We address the thought issues described in the previous section with well-designed graphical presentations of data that exploit the power of the human visual system. If you are interested in the power of the human visual system to understand data, I recommend Colin Ware’s excellent book. The following example can give you a feeling for the power of this approach. The task is to count the 9s in the grid:
This particular technique for presenting data effectively is well known in financial circles, where negative values are often shown in red:
Reading Colin’s book will teach you many other ways to present data effectively using graphical techniques. For example, we can transform the previous financial table into the following bar chart:
Presenting this data as a table of bars exploits the power of your visual system to compare bar lengths. The red bar representing the sum of sales for Caffe Mocha in the East can be quickly compared with the sales for other products in the East and the sales of Caffe Mocha in other regions. The key visual finding is that the sales look reasonable even though the profit is negative.
However, there is no single view that works for all questions. Looking at this bar chart might raise a question about the correlation of sales and profit, which can be quickly answered if you switch to the following scatter plot:
Tableau Desktop has a simple drag-and-drop user interface that allows people to explore many different graphical views of their data to find one that answers their question. This interface gives people the freedom to change their questions during their exploration. Tableau is based on a formal specification language called VizQL, which compiles into database queries so that people do not have to interrupt the flow of their thoughts to write database queries. VizQL can be compiled into queries for commercial databases from companies like Microsoft and Oracle. The current release adds Netezza to the list of databases that are supported by Tableau.
Fast analytics is also about delivering live, interactive views to people rather than static reports. People want views that are current and views that they can adjust to address their immediate requirements. The Tableau Server product delivers such views to your browser, where you can share them with your colleagues and friends. Finally, the free Tableau Reader lets you explore the workbook that produced the examples shown above.
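VizQL itself is not shown in this article, so the following is not VizQL. It is a minimal Python sketch of the general idea of compiling a declarative description of a view into a database query, so that changing the view changes the query without anyone writing SQL by hand. The `ViewSpec` class, the `compile_to_sql` function, and the `sales_fact` table with its columns are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ViewSpec:
    """Hypothetical description of a chart: which fields go where."""
    table: str
    dimensions: list   # categorical fields that split the view
    measures: dict     # output name -> (aggregate function, field)
    filters: dict = field(default_factory=dict)  # field -> required value

def compile_to_sql(spec):
    """Translate a ViewSpec into a SQL string plus its parameters.

    Identifiers are interpolated directly to keep the sketch short;
    a real system would validate or quote them.
    """
    select_parts = list(spec.dimensions)
    for alias, (aggregate, column) in spec.measures.items():
        select_parts.append(f"{aggregate}({column}) AS {alias}")
    sql = f"SELECT {', '.join(select_parts)} FROM {spec.table}"
    params = []
    if spec.filters:
        conditions = [f"{column} = ?" for column in spec.filters]
        sql += " WHERE " + " AND ".join(conditions)
        params = list(spec.filters.values())
    if spec.dimensions:
        sql += " GROUP BY " + ", ".join(spec.dimensions)
    return sql, params

# A bar chart of sales by region and product.
bar_chart = ViewSpec(
    table="sales_fact",
    dimensions=["region", "product"],
    measures={"total_sales": ("SUM", "sales")},
)

# A scatter plot of sales against profit per product, for one region.
scatter_plot = ViewSpec(
    table="sales_fact",
    dimensions=["product"],
    measures={"total_sales": ("SUM", "sales"), "total_profit": ("SUM", "profit")},
    filters={"region": "East"},
)

bar_sql, bar_params = compile_to_sql(bar_chart)
scatter_sql, scatter_params = compile_to_sql(scatter_plot)
print(bar_sql)
print(scatter_sql, scatter_params)
```

Switching from the bar chart to the scatter plot only changes the specification; the query is regenerated from it. That separation is what lets a drag-and-drop interface keep pace with a changing question.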
As databases increase in size and power, it becomes increasingly important to give people effective access to the data. Otherwise, the investment in database technology will fall short of its potential to improve our institutions and our lives.