The Argument for Accelerated and Integrated Analytics
September 22, 2016
The rise of modern business intelligence (BI) has seen the emergence of a number of component parts designed to support the different analytical functions necessary to deliver what enterprises require.
Perhaps the most fundamental component of the BI movement is the traditional frontend or visualization application. Companies like Tableau, Qlik, Birst, Domo and Periscope provide these. There are dozens more — all with essentially equivalent capabilities: the ability to make spreadsheets look beautiful. Some of these companies have been tremendously successful, primarily differentiating themselves on the axis of usability.
Another, equally critical component of the BI equation is the database. Here, too, there are the usual suspects: Redshift, Impala, Vertica, Netezza and others. Some of these databases are fully featured, system-of-record worthy solutions, while others focus on a particular performance axis (streaming, for instance) and do it well.
Finally, there is the emergence, across BI and database players, of more advanced analytics tools, driven by the explosion of interest in and development of machine learning, deep learning and artificial intelligence. This market has its stars, starting with the big boys (Google, Facebook, Amazon, Microsoft, IBM, Baidu, Tesla) as well as a host of highly credentialed startups, like Sentient, Ayasdi, Bonsai and H2O.ai.
A successful, fully functional BI system has each of these operating optimally. The problem is that none of these systems are operating optimally. The reason is the extraordinary growth in data.
These systems are laboring because they are all based on an antiquated, CPU-centric view of the world that is computationally incapable of querying, rendering or learning from data at the scale demanded by the petabyte economy.
There is a solution, however. One that the deep learning folks have already adopted: GPUs.
With GPUs, you get order-of-magnitude performance gains. There’s a reason why so many supercomputers on the Top500 list use NVIDIA GPUs. It’s because GPUs are far more adept at highly parallel mathematical tasks than traditional CPUs.
But, it’s more than just deep learning. Databases and visualization also benefit significantly from GPUs. A system based on GPUs can deliver the speed and scale required to handle these massive working sets and deliver the functionality required.
What You Need to Know About GPUs and Integrated Analytics
First, GPUs offer exceptionally high memory bandwidth: we’re talking terabytes per second across multiple GPUs. This is important since database queries are typically memory-bandwidth bound or I/O bound. Because of that bandwidth, GPUs can scan more data in less time, which means faster query results.
To put this in context, noted database authority Mark Litwintschik found that a single GPU server is 74x faster than a larger cluster of Redshift over 1.1 billion rows. Not 74% faster, 74 times faster. Against Postgres, that figure was 3,500x faster — milliseconds vs. tens of minutes.
That’s significant because working sets have grown commensurately with data. A dataset of a few million rows used to be big; now it is tiny. The new normal starts at several hundred million rows and runs to the billions.
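To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch in Python. It estimates the minimum time needed to read one 4-byte column across 1.1 billion rows if memory bandwidth were the only limit. The bandwidth figures and the names `scan_seconds` and `ASSUMED_BANDWIDTH_GB_S` are illustrative assumptions, not measurements of any particular CPU, GPU or database.

```python
def scan_seconds(rows, bytes_per_value, bandwidth_gb_per_s):
    """Lower bound on the time to read one column once,
    assuming memory bandwidth is the only bottleneck."""
    total_bytes = rows * bytes_per_value
    return total_bytes / (bandwidth_gb_per_s * 1e9)


ROWS = 1_100_000_000      # row count matching the benchmark size quoted above
BYTES_PER_VALUE = 4       # assumption: one 32-bit column

# Hypothetical bandwidth figures in GB/s, chosen only for illustration.
ASSUMED_BANDWIDTH_GB_S = {
    "cpu_memory": 50.0,
    "single_gpu_memory": 500.0,
    "eight_gpus_combined": 4000.0,
}

estimates = {
    name: scan_seconds(ROWS, BYTES_PER_VALUE, bandwidth)
    for name, bandwidth in ASSUMED_BANDWIDTH_GB_S.items()
}

for name, seconds in estimates.items():
    print(f"{name}: {seconds * 1000:.1f} ms per full column scan")
```

Real queries do more than read a column once, so treat these as lower bounds that show how scan time scales with bandwidth, not as benchmark predictions.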
Second, GPUs are not simply about straight-line speed. Other systems can be optimized for specific tasks and queries. GPUs are also extraordinary at graphics. In fact, the native rendering pipeline of GPUs makes them the ideal engine for data visualization.
This manifests itself not only in better looking dashboards, but also in more responsive, faster dashboards. The reason is that if you can do the query on the same chip as the render, then you don’t have to move your data around. This might not be a problem when dealing with only a few million rows, but it’s a big problem when you cross a billion, let alone several billion.
Finally, GPUs deliver supercomputing-class computational throughput. GPUs dominate the machine learning and deep learning ranks because they excel at matrix multiplication. Again, the ability to co-locate the querying and machine learning on the same chip lets you enjoy exceptional efficiency in feeding the machine learning algorithm with the data needed for training and inference.
How to Put GPUs into Play in Your Organization
True performance will come from an integrated system, one that combines GPU hardware with a GPU-tuned database, a GPU-tuned frontend/visualization layer and a GPU-tuned machine learning layer.
Upgrading just one component, however, creates the weakest link problem. A GPU database feeding a CPU visualization frontend will be faster, but it won’t be as fast as a GPU database feeding a GPU visualization frontend.
Any other mixed combination creates the same challenge and introduces the same weak CPU link.
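The weakest link effect is easy to see with a little arithmetic. The sketch below sums per-stage latencies for a hypothetical dashboard request and compares accelerating only the database with accelerating every stage. The stage timings, the 50x per-stage speedup and the function name `end_to_end_ms` are made-up assumptions for illustration, not measurements.

```python
def end_to_end_ms(stage_ms, speedups):
    """Total request latency when each stage is divided by its speedup.
    Stages missing from `speedups` run at their baseline speed."""
    return sum(ms / speedups.get(stage, 1.0) for stage, ms in stage_ms.items())


# Hypothetical CPU-only baseline for one dashboard interaction, in milliseconds.
BASELINE_MS = {"query": 800.0, "transfer": 100.0, "render": 600.0}

ASSUMED_SPEEDUP = 50.0  # assumption: each GPU-tuned stage runs 50x faster

baseline = end_to_end_ms(BASELINE_MS, {})
database_only = end_to_end_ms(BASELINE_MS, {"query": ASSUMED_SPEEDUP})
fully_integrated = end_to_end_ms(
    BASELINE_MS, {stage: ASSUMED_SPEEDUP for stage in BASELINE_MS}
)

print(f"baseline:         {baseline:.0f} ms")
print(f"GPU database only: {database_only:.0f} ms ({baseline / database_only:.1f}x)")
print(f"fully integrated:  {fully_integrated:.0f} ms ({baseline / fully_integrated:.1f}x)")
```

With these assumed numbers, accelerating only the query stage takes the request from 1,500 ms to 716 ms, roughly a 2x gain, while accelerating every stage brings it to 30 ms. The slow stages that remain set the floor on the overall result.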
The optimal system benefits from GPU hardware and GPU-tuned software at every turn.
Speed, visualization, advanced analytics: they’re all GPU-oriented. To use hardware or software that is designed for legacy compute platforms is to choose to wait, to downsample, to overpay on scale-out, even in the elastic world in which we live.
Integrated systems exist. And they have a head start on incorporating other key tasks or subtasks that benefit disproportionately from GPUs. The integrated GPU stack has major implications for BI, IT, data science and other areas of the enterprise. This is precisely why MapD thinks this is the Age of the GPU, and why we’re so pleased to be part of the revolution.