by Steve Wildstrom

In this series we’ve discussed software that takes advantage of GPU processing in traditional “high-performance computing” or scientific computing domains, such as molecular dynamics, climate modeling, remote sensing and medical imaging. These applications tend to lend themselves naturally to parallel processing, and have a need for serious compute capability. Data mining, on the other hand, may not seem to be a natural fit for parallel processing. Yet at least one data mining software maker is scoring impressive performance gains using GPU processing for online analytical processing (OLAP).

OLAP is a technique for taking a deep dive into a subset of what may be a very large database. Say, for example, you wanted to analyze sales by product, store location, and time of day. Data for those three variables are compacted into a “cube,” so called because each of the variables can be regarded as a dimension or axis and the entire data space can be visualized as a cube. In the real world, analyses typically involve more variables and the cube becomes an impossible-to-visualize hypercube of many dimensions.
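To make the cube idea concrete, here is a minimal sketch in Python. The fact table, product names, and figures are invented for illustration (this is not Jedox data or its API); each key is a point in a three-dimensional cube, and a "roll-up" aggregates away one dimension:

```python
from collections import defaultdict

# Hypothetical fact table: (product, store, hour) -> units sold.
# All names and numbers here are illustrative.
sales = {
    ("coffee", "downtown", 8): 30,
    ("coffee", "downtown", 17): 12,
    ("coffee", "airport", 8): 45,
    ("bagel", "downtown", 8): 20,
    ("bagel", "airport", 17): 9,
}

def roll_up(cube, axis):
    """Aggregate the cube along one dimension (0=product, 1=store, 2=hour)."""
    totals = defaultdict(int)
    for key, value in cube.items():
        reduced = tuple(k for i, k in enumerate(key) if i != axis)
        totals[reduced] += value
    return dict(totals)

# Total sales by (product, store), summed over time of day:
by_product_store = roll_up(sales, axis=2)
# A "slice" fixes one dimension: morning (hour 8) sales only.
morning = {key[:2]: v for key, v in sales.items() if key[2] == 8}
```

Real OLAP engines answer queries like these across many dimensions and millions of cells, which is where the compute demand comes from.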

The data compression is important because efficient processing requires that the data being analyzed be held completely in memory. Mattias Krämer, vice president for technology at OLAP software maker Jedox AG, says that multidimensional OLAP techniques can allow 20 GB of data to be compacted into a 2 GB cube, small enough to be stored in memory even on a relatively modest system.
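The article doesn’t describe Jedox’s compression scheme, but one common trick in multidimensional engines is dictionary encoding: repeated dimension values (store names, product labels) are replaced by small integer codes plus a lookup table. A minimal sketch, purely illustrative:

```python
def dictionary_encode(column):
    """Replace repeated values with compact integer codes plus a lookup table."""
    codes, encoded = {}, []
    for value in column:
        if value not in codes:
            codes[value] = len(codes)  # assign the next unused code
        encoded.append(codes[value])
    return encoded, list(codes)

# A column with heavy repetition compresses well:
stores = ["downtown", "airport", "downtown", "downtown", "airport"]
encoded, lookup = dictionary_encode(stores)
# Each long string is now a small integer; the strings are stored once.
```

Decoding is just `lookup[code]`, so queries can run directly on the compact representation, which is what makes holding the whole cube in memory feasible.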

Jedox, based in Freiburg, Germany, makes a set of tools called Palo Suite that, among other things, lets analysts run OLAP using familiar tools such as Microsoft Excel. The newest version of Palo, developed in cooperation with the University of Freiburg and the University of Western Australia, uses NVIDIA’s CUDA C to boost OLAP performance through GPU processing.

Credit: Jedox AG

The company explains the advantage of the GPU approach by analogy. Say you had to deliver newspapers to a large number of homes. You could use a truck and find the most efficient route to visit the houses one after another. Or you could use a fleet of bicycles, each delivering a paper to one house. The bicycles (GPU processor cores) are slower and less powerful than the truck (the CPU), but the combination of sheer numbers and the elimination of the need to find an optimal route makes this method more efficient.
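The analogy maps onto code as a choice of decomposition: one worker looping over every item, versus many workers each handling one item. The sketch below illustrates only the shape of that decomposition (a Python thread pool stands in for GPU cores; on a real GPU, CUDA launches thousands of lightweight threads, one per cell):

```python
from concurrent.futures import ThreadPoolExecutor

houses = list(range(1000))  # one "house" per data cell

def deliver(house):
    # Stand-in for the small, independent per-cell work a GPU core would do.
    return house * 2

# The "truck": a single worker visits every house in sequence.
truck_result = [deliver(h) for h in houses]

# The "fleet of bicycles": many workers, each taking a house as it comes.
# No route planning is needed because the deliveries are independent.
with ThreadPoolExecutor(max_workers=8) as pool:
    bikes_result = list(pool.map(deliver, houses))
```

The key property is independence: because no cell’s result depends on another’s, the work can be scattered across as many cores as the hardware offers.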

Interestingly, Palo GPU is part of the component of the suite that runs on a server. Servers are typically equipped with the most minimal graphics systems. They often lack displays altogether since they are frequently administered remotely, and even when a local display is present, it is rarely used for anything more demanding than server administration or scanning logs.

But Palo GPU makes a case for equipping a server with one or more high-end GPUs exclusively to get the benefit of GPU parallel processing. In fact, a number of server makers just announced integrated CPU-GPU servers and blade systems using NVIDIA Tesla 20-series (Fermi) GPUs.

Depending on the structure of the OLAP cube, the company has been seeing performance boosts of 40X to 100X compared with CPU processing. “As performance [has been] the key factor in business intelligence applications for years now, these are really good numbers,” Krämer says.

“The tremendous computing power of today’s GPUs is achieved by using an array of processor cores that outnumbers current CPU cores by almost two orders of magnitude,” Jedox says in a soon-to-be-published white paper. “The massive parallelism offered by a GPU has been used to solve many problems with speedups ranging from tens to hundreds compared to a single processor. The popularity of general-purpose computing on graphics processing units (GPGPU) has gained further momentum with the releases of programming interfaces such as NVIDIA’s CUDA C and OpenCL or ATI Stream SDK which allow programmers to develop algorithms for GPUs using common languages such as C with only minimal extensions. The CUDA framework in particular has led to a dramatic increase in applications implemented for GPUs. Apart from graphics applications, GPUs are nowadays utilized in many other areas of computing, such as physics simulations, protein folding, cryptanalysis, and many more.”

While Krämer calls CUDA “a very good development tool for our purposes,” there are some improvements he would like to see. One is better support for the C++ programming language, the object-oriented extension of C. Another, more basic, change would be to make it easier for CUDA applications to run as “services” under Windows, an approach that makes them more reliable and easier to administer. NVIDIA recently released a new Tesla Compute Cluster driver that addresses this issue for Tesla products.

This post is an entry in The World Isn’t Flat, It’s Parallel series running on nTersect, focused on the GPU’s importance and the future of parallel processing. Today, GPUs can operate faster and more cost-efficiently than CPUs in a range of increasingly important sectors, such as medicine, national security, natural resources and emergency services. For more information on GPUs and their applications, keep your eyes on The World Isn’t Flat, It’s Parallel.