MapD, H2O.ai, NVIDIA to Unveil GPU Data Frame at Strata

by Justin Sears

MapD, H2O.ai and NVIDIA will be unveiling the GPU Data Frame (GDF) at the Strata Data Conference in New York next week. And, as Ricky Ricardo used to say, we’ve “got some splainin’ to do.”

The GDF speeds up data science workflows by allowing them to be carried out entirely on GPUs.

Back in May, we announced our plans to build this common API, which enables efficient interchange of data between processes running on the GPU. At Strata, we’ll be showing the GDF at work, with demos of real-world use cases like predicting mortgage delinquencies.

So, how’d we get here?

MapD makes data science easier by enabling interactive visualization of datasets with hundreds of columns and billions of rows. Data scientists also explore data programmatically, by building models for machine learning and AI.

One of the most tedious and time-consuming parts of building a machine learning model is feature engineering — “the process of using domain knowledge of the data to create features that make machine learning algorithms work.”

The data scientists we meet aren’t engineering four features on 1,000-row datasets. Instead, they’re often engineering 40 features for their predictive models over hundreds of millions or billions of data points.

Without the analytic acceleration made possible by MapD, that feature engineering sucks up hours or days of a data scientist’s limited time. Inevitably, the model’s first training iteration returns results that could be improved, and then it’s back to the feature engineering and another training attempt. The process continues in a vicious cycle.

While MapD specializes in shrinking the time required for iterative, human-driven feature engineering, our partners at H2O.ai are machine learning experts. Both companies want to use GPUs to speed up our respective portions of the end-to-end machine learning workflow.

Our collaboration with Anaconda (formerly Continuum) on GDF provides just that.

It also happens to be the first project in a much larger open-source framework for collaboration known as the GPU Open Analytics Initiative (GOAI), founded by MapD, H2O.ai and Anaconda.

[Figure: MapD GDF diagram] GDF: the first example of GOAI collaboration.

Our work together within the open-source GOAI framework is exciting, and it is only the beginning. We invite all technologists and data scientists interested in accelerating data science on GPUs to join us and contribute to the open-source PyGDF repository on GitHub.

Come See GDF at Strata

It’s challenging to explain with words how much faster we’ve made machine learning run on NVIDIA GPUs. So come see the GDF for yourself at Strata.

Here are some ways you can learn more at the show:

  • Visit MapD booth 839
  • Tuesday, Sept. 26, 7-10pm | AI CONNECT | Join NVIDIA, myself and other partners to learn about the latest advances in deep learning and GPU-accelerated analytics
  • Wednesday, Sept. 27, 11:30am-12pm | MapD CEO Todd Mostak’s theater session at the NVIDIA booth 831
  • Wednesday, Sept. 27, 1:15-1:55pm | Accelerate Your Analytics with a GPU Data Frame | Todd Mostak’s deep-dive presentation on GDF, sponsored by MapD
  • Thursday, Sept. 28, 11:20am-12pm | Changing the Landscape with Deep Learning and Accelerated Analytics | A panel including Todd Mostak and H2O.ai CEO Sri Ambati, hosted by NVIDIA’s Jim McHugh
  • Thursday, Sept. 28, 12-12:30pm | My theater session with MapD data scientist Wamsi Viswanath, showing the mortgage delinquency demo at the NVIDIA booth 831

Recordings from Strata as well as a lot more news from the growing GOAI open-source collaboration will be available in the months to come.