Beating Black Swans: Toward Continuous, End-to-End, GPU-Enabled Computation and Visualization

by Bill Maimone

Whether it’s a surprise political event such as Brexit or extreme weather that affects a business decision, the ability to dynamically adapt to “Black Swan” events can make the difference between an average business and one that outperforms the competition.

Deep learning is a revolutionary development for businesses, but it is often a stepwise process. With data preparation frequently taking weeks, component analysis running for hours, and machine learning running for days or weeks, it hasn’t been well suited to adapting to unforeseen circumstances that unfold on a scale of minutes or seconds.

That’s changing. The combined effects of collaboration among software components and hardware evolving in tandem with analytics and deep learning promise to transform deep learning into a continuous and interactive experience.

In a recent blog post (see End-to-end Analytics on the GPU Data Frame), MapD CEO and founder Todd Mostak described the revolution taking place in the GPU software stack in the fields of analytics, machine learning and deep learning.

Enabled by NVIDIA’s hardware innovation providing 100x more processing cores and 20x greater memory bandwidth, companies such as MapD, Continuum, and others have rewritten key components to leverage the new technology landscape.

Todd also described the GPU Open Analytics Initiative, which was founded last month to promote seamless, in-GPU integration among any GPU-enabled components and the adoption of open source components such as Apache Arrow to promote free and open collaboration.

The initial GOAI prototype integration demonstrated at the GPU Technology Conference in May reduced a process that took half a day on CPUs to about 90 seconds on comparably priced NVIDIA GPU hardware.
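To put that reduction in perspective, here is a rough back-of-the-envelope calculation. It assumes "half a day" means 12 hours of wall-clock time, which is an interpretation rather than a figure from the demo, and the `speedup` helper is purely illustrative.

```python
def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Return how many times faster the GPU run is than the CPU run."""
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds


# Assumption: "half a day" is read as 12 hours of wall-clock time.
CPU_SECONDS = 12 * 60 * 60  # 43,200 seconds
GPU_SECONDS = 90            # "about 90 seconds"

factor = speedup(CPU_SECONDS, GPU_SECONDS)
print(f"Approximate speedup: {factor:.0f}x")  # prints "Approximate speedup: 480x"
```

Under that reading the prototype ran roughly 480 times faster. Reading "half a day" as a shorter working day would give a proportionally smaller figure.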

This is remarkable as a proof-of-concept first result, but what happens next is more significant:

  • In this first demonstration, all processes executed on a single server, and some stages on only a single GPU. A GOAI goal for 2017 is to see all pieces enabled for multi-GPU, multi-node operation, so that the whole pipeline can scale horizontally to meet the needs of arbitrarily large, complex use cases. MapD’s addition of multi-node horizontal scaling with its 0 release is expected to be matched in all components this year, including the key data transfers among the parts.
  • The most computationally expensive stage, the machine learning processing, will get a lot faster. At GTC 2017, NVIDIA unveiled its Volta architecture, which promises a 12x improvement via the addition of 640 Tensor Cores specifically designed for the matrix algebra used by machine learning.
  • GOAI has welcomed BlazingDB, Graphistry and UC Davis Gunrock as new members, and is formalizing a governance process for accepting a broad community of members and adopters.

The next stage of this evolution is to realize deep learning as an interactive, continuous process that more closely mimics the human thought process.

If your company does business in western Pennsylvania, you probably didn’t plan for the possibility of extreme weather disrupting operations. But the recent rare tornado warning for the area could mean an uptick in business for hardware stores or the decimation of tourism. Does hotel occupancy go up or down?

One way to quickly adapt would be to pull in economic data for similar regions struck by an equivalent disaster. Managing this sort of directed learning and data discovery calls for tools that support a wide range of visualizations, including geospatial maps layered with real-time analytics.

These can enable an analyst to quickly and interactively explore potentially relevant datasets, feed selections into a set of machine learning algorithms and then directly visualize the effects of the adjusted model to quickly understand the impact.
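The explore, select and re-model loop described above can be sketched in a few lines of plain Python. This is only a toy illustration under loud assumptions: the records, region names and the mean-based "model" are all hypothetical placeholders, not real economic data, and nothing here uses MapD or GOAI APIs.

```python
from statistics import mean

# Hypothetical records: (region, had_tornado, occupancy_change_pct).
# Every value is a made-up placeholder, not real economic data.
RECORDS = [
    ("region_a", True, -8.0),
    ("region_b", True, 5.0),
    ("region_c", False, 1.0),
    ("region_d", True, -3.0),
    ("region_e", False, 0.0),
]


def select_comparable(records, predicate):
    """Keep only the records the analyst judges relevant."""
    return [r for r in records if predicate(r)]


def fit_mean_model(records):
    """Stand-in for a real learner: predict the average occupancy change."""
    return mean(r[2] for r in records)


# Baseline model trained on everything, before the analyst intervenes.
baseline = fit_mean_model(RECORDS)

# The analyst selects regions that experienced a tornado and refits.
selected = select_comparable(RECORDS, lambda r: r[1])
adjusted = fit_mean_model(selected)

print(f"Baseline forecast: {baseline:+.1f}%")
print(f"Adjusted forecast: {adjusted:+.1f}%")
```

In an interactive tool, the selection step would come from a map or chart rather than a lambda, and the refit would run on the GPU fast enough to redraw the result while the analyst is still looking at it.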

As a founding member of GOAI, which aims to accelerate end-to-end analytics on the GPU, MapD will be demoing an example of the initiative’s first project at the O’Reilly AI Conference, booth 13, in New York, on June 28-29. Check out the GPU Data Frame (GDF) demo on a U.S. mortgage dataset. GDF is a common API that enables efficient interchange of data between processes running on the GPU.

Also join our latest webinar, in which NVIDIA’s Josh Patterson and I will speak on the open source community, on Wednesday, Sept. 6, at 10 a.m. PT.