NVIDIA Researchers Showcase Major Advances in Deep Learning at NIPS

by Kimberly Powell

AI has become part of the public consciousness. But to glimpse the minds of the people driving it forward, check out NIPS, arguably the world’s most prestigious neural network and machine learning conference.

Researchers and data scientists have been sharing their groundbreaking work — at what is officially known as the Conference and Workshop on Neural Information Processing Systems — for three decades. But it’s only with the recent explosion of interest in deep learning that NIPS has really taken off.

Fill ’er up: Advance registration skyrocketed for the NIPS conference in 2017.

We had two papers accepted to the conference this year, and contributed to two others. The researchers involved are among the 120+ people on the NVIDIA Research team focused on pushing the boundaries of technology in machine learning, computer vision, self-driving cars, robotics, graphics, computer architecture, programming systems, and other areas.

Although they work across a variety of fields, they share common goals: advance the science of AI, develop new tools and technologies that will lead to more breakthroughs, and apply AI to modern-day grand challenges like autonomous vehicles and healthcare.

One such advancement is work presented in the paper “Learning Affinity via Spatial Propagation Networks,” led by Sifei Liu, who was an intern at NVIDIA just last summer and now works here full time as a research scientist.

For a computer vision application to understand an image, it needs to identify and label what each of its pixels represents. For example, it must decide which pixels belong to a bicycle’s tires or its frame, and which belong to the tree the bike is leaning against. This task is called image segmentation, and the spatial propagation network has a knack for doing it accurately and efficiently.

General architecture of a spatial propagation network.

The deep learning network uses the well-established physics principle of diffusion to better understand the relationship between neighboring pixels. This helps it differentiate, for example, between neighboring pixels of a bicycle’s wheel, its spokes and the empty space in between. This is a spatial affinity for image segmentation, but the network could be trained to determine many other affinities: color, tone, texture, etc.

The spatial propagation network learns to define and model these affinities purely using data, rather than hand-designed models. And the learning model can be applied to any task that requires pixel-level labels, including image matting (think Photoshop), image colorization and face parsing, to name a few. Plus the model could figure out affinities — such as functional or semantic relationships in an image — that might not even occur to people.

The paper includes theoretical underpinnings of the neural network’s operation along with mathematical proofs of its implementation. And it’s fast. Running on GPUs with the CUDA parallel programming model, the network is up to 100x faster than previously possible.

The spatial propagation network doesn’t require solving any linear equations or iterative inferencing. And it’s flexible enough to be inserted into any type of typical neural network, making it a potential umbrella technology for use in a variety of situations.
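
To make the "no linear equations, no iterative inference" point concrete, here is a minimal sketch of linear propagation along a single row of pixels. It is a deliberately simplified, one-dimensional illustration and not the paper's implementation: the function name `propagate_row`, the single left-neighbor connection, and the hand-picked affinity values are all assumptions made for this example. In the network described above, the affinities would be learned from data rather than set by hand.

```python
def propagate_row(values, affinities):
    """One left-to-right pass: h[i] = (1 - w[i]) * x[i] + w[i] * h[i - 1].

    values     -- per-pixel inputs x (e.g. coarse segmentation scores)
    affinities -- per-pixel weights w in [0, 1]; w[i] says how strongly
                  pixel i should agree with its left neighbor
    """
    if len(values) != len(affinities):
        raise ValueError("values and affinities must have the same length")
    hidden = []
    for i, (x, w) in enumerate(zip(values, affinities)):
        if i == 0:
            hidden.append(x)  # the first pixel has no left neighbor
        else:
            hidden.append((1.0 - w) * x + w * hidden[i - 1])
    return hidden


# Hypothetical coarse scores: pixels 0-2 belong to one object, but pixel 1
# was mislabeled as background; pixels 3-5 are true background.
coarse = [1.0, 0.0, 1.0, 0.0, 0.0, 0.0]

# Hand-set affinities (an assumption; the real network predicts these).
# A weight of 0.0 at pixel 3 marks an object boundary and blocks propagation.
affinity = [0.0, 0.9, 0.9, 0.0, 0.9, 0.9]

refined = propagate_row(coarse, affinity)
print([round(v, 2) for v in refined])
```

With these inputs the mislabeled pixel is pulled up to roughly 0.9 by its neighbor, while the zero affinity at the boundary keeps the object's score from leaking into the background. The whole result comes from a single scan over the row, which is why this style of propagation needs neither a linear solver nor repeated refinement passes.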

Shall I Compare Thee to a Summer’s Day?

A major trend at NIPS this year is the rise of unsupervised learning and generative modeling. A groundbreaking example of this is the paper “Unsupervised Image-to-Image Translation Networks,” led by NVIDIA researcher Ming-Yu Liu (no relation to Sifei).

To date, much of deep learning has used supervised learning to provide machines a human-like object recognition capability. For example, supervised learning can do a good job telling the difference between a Corgi and a German Shepherd, and labeled images of both breeds are readily available for training.

To give machines a more “imaginative” capability, such as imagining what a wintery scene would look like in the summer, Liu and team used unsupervised learning and generative modeling. An example of their work is shown below, where the winter and sunny scenes on the left are the inputs and the imagined corresponding summer and rainy scenes are on the right.

Unsupervised Image-to-Image Translation Networks

The NVIDIA Research team’s work uses a pair of generative adversarial networks (GANs) with a shared latent space assumption to obtain these stunning results. Considering the top two images above, the first GAN is trained on the winter scene — overcast skies, bare trees, snow covering just about everything but the cars sailing down the frozen road. The second GAN is trained to understand generally what summer looks like, but hasn’t been trained on the same specific scene as its counterpart.

And how could it? You’d need the same footage recorded from the same vantage point, with the same perspective and with all the oncoming traffic and other details in the exact same location, for both summer and winter. The unsupervised learning developed by the team removes the need for this capture and labeling, which would otherwise take extensive time and manpower.

This unsupervised translation is enabled through a shared latent space assumption, which associates the GANs with each other by tying some of their parameters together. A summery translation of the wintery scene can be generated by transferring the representation from the first GAN to the second.
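
The idea of tying parameters so that two models share one latent space can be sketched in a few lines. The toy below is only an illustration under stated assumptions: the class names, the tiny hand-picked linear layers, and the choice to tie the layers closest to the latent code are all hypothetical, and nothing here is trained. The actual system uses deep networks trained adversarially.

```python
class Linear:
    """A tiny dense layer: y = W @ v + b, written with plain lists."""

    def __init__(self, weight, bias):
        self.weight = weight
        self.bias = bias

    def __call__(self, v):
        return [sum(w * x for w, x in zip(row, v)) + b
                for row, b in zip(self.weight, self.bias)]


class Encoder:
    """Domain-specific layer first, then a layer shared across domains."""

    def __init__(self, private, shared):
        self.private, self.shared = private, shared

    def __call__(self, image):
        return self.shared(self.private(image))


class Decoder:
    """Shared layer first, then a domain-specific layer."""

    def __init__(self, shared, private):
        self.shared, self.private = shared, private

    def __call__(self, latent):
        return self.private(self.shared(latent))


# Tied parameters: the SAME layer objects are used by both domains.
shared_enc = Linear([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0])
shared_dec = Linear([[0.5, 0.5], [0.5, -0.5]], [0.0, 0.0])

# Hand-picked, untrained domain-specific layers (illustrative values only).
winter_encoder = Encoder(Linear([[2.0, 0.0], [0.0, 1.0]], [0.0, 0.0]), shared_enc)
summer_encoder = Encoder(Linear([[1.0, 0.0], [0.0, 4.0]], [0.0, 0.0]), shared_enc)
winter_decoder = Decoder(shared_dec, Linear([[0.5, 0.0], [0.0, 1.0]], [0.0, 0.0]))
summer_decoder = Decoder(shared_dec, Linear([[1.0, 0.0], [0.0, 0.25]], [0.0, 0.0]))


def translate(image, source_encoder, target_decoder):
    """Encode in one domain, decode in the other, via the shared latent code."""
    return target_decoder(source_encoder(image))


winter_image = [1.0, 3.0]  # stand-in for real pixel data
summer_version = translate(winter_image, winter_encoder, summer_decoder)
reconstruction = translate(winter_image, winter_encoder, winter_decoder)
print(summer_version, reconstruction)
```

Because both encoders end in the same `shared_enc` object and both decoders start with the same `shared_dec` object, a latent code produced from a winter input is a valid input to the summer decoder. That is the sense in which the representation can be transferred from one model to the other without paired winter and summer images.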

The use of GANs isn’t novel in unsupervised learning, but the NVIDIA research produced results — with shadows peeking through thick foliage under partly cloudy skies — far ahead of anything seen before.

The potential benefits of this technique are widespread. In addition to needing less labeled data and the associated time and effort to create and ingest it, deep learning experts can apply the technique across domains. For self-driving cars alone, training data could be captured once and then simulated across a variety of virtual conditions: sunny, cloudy, snowy, rainy, nighttime, etc.

The work on unsupervised image-to-image translation networks can be applied to many domains including, naturally, cats.

For more on the other two papers at NIPS with contributions from NVIDIA researchers, see Semi-Supervised Learning for Optical Flow with Generative Adversarial Networks and Universal Style Transfer via Feature Transforms.

And if you’re at NIPS in Long Beach this week, come check out the NVIDIA Research team’s work:

Learning Affinity via Spatial Propagation Networks

Poster session: Tuesday, Dec. 5, 6:30-10:30 p.m. in Pacific Ballroom 127

Unsupervised Image-to-Image Translation Networks

Spotlight session: Wednesday, Dec. 6, 11:25-11:30 a.m. in Hall C

Poster session: Wednesday, Dec. 6, 6:30-10:30 p.m. in Pacific Ballroom 120