NVAIL Partners Showcase Trailblazing Deep Learning Research at ICLR

by Anushree Saxena

The International Conference on Learning Representations isn’t a name that rolls off the tongue. But for researchers looking to stay on the cutting edge of deep learning, it’s the place to be.

Better known as ICLR, this year’s conference will bring experts from the world’s top AI research labs together in Vancouver from April 30 to May 3. Three of our NVIDIA AI Labs (NVAIL) partners — the Swiss AI Lab (IDSIA), New York University and the University of Tokyo — are among those sharing their work.

IDSIA researchers aim to give robots the same kind of understanding of the physical world that comes naturally to people. A University of Tokyo team will discuss its innovative method for improved sound recognition. And researchers from NYU and the University of the Basque Country will explain how they’re improving machines’ ability to translate languages.

Our NVAIL program helps us keep these and other AI pioneers ahead of the curve with support for students, assistance from our researchers and engineers, and access to the industry’s most advanced GPU computing power.

What Goes Up Must Come Down

Humans innately understand the physical world. We can navigate rooms we’ve never visited. If a shoe drops, we know it’ll hit the floor. And we’re well aware we can’t walk through walls. Even infants possess some basic physical understanding.

Machines don’t have it so easy. Today, training a deep learning model to understand things like “what goes up must come down” requires lots of data and human effort to label it, said Sjoerd van Steenkiste, a Ph.D. student at IDSIA.

He and a team of researchers from IDSIA and the University of California, Berkeley, are working to streamline that process by eliminating the need for massive datasets and human labeling.

In a paper for ICLR, the researchers describe how they trained a neural network without labeled data or other human input, a process known as unsupervised learning. Using our DGX-1 AI supercomputer, they trained a deep learning model to distinguish individual objects in a scene and predict the consequences of actions.
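The paper’s specific architecture is beyond the scope of this post, but the flavor of label-free training can be sketched in a few lines: a model predicts what the scene will look like after an action, and the prediction error against the actually observed future is the only training signal. The names below (NextFramePredictor, unsupervised_step) are illustrative assumptions, not the authors’ code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NextFramePredictor(nn.Module):
    """Predicts the next observation of a scene from the current one plus an action."""

    def __init__(self, frame_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(frame_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, frame_dim),
        )

    def forward(self, frame: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([frame, action], dim=-1))


def unsupervised_step(model, optimizer, frame, action, next_frame):
    """No human labels: the observed next frame itself supervises the model."""
    loss = F.mse_loss(model(frame, action), next_frame)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```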

Eventually, this research could make it easier to train robots and other machines to interact with their environments, van Steenkiste said.

Sound Mix

Some things are just better mixed together. Peanut butter paired with chocolate is heavenly. Metals are stronger and harder when they’re combined. And planting two crops together can yield bigger harvests.

Yuji Tokozume is applying the same idea to deep learning. The doctoral student and two other University of Tokyo researchers are set on improving sound recognition by using what they call between-class sounds — two sounds mixed together — to train a deep learning model. The model, trained on our Tesla P100 GPU accelerators, learns to identify the two sounds and the ratio in which they were mixed.
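A minimal sketch of that mixing step might look like the code below, which blends two waveforms at a random ratio and encodes the same ratio as a soft training target. The function name and the uniform ratio are assumptions for illustration; the paper’s actual recipe may differ, for example in how the relative loudness of the two clips is handled.

```python
import numpy as np

def between_class_mix(sound_a, sound_b, label_a, label_b, num_classes, rng=np.random):
    """Mix two waveforms at a random ratio and build the matching soft label."""
    r = rng.uniform(0.0, 1.0)                     # mixing ratio
    mixed = r * sound_a + (1.0 - r) * sound_b     # the between-class sound
    target = np.zeros(num_classes, dtype=np.float32)
    target[label_a] += r                          # the network is trained to
    target[label_b] += 1.0 - r                    # recover this mixing ratio
    return mixed, target
```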

In their ICLR paper, the researchers report that between-class learning not only delivered higher accuracy than existing techniques but also surpassed human performance on environmental recordings in a standard dataset known as ESC-50. The team has applied the same approach to improve AI image recognition performance.

Learn more by viewing a talk on between-class learning for sound recognition at our recent GPU Technology Conference in Silicon Valley.

Lost in Translation

For all AI has achieved in automatic language translation, it doesn’t do much for less common tongues like Basque, Oromo and Quechua. That’s because training a deep learning model typically requires large datasets — in this case, vast amounts of text that’s been manually translated into other languages.

Ample data for widely spoken languages like Chinese, English and Spanish makes it possible to directly translate Chinese to English or Spanish to Chinese. Researchers at NYU and the University of the Basque Country aim to bring that capability to languages with smaller numbers of speakers.

Currently, languages like Basque — spoken by an estimated 700,000 people, mostly in a region that straddles Spain and France — must first be translated into English (or another major language) before they can be converted to anything else, according to Mikel Artetxe, a doctoral student at the University of the Basque Country.

The same holds true for languages such as Oromo, which is spoken by more than 30 million people in the Horn of Africa, or Quechua, which is spoken by as many as 11 million people in South America.

The research team used our TITAN Xp GPUs to train a neural network to perform these translations without any manually translated training data, relying instead on independent, untranslated text in each language. In their ICLR paper, the researchers report that accuracy improved when they added a small amount of parallel data, though it still fell well short of human translation.
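The post doesn’t detail the training procedure, but one common ingredient in translation systems trained on monolingual text alone is iterative back-translation, sketched below. Everything here (model_xy, model_yx, train_step) is a hypothetical placeholder, not the authors’ code.

```python
def back_translation_round(model_xy, model_yx, mono_x, mono_y, train_step):
    """One round of iterative back-translation using monolingual text only.

    Each model translates real sentences from one language into the other,
    and the opposite-direction model is trained to map those synthetic
    translations back to the original, real sentences.
    """
    synthetic_y = [model_xy.translate(s) for s in mono_x]
    train_step(model_yx, sources=synthetic_y, targets=mono_x)

    synthetic_x = [model_yx.translate(s) for s in mono_y]
    train_step(model_xy, sources=synthetic_x, targets=mono_y)
```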

“Our goal is to be able to translate more languages with better results,” said Artetxe.