An intriguing theme at NVIDIA’s GPU Technology Conference this week is the GPU’s impact on pushing the frontiers of neural networking closer to how the human brain recognizes images. It’s a research area that’s moving at the speed of light.
Just two years ago, a team at Google rocked the world of neural networking by training a 1 billion-parameter, 9-layer network to recognize images from 10 million YouTube video frames.
To make it happen, Google ran the training on a 1,000-server cluster with 16,000 CPU cores, making it by far the largest neural network of its kind at the time. While the results excited researchers worldwide, they also triggered a thirst for more, according to Adam Coates, a post-doc researcher at Stanford University, speaking at the conference.
“How do we do this at a more practical scale?” Coates asked the audience rhetorically. “This was a problem for those of us in academia.”
The problem was multi-faceted. Not only does a cluster the size of Google's cost millions; it also requires extensive engineering resources to manage the inevitable disk failures and network traffic bottlenecks. And money and staff are two things university research labs don't have in abundance.
The answer, it turned out, was a combination of GPUs and off-the-shelf hardware. Coates' team of four researchers bought some commodity servers, installed some GPUs, and added some message-passing interface (MPI) code so that the GPUs could communicate with each other without creating network or compute bottlenecks.
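To give a flavor of what that message-passing code does, here is a conceptual sketch in plain Python (the team's actual implementation used MPI and CUDA; this toy version, with made-up gradient values, only illustrates the allreduce-style averaging pattern commonly used to keep workers in sync):

```python
# Conceptual sketch only: each "GPU" worker computes gradients on its own
# slice of the training data, then all workers exchange and average them
# so every worker applies the same update. In a real cluster this exchange
# is a collective operation such as MPI_Allreduce.

def allreduce_mean(per_worker_grads):
    """Average each parameter's gradient across all workers."""
    n_workers = len(per_worker_grads)
    n_params = len(per_worker_grads[0])
    return [sum(worker[i] for worker in per_worker_grads) / n_workers
            for i in range(n_params)]

# Hypothetical gradients from three workers for a 3-parameter model.
worker_grads = [
    [0.0, -2.0, 4.0],   # worker 0
    [2.0, -4.0, 2.0],   # worker 1
    [4.0, -6.0, 0.0],   # worker 2
]

avg = allreduce_mean(worker_grads)
print(avg)  # every worker now applies this same averaged update
```

The engineering challenge Coates describes is doing this exchange fast enough that the GPUs are never left waiting on the network, which is where careful MPI tuning comes in.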
That description doesn’t do justice to the technical conundrums Coates and his team had to solve along the way, but suffice it to say, their approach worked on many levels.
“We’re really extracting all the throughput that’s available on the GPU,” Coates told a roomful of GTC attendees. “The payoff is pretty massive.”
How massive? How about duplicating the results of Google's neural network training experiment with just three commodity servers and 12 GPUs. For good measure, Coates said, they decided to push the effort further by growing their cluster to 16 machines running 64 GPUs. That system achieved 47 times the throughput Google got, while training a network with 11.2 billion parameters.
“You can get these neural networks up to a pretty ridiculous scale,” said Coates. And you no longer need millions of dollars to make it happen; thanks to the lead Coates’ team provided, so-called “tera-scale deep learning” is now possible in a typical research lab.
The only disappointment Coates expressed was in the finding that increasing the number of parameters by a factor of 11 didn’t result in that much of an improvement in the network’s image recognition.
“We don’t have an algorithm that can train an 11.2 billion parameter network,” said Coates. “We’re in this funny place where hardware is no longer the bottleneck.”
Logic would dictate that providing more image data for the network to learn from would help, but Coates said doing so doesn’t improve the results.
“My instinct is that we’re missing a key insight to make these algorithms work,” he said.