NVIDIA to Unleash Deep Learning in Hyperscale Datacenters
March 27, 2018
Millions of servers powering the world’s hyperscale data centers are about to get a lot smarter.
NVIDIA CEO Jensen Huang Tuesday announced new technologies and partnerships that promise to slash the cost of delivering deep learning-powered services.
Speaking at the kickoff of the company’s ninth annual GPU Technology Conference, Huang described a “Cambrian Explosion” of technologies driven by GPU-powered deep learning, bringing support for new capabilities that go far beyond accelerating images and video.
“In the future, starting with this generation, starting with today, we can now accelerate voice, speech, natural language understanding and recommender systems as well as images and video,” Huang, clad in his trademark leather jacket, told an audience of 8,500 technologists, business leaders, scientists, analysts and press gathered at the San Jose Convention Center.
Over the course of a two-and-a-half-hour keynote, Huang also unveiled a series of advances to NVIDIA’s deep learning computing platform that deliver a 10x performance boost on deep learning workloads from just six months ago; launched the Quadro GV100, transforming workstations with 118.5 TFLOPS of deep learning performance; and introduced DRIVE Constellation to run self-driving car systems for billions of simulated miles.
Power to the Pros
Huang’s keynote got off to a brisk start, with the launch of the new Quadro GV100. Based on Volta, the world’s most advanced GPU architecture, Quadro GV100 packs 7.4 TFLOPS double-precision, 14.8 TFLOPS single-precision and 118.5 TFLOPS deep learning performance, and is equipped with 32GB of high-bandwidth memory.
GV100 sports a new interconnect called NVLink 2 that extends the programming and memory model from one of our GPUs to a second one, so the pair essentially functions as a single GPU. Combined, the two have 10,000 CUDA cores, 236 teraflops of Tensor Core performance and 64GB of memory, all used to revolutionize modern computer graphics.
Deep Learning’s Swift Rise
The announcements come as deep learning gathers momentum. In less than a decade, the computing power of GPUs has grown 20x — representing growth of 1.7x per year, far outstripping Moore’s law, Huang said.
“We are all in on deep learning, and this is the result,” Huang said.
Drawn to that growing power, in just five years the number of GPU developers has risen 10x to 820,000. Downloads of CUDA, our parallel computing platform, have risen 5x to 8 million.
“More data, more computing are compounding together into a double exponential for AI, that’s one of the reasons why it’s moving so fast,” Huang said.
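The growth figures quoted above can be sanity-checked with a little compound-growth arithmetic. The sketch below is illustrative only: the 20x and 1.7x-per-year numbers come from the keynote, while the helper names and the reading of Moore’s law as a doubling every two years are assumptions made for the comparison.

```python
import math


def years_to_reach(total_growth: float, annual_rate: float) -> float:
    """Years of compounding at annual_rate needed to reach total_growth."""
    return math.log(total_growth) / math.log(annual_rate)


def compounded(annual_rate: float, years: float) -> float:
    """Total growth factor after compounding annual_rate for the given years."""
    return annual_rate ** years


# Figures quoted in the keynote.
GPU_TOTAL_GROWTH = 20.0
GPU_ANNUAL_RATE = 1.7

# Assumption (not from the article): Moore's law taken as a doubling
# every two years, i.e. a factor of sqrt(2) per year.
MOORE_ANNUAL_RATE = 2 ** 0.5

gpu_years = years_to_reach(GPU_TOTAL_GROWTH, GPU_ANNUAL_RATE)
moore_growth = compounded(MOORE_ANNUAL_RATE, gpu_years)

print(f"Years for 20x at 1.7x per year: {gpu_years:.2f}")
print(f"Growth at the assumed Moore's law rate over that span: {moore_growth:.2f}x")
```

Under these assumptions, 1.7x per year compounds to 20x in a little under six years, which is consistent with “less than a decade,” while the assumed Moore’s law rate yields roughly 7x over the same span.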
Bringing Deep Learning Inferencing to Millions of Servers
The next step: putting deep learning to work on a massive scale. Doing so means addressing seven challenges: programmability, latency, accuracy, size, throughput, energy efficiency and rate of learning.
Together, they form the acronym PLASTER.
Meeting these challenges will require more than just sticking an ASIC or an FPGA in a datacenter, Huang said. “Hyperscale data centers are the most complicated computers ever made. How could it be simple?” he asked.
To put even more innovation to work faster, Huang announced a new version of our TensorRT inference software, TensorRT 4. Used to deploy trained neural networks in hyperscale datacenters, TensorRT 4 offers INT8 and FP16 network execution, cutting datacenter costs by up to 70 percent, Huang said.
The software delivers up to 190x faster deep learning inference than CPUs for common applications such as computer vision, neural machine translation, automatic speech recognition, speech synthesis and recommendation systems.
Support from Key Partners
TensorRT 4 and GPU-powered inferencing are drawing support from around the technology industry.
Huang announced support for GPU acceleration in Kubernetes, which facilitates enterprise inference deployment on multi-cloud GPU clusters, and punctuated the announcement with a stunning demo of a flower recognition system scaling up to far higher speeds.
“This is like magic, Kubernetes is orchestrating this datacenter — it can assign one GPU, or many GPUs on one server, or many GPUs on many servers,” Huang said. “It can also assign it across datacenters, so you can have some work done on a cloud and some in our datacenter, some on this cloud and some on that cloud.”
In addition, Microsoft, which recently announced AI support for Windows 10 applications, has partnered with NVIDIA to build GPU-accelerated tools to help developers incorporate more intelligent features in Windows applications.
NVIDIA engineers have also worked closely with Amazon, Facebook and Microsoft to ensure developers using frameworks that support ONNX, such as Caffe2, Chainer, CNTK, MXNet and PyTorch, can now easily deploy to NVIDIA deep learning platforms.
“Our strategy at NVIDIA is to advance GPU computing, to advance GPU deep learning at the speed of light, irrespective of whatever kind of AI framework you use or the deep learning network you want to create,” Huang said.
Feeding the Need for Speed
At the same time, Huang announced that the GPU-driven systems where advanced new deep learning networks are trained are growing vastly more powerful.
“Clearly the adoption of GPU computing is growing and it’s growing at quite a fast rate,” Huang said. “The world needs larger computers because there is so much work to be done in reinventing energy, trying to understand the Earth’s core to predict future disasters, or understanding and simulating weather, or understanding how the HIV virus works.”
Key advancements to the NVIDIA platform — which has been adopted by every major cloud-services provider and server maker — include a 2x memory boost to NVIDIA Tesla V100, the most powerful datacenter GPU, and a revolutionary new GPU interconnect fabric called NVIDIA NVSwitch, which enables up to 16 Tesla V100 GPUs to simultaneously communicate at a record speed of 2.4 terabytes per second.
Harnessing these innovations, NVIDIA launched NVIDIA DGX-2, the first single server capable of delivering two petaflops of computational power. DGX-2 has the deep learning processing power of 300 servers occupying 15 racks of datacenter space, while being 60x smaller and 18x more power efficient.
It is, in effect, a single GPU. “The world wants a gigantic GPU, not a big one, a gigantic one, not a huge one, a gigantic one,” Huang said moments before unveiling the DGX-2.
Simulating Billions of Miles of Driving
We’re also bringing deep learning — and the powerful visualization capabilities of GPUs — to speed the development of a new generation of self-driving vehicles with DRIVE Constellation.
DRIVE Constellation pairs a server simulating a self-driving vehicle’s sensors — such as cameras, lidar and radar — with another server equipped with our DRIVE Pegasus AI car computer.
The result: autonomous vehicles can be driven for billions of miles — and tested in a vast number of situations — before they’re put on the road.
Fusing Man, Machine in VR
In a rousing finale to his keynote, Huang showed how a human driver can take control of an autonomous vehicle remotely. The live demonstration of real-time, bi-directional communication between sensors in an autonomous vehicle and a VR environment hints at a future where intelligent machines can work seamlessly with humans.
“Teleportation — the future has arrived,” Huang declared.