NVIDIA today set out a vision for the next generation of computing that shifts the focus of the global information economy from servers to a new class of powerful, flexible data centers.
In a keynote delivered in nine simultaneously released episodes recorded from the kitchen of his California home, NVIDIA founder and CEO Jensen Huang discussed NVIDIA’s recent Mellanox acquisition, new products based on the company’s much-awaited NVIDIA Ampere GPU architecture and important new software technologies.
Original plans for the keynote to be delivered live at NVIDIA’s GPU Technology Conference in late March in San Jose were upended by the coronavirus pandemic.
Huang kicked off his keynote on a note of gratitude.
“I want to thank all of the brave men and women who are fighting on the front lines against COVID-19,” Huang said.
NVIDIA, Huang explained, is working with researchers and scientists to use GPUs and AI computing to treat, mitigate, contain and track the pandemic. Among those mentioned:
- Oxford Nanopore Technologies has sequenced the virus genome in just seven hours.
- Plotly is doing real-time infection rate tracing.
- Oak Ridge National Laboratory and the Scripps Research Institute have screened a billion potential drug combinations in a day.
- Structura Biotechnology, the University of Texas at Austin and the National Institutes of Health have reconstructed the 3D structure of the virus’s spike protein.
NVIDIA also announced updates to its NVIDIA Clara healthcare platform aimed at taking on COVID-19.
“Researchers and scientists applying NVIDIA accelerated computing to save lives is the perfect example of our company’s purpose — we build computers to solve problems normal computers cannot,” Huang said.
At the core of Huang’s talk was a vision for how data centers, the engine rooms of the modern global information economy, are changing, and how NVIDIA and Mellanox, acquired in a deal that closed last month, are together driving those changes.
“The data center is the new computing unit,” Huang said, adding that NVIDIA is accelerating performance gains from silicon, to the ways CPUs and GPUs connect, to the full software stack, and, ultimately, across entire data centers.
Systems Optimized for Data Center-Scale Computing
That starts with a new GPU architecture that’s optimized for this new kind of data center-scale computing, unifying AI training and inference, and making possible flexible, elastic acceleration.
NVIDIA A100, the first GPU based on the NVIDIA Ampere architecture, provides the greatest generational performance leap of NVIDIA’s eight generations of GPUs. Also built for data analytics, scientific computing and cloud graphics, it is in full production and shipping to customers worldwide, Huang announced.
Eighteen of the world’s leading service providers and systems builders are incorporating it, among them Alibaba Cloud, Amazon Web Services, Baidu Cloud, Cisco, Dell Technologies, Google Cloud, Hewlett Packard Enterprise, Microsoft Azure and Oracle.
The A100, and the NVIDIA Ampere architecture it’s built on, boost performance by up to 20x over their predecessors, Huang said. He detailed five key features of A100:
- More than 54 billion transistors, making it the world’s largest 7-nanometer processor.
- Third-generation Tensor Cores with TF32, a new math format that accelerates single-precision AI training out of the box. NVIDIA’s widely used Tensor Cores are now more flexible, faster and easier to use, Huang explained.
- Structural sparsity acceleration, a new efficiency technique harnessing the inherently sparse nature of AI math for higher performance.
- Multi-instance GPU, or MIG, allowing a single A100 to be partitioned into as many as seven independent GPUs, each with its own resources.
- Third-generation NVLink technology, doubling high-speed connectivity between GPUs, allowing A100 servers to act as one giant GPU.
The result of all this: 6x higher performance than NVIDIA’s previous generation Volta architecture for training and 7x higher performance for inference.
NVIDIA DGX A100 Packs 5 Petaflops of Performance
NVIDIA is also shipping the third generation of its NVIDIA DGX AI system, the NVIDIA DGX A100, which is based on NVIDIA A100 and is the world’s first 5-petaflops server. Each DGX A100 can be partitioned to run as many as 56 applications, all independently.
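The 56-application figure lines up with the multi-instance GPU limit quoted earlier. The short Python sketch below works backward from the two quoted numbers. It assumes one application per MIG instance, which the keynote summary does not state explicitly, and the constant names are illustrative only.

```python
# Figures quoted in the keynote summary.
MAX_MIG_INSTANCES_PER_A100 = 7   # "as many as seven independent GPUs"
MAX_APPS_PER_DGX_A100 = 56       # "as many as 56 applications"

# Assumption: each independent application occupies exactly one MIG instance.
implied_gpus_per_system, leftover = divmod(
    MAX_APPS_PER_DGX_A100, MAX_MIG_INSTANCES_PER_A100
)

print(f"Implied A100 GPUs per DGX A100: {implied_gpus_per_system}")
print(f"Instances left over: {leftover}")
```

Under that assumption the division is exact, so the two figures are consistent with each other.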
This allows a single server to either “scale up” to race through computationally intensive tasks such as AI training, or “scale out,” for AI deployment, or inference, Huang said.
Among initial recipients of the system are the U.S. Department of Energy’s Argonne National Laboratory, which will use the cluster’s AI and computing power to better understand and fight COVID-19; the University of Florida; and the German Research Center for Artificial Intelligence.
A100 will also be available for cloud and partner server makers as HGX A100.
A data center powered by five DGX A100 systems for AI training and inference, running on just 28 kilowatts of power and costing $1 million, can do the work of a typical data center with 50 DGX-1 systems for AI training and 600 CPU systems, which consumes 630 kilowatts and costs over $11 million, Huang explained.
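The comparison reduces to two ratios. The Python sketch below computes them from the figures Huang quoted. The class and variable names are illustrative, and because the typical data center is described as costing "over $11 million", the cost ratio is a lower bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCenter:
    """Illustrative container for the figures quoted in the keynote."""
    description: str
    power_kw: float
    cost_million_usd: float


typical = DataCenter("50 DGX-1 + 600 CPU systems", power_kw=630, cost_million_usd=11)
a100_based = DataCenter("5 DGX A100 systems", power_kw=28, cost_million_usd=1)

# How many times less power and money the A100-based design needs.
power_ratio = typical.power_kw / a100_based.power_kw
cost_ratio = typical.cost_million_usd / a100_based.cost_million_usd  # lower bound

print(f"Power: {power_ratio:.1f}x lower")
print(f"Cost: at least {cost_ratio:.0f}x lower")
```

On the quoted numbers, the A100-based design draws 22.5 times less power at no more than one-eleventh of the cost.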
“The more you buy, the more you save,” Huang said, in his common keynote refrain.
Need more? Huang also announced the next-generation DGX SuperPOD. Powered by 140 DGX A100 systems and Mellanox networking technology, it offers 700 petaflops of AI performance, Huang said, the equivalent of one of the 20 fastest computers in the world.
NVIDIA is expanding its own data center with four DGX SuperPODs, adding 2.8 exaflops of AI computing power to its SATURNV internal supercomputer for a total capacity of 4.6 exaflops, making it the world’s fastest AI supercomputer.
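The SuperPOD and SATURNV figures follow from the per-system number quoted earlier. The Python sketch below checks the arithmetic in whole petaflops. The implied pre-expansion capacity is an inference from the quoted totals, not a figure stated in the keynote, and the variable names are illustrative.

```python
# Figures quoted in the keynote summary, in petaflops of AI performance.
PFLOPS_PER_DGX_A100 = 5
DGX_A100_PER_SUPERPOD = 140
NEW_SUPERPODS = 4
SATURNV_TOTAL_PFLOPS = 4600  # 4.6 exaflops

superpod_pflops = PFLOPS_PER_DGX_A100 * DGX_A100_PER_SUPERPOD
added_pflops = superpod_pflops * NEW_SUPERPODS

# Inferred, not quoted: capacity SATURNV had before the expansion.
implied_existing_pflops = SATURNV_TOTAL_PFLOPS - added_pflops

print(f"One SuperPOD: {superpod_pflops} petaflops")
print(f"Four SuperPODs: {added_pflops / 1000} exaflops")
print(f"Implied existing capacity: {implied_existing_pflops / 1000} exaflops")
```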
Huang also announced the NVIDIA EGX A100, bringing powerful real-time cloud-computing capabilities to the edge. Its NVIDIA Ampere architecture GPU offers third-generation Tensor Cores and new security features. Thanks to its NVIDIA Mellanox ConnectX-6 SmartNIC, it also includes secure, lightning-fast networking capabilities.
Software for the Most Important Applications in the World Today
Huang also announced that NVIDIA GPUs will power major software applications for accelerating three critical use cases: managing big data, creating recommender systems and building real-time, conversational AI.
These new tools arrive as the effectiveness of machine learning has driven companies to collect more and more data. “That positive feedback is causing us to experience an exponential growth in the amount of data that is collected,” Huang said.
To help organizations of all kinds keep up, Huang announced support for NVIDIA GPU acceleration on Spark 3.0, describing the big data analytics engine as “one of the most important applications in the world today.”
Built on RAPIDS, Spark 3.0 shatters performance benchmarks for extracting, transforming and loading data, Huang said. It’s already helped Adobe Intelligent Services achieve a 90 percent compute cost reduction.
Key cloud analytics platforms — including Amazon SageMaker, Azure Machine Learning, Databricks, Google Cloud AI and Google Cloud Dataproc — will all accelerate with NVIDIA, Huang announced.
“We’re now prepared for a future where the amount of data will continue to grow exponentially from tens or hundreds of petabytes to exascale and beyond,” Huang said.
Huang also unveiled NVIDIA Merlin, an end-to-end framework for building next-generation recommender systems, which are fast becoming the engine of a more personalized internet. Merlin slashes the time needed to create a recommender system from a 100-terabyte dataset from four days to 20 minutes, Huang said.
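To put the Merlin claim in perspective, the Python sketch below converts the two quoted durations into a speedup factor. It uses only the figures Huang cited, and the variable names are illustrative.

```python
from datetime import timedelta

# Durations quoted for building a recommender from a 100-terabyte dataset.
before = timedelta(days=4)
after = timedelta(minutes=20)

# Dividing one timedelta by another yields a plain float ratio.
speedup = before / after

print(f"Quoted speedup: {speedup:.0f}x")
```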
And he detailed NVIDIA Jarvis (since renamed NVIDIA Riva), a new end-to-end platform for creating real-time, multimodal conversational AI that can draw upon the capabilities unleashed by NVIDIA’s AI platform.
Huang highlighted its capabilities with a demo that showed him interacting with a friendly AI, Misty, that understood and responded to a sophisticated series of questions about the weather in real time.
Huang also dug into NVIDIA’s swift progress in real-time ray tracing since NVIDIA RTX was launched at SIGGRAPH in 2018. He announced that NVIDIA Omniverse, which allows “different designers with different tools in different places doing different parts of the same design” to work together simultaneously, is now available to early access customers.
Autonomous vehicles are one of the greatest computing challenges of our time, Huang said, an area where NVIDIA continues to push forward with NVIDIA DRIVE.
NVIDIA DRIVE will use the new Orin SoC with an embedded NVIDIA Ampere GPU to achieve the energy efficiency and performance needed to offer a 5-watt ADAS system for the front windshield and to scale up to a 2,000 TOPS, level-5 robotaxi system.
Now automakers have a single computing architecture and single software stack to build AI into every one of their vehicles.
“It’s now possible for a carmaker to develop an entire fleet of cars with one architecture, leveraging the software development across their whole fleet,” Huang said.
The NVIDIA DRIVE ecosystem now encompasses cars, trucks, tier one automotive suppliers, next-generation mobility services, startups, mapping services, and simulation.
And Huang announced NVIDIA is adding NVIDIA DRIVE RC for managing entire fleets of autonomous vehicles to its suite of NVIDIA DRIVE technologies.
BMW’s 30 factories around the globe build one vehicle every 56 seconds: that’s 40 different models, each with hundreds of different options, made from 30 million parts flowing in from nearly 2,000 suppliers around the world, Huang explained.
BMW joins a sprawling NVIDIA robotics global ecosystem that spans delivery services, retail, autonomous mobile robots, agriculture, services, logistics, manufacturing and healthcare.
In the future, factories will, effectively, be enormous robots. “All of the moving parts inside will be driven by artificial intelligence,” Huang said. “Every single mass-produced product in the future will be customized.”