World’s Fastest Supercomputer Triples Its Performance Record

ORNL demonstrates Summit’s mixed-precision capabilities, which are vital in the new era of AI supercomputing.
by Ian Buck

The world’s fastest supercomputer just got almost three times faster.

Using HPL-AI, a new approach to benchmarking AI supercomputers, Oak Ridge National Laboratory’s Summit system has achieved an unprecedented 445 petaflops, or nearly half an exaflop. That compares with the system’s official performance of 148 petaflops on the new TOP500 list of the world’s fastest supercomputers.

The High-Performance Linpack benchmark, or HPL, has long been a yardstick of performance for supercomputers and the basis for the twice-yearly TOP500 ranking.

Since its introduction roughly three decades ago by high-performance computing luminary Jack Dongarra, the Linpack benchmark has stood the test of time, providing a consistent measure of supercomputing muscle. It gauges how fast a supercomputer can run HPC applications, such as simulations, using double-precision math.

While HPL continues to be a trusted benchmark for measuring the performance of TOP500 systems on HPC applications, modern supercomputers are now being used for AI applications, not just simulations. And most AI models use mixed-precision math, a fundamentally different technique that lets researchers improve computational efficiency and tap performance that double-precision workloads leave unused.
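
The difference between the precisions is easy to see in miniature. Here is a small NumPy sketch, purely illustrative (Tensor Cores do fused multiply-accumulate in hardware, not a Python loop), showing why accumulating in higher precision matters:

```python
import numpy as np

# Accumulate 10,000 copies of float16(0.1) two ways: entirely in half
# precision, and with a double-precision accumulator (the pattern behind
# mixed precision: low-precision inputs, higher-precision accumulation).
# In pure half precision the running sum eventually stalls: once the sum
# is large enough, 0.1 is smaller than half the gap between adjacent
# float16 values, so each addition rounds away to nothing.
addend = np.float16(0.1)

half_sum = np.float16(0.0)
for _ in range(10_000):
    half_sum = np.float16(half_sum + addend)   # result rounded to float16 each step

double_sum = 0.0
for _ in range(10_000):
    double_sum += float(addend)                # float16 inputs, float64 accumulator

print(half_sum, double_sum)   # the half-precision sum falls far short of ~1000
```

The mixed approach keeps the cheap low-precision representation for the inputs while protecting the long-running accumulation from rounding error, which is the same trade-off Tensor Cores exploit at scale.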

To account for the AI techniques that represent the new era of supercomputing, a new approach to benchmarking based on the HPL standard — called HPL-AI — incorporates the mixed-precision calculations widely used to train AI models.

Our test implementing HPL-AI on the Summit supercomputer affirms the feasibility of HPL-AI measurements at scale to gauge mixed-precision computing performance and complement existing HPL benchmarks.

“Mixed-precision techniques have become increasingly important to improve the computing efficiency of supercomputers, both for traditional simulations with iterative refinement techniques as well as for AI applications,” Dongarra said. “Just as HPL allows benchmarking of double-precision capabilities, this new approach based on HPL allows benchmarking of mixed-precision capabilities of supercomputers at scale.”

The methodology behind HPL-AI is outlined in a paper published at SC 2018 by Azzam Haidar, Dongarra and their colleagues.
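
The core idea in that line of work is mixed-precision iterative refinement: solve the system quickly in low precision, then repeatedly correct the answer using residuals computed in double precision. A minimal NumPy sketch of the idea, with float32 standing in for the GPU’s low-precision path (the actual HPL-AI implementation is far more sophisticated):

```python
import numpy as np

def mixed_precision_solve(A, b, iters=5):
    """Sketch of mixed-precision iterative refinement:
    solve in low precision, refine the result in double precision."""
    A32 = A.astype(np.float32)
    # Initial solve entirely in low precision.
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                     # residual computed in double precision
        # Correction solved cheaply in low precision, applied in double.
        d = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
        x += d
    return x

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test system
b = rng.standard_normal(n)
x = mixed_precision_solve(A, b)
print(np.max(np.abs(A @ x - b)))   # residual shrinks toward double-precision accuracy
```

For well-conditioned systems the refined solution reaches double-precision accuracy even though the expensive factorization work happens in low precision, which is why the technique can deliver HPC-grade results at AI-grade speeds.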

Reaching New Performance Peak on Summit

In a test run on Summit, the world’s fastest supercomputer, NVIDIA ran HPL-AI computations on a problem size of more than 10 million equations in just 26 minutes, a roughly 3x speedup over the 77 minutes it would take Summit to run the same problem with the original HPL.

“Ever since the delivery and installation of our 200 petaflops Summit system — which included the mixed-precision Tensor Core capability powered by NVIDIA’s Volta GPU — it has been a goal of ours to not only use this unique aspect of the system to do AI but also to use it in our traditional HPC workloads,” said Jeff Nichols, associate laboratory director at ORNL. “Achieving a 445 petaflops mixed-precision result on HPL (equivalent to our 148.6 petaflops DP result) demonstrates that this system is capable of delivering up to 3x more performance on our traditional and AI workloads. This gives us a huge competitive edge in delivering science at an unprecedented scale.”

Summit is loaded with more than 27,000 NVIDIA V100 GPUs, each utilizing hundreds of Tensor Cores that support mixed-precision computing. Five out of six finalists of the 2018 Gordon Bell Prize used the GPU-accelerated Summit system to power their projects, which included both simulation and AI tasks.

Science Researchers Turn to Mixed-Precision Supercomputing for Simulations and AI

Scientists spanning the research fields of chemistry, nuclear energy, and oil and gas are using NVIDIA GPU-powered computing resources for groundbreaking work that requires both AI and simulation.

  • Nuclear fusion: Nuclear fusion effectively replicates the sun in a bottle. While it promises unlimited clean energy, fusion reactions involve temperatures above 10 million degrees Celsius. They’re also prone to disruptions and tricky to sustain for more than a few seconds. Researchers at ORNL are simulating fusion reactions so that physicists can study the instabilities of fusion plasmas, giving them a better understanding of what’s happening inside the reactor. The mixed-precision capabilities of Tensor Core GPUs speed up these simulations by 3.5x, advancing the development of sustainable energy at leading facilities such as ITER.
  • Identifying new molecules: Whether it’s to develop a new chemical compound for industrial use or a new drug to treat a disease, scientists need to identify and synthesize new molecules with desirable chemical properties. Using NVIDIA V100 GPUs for training and inference, Dow Chemical Company researchers developed a neural network to identify new molecules for use in the chemical manufacturing and pharmaceutical industries.
  • Seismic fault interpretation: The oil and gas industry analyzes seismic images to detect fault lines, an essential step toward characterizing reservoirs and determining well placement. This process typically takes days to weeks for one iteration — but with an NVIDIA GPU, University of Texas researchers trained an AI model that can predict faults in mere milliseconds instead.

New Addition to Benchmark Ecosystem

This isn’t the first time a new approach to supercomputing benchmarking has been recommended. Until the Green500 list was launched in 2007, there was no consistent measure of efficiency across the industry.

Multiple benchmarking approaches provide different perspectives, contributing to a more holistic picture of a supercomputer’s capabilities.

Today, no benchmark measures the mixed-precision capabilities of the largest-scale supercomputing systems the way the original HPL does for double-precision capabilities. HPL-AI can fill this need, showing how a supercomputing system might handle mixed-precision workloads such as large-scale AI.