by Steve Scott

Oak Ridge National Laboratory (ORNL) made news yesterday by announcing that it will deploy Titan, a revolutionary new Cray XK6 supercomputer based on 18,000 NVIDIA GPUs. This is an important milestone on the path to exascale computing.

Titan will have the potential to deliver over 20 petaflops of peak performance, making it more than twice as fast as today’s fastest supercomputer, Japan’s K computer, and three times more energy efficient.

I’m delighted that ORNL, the world’s leading open science facility, has embraced GPUs for its ground-breaking research. But I’m not surprised. Having worked at Cray for 20 years, the last six as CTO, I have a pretty good understanding of supercomputing scientists’ needs, and I’ve become convinced that heterogeneous solutions with GPUs are the future of high performance computing. That’s why I joined NVIDIA.

Let me share a few thoughts on why supercomputing is important, where it is today, and where it’s going.

Today’s scientists and policy-makers are grappling with huge societal problems: energy dependence, climate change, disease and national security, to name just a few. High performance computing has a critical role to play in meeting these challenges, as well as advancing just about every area of science.

For several hundred years, the traditional methods of scientific advancement were theory and experiment, but simulation has emerged as the third pillar of science. In many areas of science and engineering, simulation can provide insights and understanding that simply cannot be gleaned from experiment.

For some problems, a little more computing leads to a little better result, shortening the time to insight. But there are other problems where additional computing power can lead to truly transformational results.

Buddy Bland, the project director at the Oak Ridge Leadership Computing Facility, says, “There are serious exascale-class problems that just cannot be solved in any reasonable amount of time with the computers that we have today.”

The Jaguar supercomputer will be upgraded with 18,000 NVIDIA GPUs to reach 20 petaflops.

That might be hard for those outside of supercomputing to understand. The fastest U.S. supercomputer, Jaguar, can run at about 2 petaflops. That’s two million billion mathematical operations per second!  Exaflop systems will be 1,000 times faster than petaflop systems, delivering one billion billion calculations per second. So why do scientists need that?
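For readers who want to check the scale of these numbers, here is a small sketch of the arithmetic in the paragraph above, using the standard SI prefixes (peta = 10^15, exa = 10^18). The only input is the "about 2 petaflops" figure quoted for Jaguar; the variable names are purely illustrative.

```python
# SI prefixes applied to floating-point operations per second (flops)
PETA = 10**15
EXA = 10**18

jaguar_flops = 2 * PETA      # "about 2 petaflops", as quoted in the post
exaflop_flops = 1 * EXA

# "two million billion" is a million (10**6) times a billion (10**9)
assert jaguar_flops == 2 * 10**6 * 10**9
# "one billion billion" is a billion squared
assert exaflop_flops == (10**9) ** 2

speedup = EXA // PETA        # an exaflop system versus a petaflop system
print(f"Exaflop / petaflop = {speedup}x")                              # 1000x
print(f"Exaflop / Jaguar   = {exaflop_flops / jaguar_flops:.0f}x")     # 500x
```

In other words, even the fastest U.S. machine of the day sits roughly 500 times below the exascale mark.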

The U.S. Department of Energy says there are a number of computing challenges that require exascale-class computing, including:

  • Combustion: Exascale systems will enable the development of new combustion engines that are 20-50 percent more fuel efficient. This has the potential to dramatically improve industrial competitiveness, while decreasing our dependence on foreign oil.
  • Aerospace: Exascale will enable complete, first-principles simulation of jet engine combustion, allowing us to solve the problem of hot fluid migration into the turbine, providing a major advancement in efficiency.
  • Biology: Exascale will enable a comprehensive simulation of an entire cell at the molecular, chemical, genetic and biological levels, accurately representing processes such as cell growth, metabolism, locomotion and sensing – leading to the potential to cure some of our most pernicious diseases.
  • Fusion: Exascale is necessary to accurately model future fusion reactors, which offer the promise of abundant, safe, non-polluting energy.

These are just a few examples, but it’s clear that exascale computing will provide tremendous benefits to society, advancing scientific discovery, informing policy makers, and improving industrial and economic competitiveness.

There is, however, a big problem in getting to exascale computing: power. An exascale computer using today’s x86 technology would require two gigawatts of power, equivalent to the maximum output of the Hoover Dam! Our technology needs to be about 100 times more energy efficient in order to build practical exascale systems.
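The power argument can be made concrete by converting it to energy per operation (power divided by operation rate). The sketch below uses only the two figures stated above, two gigawatts and a 100x efficiency gain. The resulting 20 MW budget and picojoule-per-flop numbers are derived here from those figures; they are not quoted in the post.

```python
# Figures taken from the paragraph above
exaflops = 1e18               # floating-point operations per second
x86_power_watts = 2e9         # "two gigawatts" with today's x86 technology
efficiency_gain_needed = 100  # "about 100 times more energy efficient"

# Energy per operation = power / operation rate (joules per flop)
x86_joules_per_flop = x86_power_watts / exaflops
target_joules_per_flop = x86_joules_per_flop / efficiency_gain_needed

# Power budget implied for a practical exascale system
target_power_watts = target_joules_per_flop * exaflops

print(f"x86 energy per flop:    {x86_joules_per_flop * 1e12:.0f} pJ")     # 2000 pJ
print(f"target energy per flop: {target_joules_per_flop * 1e12:.0f} pJ")  # 20 pJ
print(f"implied power budget:   {target_power_watts / 1e6:.0f} MW")       # 20 MW
```

Framed this way, the goal is to cut the cost of each operation from about two nanojoules to about 20 picojoules.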

Unfortunately, the technology scaling that gave us several decades of exponential growth in computing speed at constant power has effectively ended. While Moore’s Law is alive and well, allowing us to double the number of transistors on a chip with every new generation of IC technology, we can no longer keep dropping the chip voltage with each reduction in transistor size. The result is that power has become the dominant constraint in processor design. If we ran all the transistors we could put on a chip at full speed, the chip would melt. So it’s now all about power efficiency.
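The scaling described above is commonly called Dennard scaling, and a standard first-order model shows why stalled voltages matter. Per-transistor dynamic power goes roughly as P = C·V²·f. This sketch assumes that simplified model and ignores leakage and other real-world effects. It shrinks a chip by a linear factor k and compares power density when the voltage scales down with feature size against when it stays fixed. The function name and parameters are hypothetical, chosen only for illustration.

```python
import math

def relative_power_density(k: float, scale_voltage: bool) -> float:
    """Power density after one shrink by linear factor k, relative to before.

    Uses the simplified dynamic-power model P = C * V**2 * f per transistor
    and ignores leakage (an intentional simplification).
    """
    capacitance = 1 / k                         # C shrinks with feature size
    voltage = 1 / k if scale_voltage else 1.0   # classic scaling vs. stalled voltage
    frequency = k                               # smaller transistors switch faster
    transistors_per_area = k ** 2               # Moore's Law density gain
    power_per_transistor = capacitance * voltage ** 2 * frequency
    return power_per_transistor * transistors_per_area

k = math.sqrt(2)  # one generation: twice as many transistors in the same area
print(relative_power_density(k, scale_voltage=True))   # ~1.0: constant power density
print(relative_power_density(k, scale_voltage=False))  # ~2.0: power density doubles
```

With voltage scaling, each generation doubles the transistors at roughly the same power per unit area. Without it, running every transistor at full speed roughly doubles power density each generation. That is the sense in which the chip "would melt."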

That’s where GPUs come in. Unlike traditional CPUs, which are designed to make serial tasks run as quickly as possible, GPUs are designed to run many parallel tasks as power-efficiently as possible. The result is that a GPU uses several times less energy per operation than a CPU.

The next-generation Kepler GPUs used in the Titan system will provide more than one teraflop of performance per chip. In heterogeneous computing, the GPU can perform the heavy lifting, executing the parallel work with very low power, and the CPU can then quickly execute the serial work that’s left over. This is the only hope of getting to exascale computing at reasonable cost and power.
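A simple energy model helps explain why this division of labor pays off, and why the serial remainder still matters. The sketch assumes the parallel share of the operations runs on the GPU at some fraction of the CPU's energy per operation, while the serial share stays on the CPU. The 1/5 energy ratio and the parallel fractions below are hypothetical values for illustration only, not measured Kepler or x86 figures.

```python
def relative_energy(parallel_fraction: float, gpu_energy_ratio: float) -> float:
    """Energy of a hybrid CPU+GPU run relative to an all-CPU run.

    parallel_fraction: share of operations that are parallel (run on the GPU).
    gpu_energy_ratio:  GPU energy per operation / CPU energy per operation.
    The serial remainder stays on the CPU at a ratio of 1.0.
    """
    serial_fraction = 1.0 - parallel_fraction
    return parallel_fraction * gpu_energy_ratio + serial_fraction * 1.0

# Hypothetical: the GPU spends 1/5 of the CPU's energy per operation
for p in (0.5, 0.9, 0.95, 0.99):
    e = relative_energy(p, gpu_energy_ratio=0.2)
    print(f"parallel fraction {p:.2f}: {1 / e:.1f}x less energy than CPU-only")
```

As with Amdahl's law for speed, the overall saving approaches the GPU's per-operation advantage only as the parallel fraction approaches one. This is why the GPU should carry the heavy lifting and the CPU should handle only what is left over.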

It won’t be easy. Imagine telling the auto industry “you need to develop a car that goes 1,000 times faster and is 100 times more energy efficient.” That’s a very tall order. But I’m confident that heterogeneous solutions with GPUs are the right path to get us there.

I’m excited to join NVIDIA and work with a super-talented team to help reshape the landscape of high performance computing. It’s also extremely rewarding to see the great things our customers do with our technology to make the world a better place.

The drive to exascale computing will be a fascinating journey. Please let me know your thoughts about supercomputing and the race to exascale here on the blog. And we’ll be sure to share the important milestones along the way.