Building a Supercomputer With a Power Drill and 18,688 GPUs

Al Enger has been a busy man. Enger is one of a crew of Cray engineers who have been working to assemble a massive new supercomputer at the Oak Ridge National Laboratory in Tennessee. It’s called Titan.

There are a lot of ways to measure Titan’s size. The machine is about as big as a basketball court. It contains 6,329 miles of interconnect cables. It’s cooled with 1,353 gallons of special refrigerant. Data is stored on 21,030 disks.

But the best way to understand Titan’s scale and complexity is to talk to one of the blue-coated engineers who scamper around its 200 towering black cabinets. Enger and his colleagues from supercomputer maker Cray work under fluorescent lights as fans push 1.3 million cubic feet of air per minute through the room. Earplugs are recommended.

Mixing GPUs and CPUs makes Titan five times as efficient as its predecessor.

Two months ago, pallets bearing the 18,688 NVIDIA Tesla GPUs that provide about 90% of the machine’s computing power began to arrive. That’s when Enger picked up his green-and-black power drill and got to work. It took Enger and 20 colleagues three weeks, working seven days a week, to bolt all those GPUs into the machine.

The result could be the world’s most powerful computer. It won’t be official until November, when TOP500.org releases its semiannual list of the 500 fastest supercomputers. But there can be no doubt Titan represents a breakthrough. At its peak, Titan cranks out more than 20 petaflops. That’s twenty thousand trillion floating-point operations per second (‘floating point’ refers to a format many computers use to represent very small and very big numbers efficiently).
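To make the units concrete, here is a small Python sketch of that arithmetic: 20 petaflops is 20 × 10^15 operations per second, which is 20,000 trillion. It also uses the standard-library `math.frexp`, which splits a float into a fraction and a power of two; that fraction-times-exponent layout is what lets one format hold both very small and very large numbers. The helper `to_trillions` and the two sample values are hypothetical, chosen only for illustration.

```python
import math

PETA = 10**15       # "peta" = one thousand trillion
TRILLION = 10**12

def to_trillions(ops_per_second: int) -> int:
    """Express an operations-per-second figure in trillions (hypothetical helper)."""
    return ops_per_second // TRILLION

titan_peak = 20 * PETA  # "more than 20 petaflops", using 20 as the round figure
print(f"20 petaflops = {to_trillions(titan_peak):,} trillion operations per second")

# A float is stored as a fraction times a power of two; frexp exposes both parts.
for x in (6.02e23, 1.6e-19):  # one very large and one very small sample value
    fraction, exponent = math.frexp(x)
    print(f"{x:g} = {fraction:.6f} * 2**{exponent}")
```

The same two-part layout covers both a number near 10^23 and one near 10^-19; only the exponent changes.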

What’s really significant about Titan isn’t how many zeros you need to measure its performance, but how few megawatts Titan needs to do its work. Because it relies on GPUs, rather than CPUs alone, to do much of the computing, Titan requires only 9 megawatts of power.

Titan represents a step toward even faster ‘exascale’ computing.

Titan is five times as efficient as Jaguar, the 2.3-petaflop computer it replaced at Oak Ridge. That efficiency comes from an approach called ‘heterogeneous computing,’ which pairs CPUs with GPUs, says Buddy Bland, project director for the Oak Ridge Leadership Computing Facility.

“If this were a machine of the same power and it were using CPUs it would be using about 30 megawatts of power, or about $30 million a year,” says Bland. “So heterogeneous computing really gives us a lot more bang for the buck.”
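Bland’s figures work out to roughly $1 million per megawatt per year. The Python sketch below backs out the electricity price that figure implies and applies it to Titan’s 9 megawatts. It assumes the machine draws its full rated power around the clock and that the $30 million is electricity alone; the quote does not say either. The helper `implied_price_per_kwh` is a hypothetical name for this illustration.

```python
# Figures quoted in the article
CPU_ONLY_MW = 30               # Bland's estimate for a CPU-only machine of Titan's power
CPU_ONLY_COST_PER_YEAR = 30e6  # "about $30 million a year"
TITAN_MW = 9                   # Titan's actual power draw

HOURS_PER_YEAR = 24 * 365

def implied_price_per_kwh(megawatts: float, dollars_per_year: float) -> float:
    """Electricity price implied by running `megawatts` nonstop for a full year."""
    kwh_per_year = megawatts * 1000 * HOURS_PER_YEAR
    return dollars_per_year / kwh_per_year

price = implied_price_per_kwh(CPU_ONLY_MW, CPU_ONLY_COST_PER_YEAR)

# Assumption: cost scales linearly with power at that same price.
titan_cost = TITAN_MW * 1000 * HOURS_PER_YEAR * price

print(f"implied price: ${price:.3f} per kWh")
print(f"Titan at the same rate: about ${titan_cost / 1e6:.0f} million a year")
```

Under those assumptions the implied rate is about 11 cents per kilowatt-hour, which would put Titan’s power bill near $9 million a year. That number is derived here, not quoted by Oak Ridge.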

That’s because GPUs rely on the parallel computing technology long prized by supercomputer engineers. To render virtual battlefields or imaginary dragons for video game enthusiasts, GPUs work through many tasks at the same time, rather than switching quickly from one task to another, as CPUs do.

It turns out that’s a very efficient way to do computing, says Bronson Messer, acting group leader for scientific computing at the Oak Ridge Leadership Computing Facility.

“The kind of physical things that happen in a game, it turns out those things happen in nature as well,” says Messer, who admits to knowing his way around a game controller. “These are exactly the kinds of problems we’re trying to solve in a lot of scientific questions, from combustion to climate.”
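The contrast drawn here, working through many tasks at once rather than one after another, can be sketched as a toy timing model in Python. Every number in it (task count, lane counts, per-task times) is hypothetical and does not describe Titan’s actual Opteron CPUs or Tesla GPUs; the model also ignores memory, branching, and energy, which matter a great deal in real hardware.

```python
import math

def batches_needed(tasks: int, lanes: int) -> int:
    """Rounds required when `lanes` independent tasks can run at the same time."""
    return math.ceil(tasks / lanes)

def run_time(tasks: int, lanes: int, seconds_per_task: float) -> float:
    """Toy model: each round costs one task's time, and rounds run back to back."""
    return batches_needed(tasks, lanes) * seconds_per_task

# Hypothetical workload: one small, independent calculation per pixel or grid cell.
TASKS = 1_000_000

# Hypothetical processors: a few fast lanes versus many slower ones.
cpu_like = run_time(TASKS, lanes=16, seconds_per_task=1e-9)
gpu_like = run_time(TASKS, lanes=2_048, seconds_per_task=4e-9)

print(f"few fast lanes:  {cpu_like * 1e6:.1f} microseconds")
print(f"many slow lanes: {gpu_like * 1e6:.1f} microseconds")
```

With these made-up numbers, the many-lane design finishes in about 2 microseconds versus about 62 for the few-lane design, even though each of its lanes is four times slower. The advantage only appears because the tasks are independent, which is the property Messer points to in game physics and scientific simulation alike.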

Some assembly required: Titan contains more than 18,000 GPUs.

The result is a sort of synergy between gaming and scientific research, with the tens of millions of consumers who rely on GPUs to power their games effectively paying for research on a scale the supercomputing community could never afford on its own.

And the work done by those researchers is increasingly critical. Bland sees simulations run on powerful machines such as Titan as playing a growing role in scientific research. Titan is an open-science system, which means researchers from academia, government labs, and private companies can use it to model physical and biological systems ranging from Earth’s climate to the way engines burn fuel.

More powerful machines are coming. Titan and its 18,688 GPUs are a step toward what Bland calls exascale computing. Titan can perform 20 thousand trillion floating-point operations per second. Exascale machines, by contrast, will perform one million trillion.

The U.S. Department of Energy would like to hit that mark by the end of the decade using just 20 megawatts of power. That’s a little more than twice what Titan consumes now.
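One way to see how demanding that goal is: compare operations per second delivered per watt. The Python sketch below uses only the article’s figures (Titan’s peak of 20 petaflops at 9 megawatts, and the Department of Energy’s target of one exaflop at 20 megawatts). Because Titan’s number is a peak rating, treat the comparison as rough. The helper `gflops_per_watt` is a hypothetical name for this illustration.

```python
PETA = 10**15
EXA = 10**18  # one million trillion

def gflops_per_watt(flops: float, megawatts: float) -> float:
    """Floating-point operations per second per watt, expressed in billions."""
    return flops / (megawatts * 1e6) / 1e9

titan = gflops_per_watt(20 * PETA, 9)    # article: 20+ petaflops (peak) at 9 MW
exascale = gflops_per_watt(1 * EXA, 20)  # DOE goal: 1 exaflop at 20 MW

print(f"Titan:                 {titan:.1f} GFLOPS per watt")
print(f"Exascale target:       {exascale:.1f} GFLOPS per watt")
print(f"Efficiency gain needed: {exascale / titan:.1f}x")
print(f"Power budget vs. Titan: {20 / 9:.2f}x")
```

By this rough measure, Titan delivers about 2.2 billion operations per second per watt and the exascale goal needs 50, an improvement of roughly 22 times, while the power budget grows only about 2.2 times.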

Al Enger might want to start charging that power drill now.

Photos: Oak Ridge National Laboratory

Comments

  • Sagar Rawal

    Simply amazing!

    Now give us gamers the power of GK110!

  • Alaa Wadi

    Great..

  • rritambhar

    o boy o boy o boy!!!!

  • Sagar Rawal

    This supercomputer will surely enable much larger, computationally intensive research projects to be completed… and the Tesla K20 is primarily responsible for allowing such a large performance improvement over Jaguar while maintaining a manageable power footprint.

    Fantastic work NVIDIA!

  • Maurício Togawa

    Runs Crysis at full graphics??? lol

  • Sagar Rawal

    Brian, your writeup on Titan is fantastic!

    If any readers are interested in more details about Titan, including videos of the Fermi to Kepler swap-out done by Cray, the following article by AnandTech is stellar:

    http://www.anandtech.com/show/6421/inside-the-titan-supercomputer-299k-amd-x86-cores-and-186k-nvidia-gpu-cores

  • RunkaminStorakuk

    so… will it blend?

  • Brian_Caulfield

    Ha! We’re going to need to find an awfully big blender to find out. 

  • Brian_Caulfield

    😉

  • Brian_Caulfield

    I love Anandtech, too. Thanks for the link! 

  • libardo hurtado

    And with BULLDOZER technology… OPTERON CPUs 😀

  • Data Not Found

    these pics above are so awesome, but they are uploaded at such a small resolution, it’s sad. Please, please upload larger pics, at least 3/4 the size of a normal laptop screen these days. Also, please add these pics to the blog in a way that lets them be viewed by clicking a Next button. Thanks.

  • Data Not Found

    you bet … runs 18000 crysis games running at the same time in full detail at res at 5000*3000 res 😛

  • Hotel Wifi

    Wow, looks awesome. I’d love to work here

  • Angelo Coelho

    well, I wonder if this computer can handle a third of the power of my new baby….

    http://www.old-computers.com/museum/computer.asp?st=1&c=279

    lol, 

  • anders salvesen

    Runs Crysis 3 on six 4K screens at a couple of thousand fps, I bet.

  • Talha Yousuf

    awesome 

  • Mohsen Kiae

    This is what separates nVidia from other brands 😉

  • Brian_Caulfield

    🙂

  • Brian_Caulfield

    Careers.NVIDIA.com 🙂

  • Brian_Caulfield

    Nice

  • johnboy53

    Does anyone know what OS this computer uses?

  • Stephen Ramos

    or maybe run 300 instances of crysis 3 at that resolution ray traced in real time at 60 fps.

  • Radu

    Could someone please donate a decent camera to the nVidia team?

  • Chiburg

    I’m sure they’d like to, but all resources are used for “18000 crysis games running at the same time in full detail at res at 5000*3000 res”. Nothing left for hi-res picts 🙁