Improving Power and Programming: Keys to the Exascale Kingdom

by George Millington

Virtually all those attending this week’s 2013 International Supercomputing Conference (ISC) in Leipzig, Germany, would agree that there’s an insatiable demand driving the race to exascale computing.

Where they would disagree is what it will take to get there.

NVIDIA’s Chief Scientist, Bill Dally, tackled this issue in his ISC keynote address, entitled “Future Challenges of Large-scale Computing.”

Presenting to some of the high performance computing (HPC) industry’s foremost experts, Dally outlined challenges the industry needs to overcome to reach exascale by the end of this decade.

It boils down, in his view, to two major issues: power and programming.

It’s About Power, Forget Process

Theoretically, an exascale system – one with 100 times the computing capability of today’s fastest systems – could be built with only x86 processors, but it would require as much as 2 gigawatts of power.

That’s the entire output of the Hoover Dam.

On the other hand, the GPUs in an exascale system built with NVIDIA Kepler K20 processors would consume about 150 megawatts. So, a hybrid system that efficiently utilizes CPUs with higher-performance GPU accelerators is the best bet to tackle the power problem.

Still, the industry needs to look for power efficiencies in other areas.

Reaching exascale, according to Dally, will require a 25x improvement in energy efficiency – 50 gigaflops (or billion floating point operations per second) per watt vs. the 2 gigaflops per watt from today’s most efficient systems.


And, contrary to what some believe, manufacturing process advances alone will not achieve this goal.

At best, process advances will deliver only about a 2.2x improvement in performance per watt, leaving an energy efficiency gap of roughly 12x that will need to be closed by other means.

Dally believes that a combination of more efficient circuit design and better processor architectures can help close the gap – delivering 3x and 4x improvements in performance per watt, respectively.
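The arithmetic behind these figures can be checked with a short sketch. The constants below are the numbers quoted in this article; the function names are hypothetical and chosen only for illustration, and the power estimate is simple division rather than a figure from the keynote.

```python
# Figures quoted in the article (from Dally's ISC 2013 keynote).
TARGET_GFLOPS_PER_WATT = 50.0   # efficiency needed for exascale
CURRENT_GFLOPS_PER_WATT = 2.0   # today's most efficient systems
PROCESS_GAIN = 2.2              # best case from manufacturing process alone
CIRCUIT_GAIN = 3.0              # more efficient circuit design
ARCHITECTURE_GAIN = 4.0         # better processor architectures

EXAFLOP_IN_GFLOPS = 1e9         # 10^18 flops = 10^9 gigaflops


def required_improvement(target, current):
    """Overall efficiency improvement needed (target / current)."""
    return target / current


def remaining_gap(required, process_gain):
    """Improvement still needed after process advances are applied."""
    return required / process_gain


def system_power_megawatts(gflops_per_watt):
    """Power drawn by a one-exaflop system at the given efficiency."""
    watts = EXAFLOP_IN_GFLOPS / gflops_per_watt
    return watts / 1e6


if __name__ == "__main__":
    needed = required_improvement(TARGET_GFLOPS_PER_WATT, CURRENT_GFLOPS_PER_WATT)
    gap = remaining_gap(needed, PROCESS_GAIN)
    print(f"Required improvement: {needed:.0f}x")
    print(f"Gap after process gains: {gap:.1f}x")
    print(f"Circuits x architecture: {CIRCUIT_GAIN * ARCHITECTURE_GAIN:.0f}x")
    print(f"Exaflop at target efficiency: "
          f"{system_power_megawatts(TARGET_GFLOPS_PER_WATT):.0f} MW")
```

The gap left after process gains works out to about 11.4x, which the article rounds to 12x, and the combined 3x and 4x improvements multiply to exactly 12x. By the same arithmetic, a one-exaflop system at 50 gigaflops per watt would draw about 20 megawatts.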

Dally’s engineering team at NVIDIA is exploring a number of new approaches, including utilizing hierarchical register files, two-level scheduling, optimizing temporal SIMT, and other advanced techniques – all designed to maximize energy efficiency in every way possible.

Programming is a “Team Sport”

Dally says that the second big challenge to overcome is making it easier for developers to program these large-scale systems.

This is not to say that parallel computing is hard. Rather, in Dally’s view, parallel programming is easy… but “we make it hard.”

He explained that programmers, programming tools and the architecture each need to ‘play their positions.’


For example, programmers should focus on designing better algorithms – and not worry about optimization or mapping. Leave that to the programming tools, which are much more effective at these types of tasks than humans.

And the architecture – well, it just needs to provide the underlying compute power, and otherwise “stay out of the way.”

On top of this, Dally notes that tools and programming models need to continue improving, making it even easier for programmers to maximize both performance and energy efficiency.

Potential improvements Dally is investigating in this area include using collection-oriented programming methods, which should make the process of programming large-scale machines quicker and easier.
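The article does not define collection-oriented programming. As a rough illustration, assuming it means expressing a computation as operations over whole collections (a map followed by a reduction) rather than as hand-written loops, the sketch below contrasts the two styles for a dot product. The function names are hypothetical, and Python's built-in map and reduce run sequentially here; the point is only that the second form states what to compute and leaves the ordering and mapping to the tools.

```python
from functools import reduce
import operator


def dot_loop(xs, ys):
    """Explicit loop: the programmer fixes the iteration order."""
    total = 0.0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


def dot_collections(xs, ys):
    """Collection-oriented: multiply element pairs, then reduce by addition."""
    products = map(operator.mul, xs, ys)
    return reduce(operator.add, products, 0.0)


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    b = [4.0, 5.0, 6.0]
    print(dot_loop(a, b), dot_collections(a, b))
```

Because the map has no dependence between elements and addition can be regrouped, a compiler or runtime is free in principle to split this work across many processors, which is the kind of mapping decision Dally suggests leaving to the tools.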

By focusing on these areas, Dally is confident that exascale computing is within our reach by the end of the decade.