Tesla K80 Is All About Instant Gratification, Early Users Say

by Geetika Gupta

Why wait for tomorrow to get results when you can have them today? Why do just one job a day when our new Tesla K80 can run twice that number?

We’ve given researchers with some of the most demanding jobs around early access to the Tesla K80 dual-GPU accelerator.

Here’s what they’re telling us: the Tesla K80 is shrinking the time to discovery and insight in high performance computing (HPC). Why wait on CPUs and other accelerators when the Tesla K80 can deliver solutions sooner?

The Tesla K80 packs so much compute power and memory bandwidth that early adopters are doing twice the work in a given day. Enclosed within a standard dual-slot PCI Express accelerator are 4,992 CUDA cores ready to work on 24GB of data at up to 480GB/s of memory bandwidth.
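Those are aggregate figures for the whole board; they split evenly across the K80’s two GPUs, each with its own cores and memory. A quick back-of-the-envelope sketch:

```python
# Split the K80's aggregate specs across its two on-board GPUs.
# The even split reflects the board layout: each GPU brings its own
# CUDA cores, memory, and memory bus.

cuda_cores_total = 4992
memory_gb_total = 24
bandwidth_gbs_total = 480
gpus_per_board = 2

per_gpu = {
    "cuda_cores": cuda_cores_total // gpus_per_board,
    "memory_gb": memory_gb_total // gpus_per_board,
    "bandwidth_gbs": bandwidth_gbs_total // gpus_per_board,
}
print(per_gpu)  # {'cuda_cores': 2496, 'memory_gb': 12, 'bandwidth_gbs': 240}
```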

These cores have NVIDIA GPU Boost enabled by default, so they can quickly, without external intervention, climb from 560 MHz to 875 MHz whenever they detect that a workload isn’t consuming the full 300W power budget.

That’s not the only way the K80 is built for speed. Because GPU Boost adjusts dynamically to each application, there are fewer manual steps for anyone looking to maximize application performance and get results fast.

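To put that clock range in perspective, the jump from the 560 MHz base clock to the 875 MHz boost clock is roughly a 56% clock uplift when power headroom allows. A quick check of the arithmetic:

```python
# GPU Boost clock headroom on the K80, from the figures above.
base_mhz = 560    # base clock
boost_mhz = 875   # maximum boost clock

uplift = boost_mhz / base_mhz - 1  # fractional clock increase
print(f"GPU Boost headroom: {uplift:.1%}")  # GPU Boost headroom: 56.2%
```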
Just talk to users like Wolfgang Nagel, director of the Center for Information Services and High Performance Computing, TU Dresden, and Yann LeCun, director of AI Research at Facebook and professor at New York University.

“The Tesla K80 dual-GPU accelerators are up to 10 times faster than CPUs when enabling scientific breakthroughs in some of our key applications, and provide a low energy footprint. Our researchers use the available GPU resources on the Taurus supercomputer extensively to enable a more refined cancer therapy, understand cells by watching them live, and study asteroids as part of ESA’s Rosetta mission.” – Wolfgang Nagel, director of the Center for Information Services and High Performance Computing, TU Dresden.

“NVIDIA GPUs have become the de facto computing platform for the deep learning community. Because the accuracy of deep learning systems improves as the models and datasets get larger, we always look for the fastest hardware we can find. The Tesla K80 accelerator, with its dual-GPU architecture and large memory, gives us more teraflops and more GB than ever before from a single server, allowing us to make faster progress in deep learning.” – Yann LeCun, director of AI Research at Facebook and professor at New York University.

The Tesla K10 was the first dual-GPU accelerator designed for HPC. In the oil and gas industry, the K10 made great strides with its single-precision performance and high memory bandwidth, and it offered an easier path to increasing the ratio of GPUs to CPUs within a node. In the two years since the K10, high GPU density within a node has gained momentum. With a CPU:GPU ratio of 1:4 or higher, slicing and splitting a problem set is easier, and you save on extra cabling and interconnects.

The K80 is a great successor to the K10 for workloads that benefit from higher GPU density and memory bandwidth across a vast range of applications. Next year will see a large number of servers with high GPU density, and the K80 will make it easier, quicker and cheaper to get results on a single system with four or more GPUs than by hooking up many systems with one or two GPUs, users say.

“With two GPUs on a single board, the K80 gives the user flexibility to either use both GPUs on the board to maximize throughput for a single simulation, or use them independently to maximize aggregate simulation throughput. Eight of these cards in one system combine 16 GPUs in a node – that’s over 3.2 microseconds of aggregate MD per day for a 25K atom system – in a single node!” – Ross Walker, associate research professor and high-performance computing consultant, San Diego Supercomputer Center and AMBER Development Team lead.
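Working backward from Walker’s number gives a feel for the per-GPU figure (this is only the implied average from the quoted aggregate, not a measured benchmark): 3.2 microseconds per day spread over 16 GPUs is 200 nanoseconds of simulated time per GPU per day.

```python
# Implied per-GPU MD throughput from the quoted node aggregate.
# 3.2 microseconds/day = 3200 nanoseconds/day for the whole node.
aggregate_ns_per_day = 3200
n_gpus = 16  # eight K80 boards x two GPUs each

per_gpu_ns_per_day = aggregate_ns_per_day / n_gpus
print(per_gpu_ns_per_day)  # 200.0
```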

A single K80 gives the most throughput ever seen for a single card.

Inside the K80 are two GK210 GPUs. GK210 is based on our Kepler GPU architecture, but it expands on-chip resources, doubling the available register file and shared memory capacity per SMX. With more registers and larger shared memory, data stays on the GPU longer, reducing the back and forth between the GPU and external memory and improving efficiency and application performance.

From a developer’s point of view, these changes are largely transparent and can be harnessed via compiler flags, say early customers like Xcelerit CEO Hicham Lahlou.

“Xcelerit has tested the new K80 and found that for many of our financial customers, it can achieve a straight 2x speed-up over its predecessor, the K40. We have added it to our supported platforms, so customers can migrate their Xcelerit-enabled applications to this new hardware with absolutely no changes to the codebase.” – Hicham Lahlou, Xcelerit CEO and co-founder.
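For CUDA developers, picking up GK210’s enlarged on-chip resources is typically just a recompile: GK210 is compute capability 3.7, so targeting it looks like the sketch below (the file and output names are placeholders, and exact flags depend on your build setup).

```shell
# Compile for the K80's GK210 GPUs (compute capability 3.7) so the compiler
# can take advantage of the doubled register file and shared memory per SMX.
# "app.cu" and "app" are placeholder names for illustration.
nvcc -arch=sm_37 -o app app.cu
```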

Tesla K80 is a high-performance, cost-effective way to increase GPU density and ease of use. It’s a combination that shrinks the time to discovery.

Try a Tesla K80 GPU Today in the Cloud

The K80 is available now. Buy one today from one of NVIDIA’s system partners, or try a K80 in the GPU Test Drive program from one of our Tesla preferred partners.

Publish your benchmarks. Already have a Tesla K80 or have access to one? Please share your results in the comments section, below.