Tesla K80 Is All About Instant Gratification, Early Users Say
November 18, 2014
Why wait for tomorrow to get results when you can have them today? Why do just one job a day when our new Tesla K80 can run twice that number?
We’ve given researchers with some of the most demanding jobs around early access to the Tesla K80 dual-GPU accelerator.
Here’s what they’re telling us: Tesla K80 is shrinking the time to discovery and insights in high performance computing (HPC). Waiting on CPUs and other accelerators slows the pace of innovation; the Tesla K80 gets you to a solution sooner.
The Tesla K80 packs so much compute power and memory bandwidth that early adopters are doing twice the work in a given day. Enclosed within a standard dual-slot PCI Express accelerator are 4,992 CUDA cores ready to work on up to 24GB of data in on-board memory, with an aggregate memory bandwidth of 480 GB/s.
These cores have NVIDIA GPU Boost enabled by default, so they can quickly step up from 560 MHz to 875 MHz, without external intervention, whenever they detect that a workload isn’t consuming all 300W available.
Because GPU Boost is always on and adjusts dynamically to each application, there are fewer steps for anyone looking to maximize application performance and get results fast.
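To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. The core count, memory size, bandwidth and clock speeds are the figures quoted above. The assumption that each CUDA core retires one fused multiply-add (two floating-point operations) per clock is made for illustration, and the function names are hypothetical. Treat the results as theoretical single-precision peaks, not measured performance.

```python
# Figures quoted in the article (totals across both GPUs on the card).
CUDA_CORES = 4992
MEMORY_GB = 24
BANDWIDTH_GB_PER_S = 480
BASE_CLOCK_MHZ = 560
BOOST_CLOCK_MHZ = 875

# Assumption for illustration: one fused multiply-add (2 FLOPs) per core per clock.
FLOPS_PER_CORE_PER_CYCLE = 2


def peak_tflops(cores, clock_mhz, flops_per_cycle=FLOPS_PER_CORE_PER_CYCLE):
    """Theoretical peak throughput in TFLOPS at the given clock."""
    return cores * flops_per_cycle * clock_mhz * 1e6 / 1e12


def memory_sweep_seconds(memory_gb, bandwidth_gb_per_s):
    """Time to stream the whole memory once at peak bandwidth."""
    return memory_gb / bandwidth_gb_per_s


base_tflops = peak_tflops(CUDA_CORES, BASE_CLOCK_MHZ)
boost_tflops = peak_tflops(CUDA_CORES, BOOST_CLOCK_MHZ)
sweep_seconds = memory_sweep_seconds(MEMORY_GB, BANDWIDTH_GB_PER_S)

print(f"Peak at base clock:  {base_tflops:.2f} TFLOPS")
print(f"Peak at boost clock: {boost_tflops:.2f} TFLOPS")
print(f"Boost headroom:      {boost_tflops / base_tflops:.2f}x")
print(f"One pass over {MEMORY_GB}GB: {sweep_seconds * 1000:.0f} ms")
```

Under these assumptions, the jump from base to boost clock is worth a little over half again as much theoretical throughput, which is why having GPU Boost on by default matters.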
Just talk to users like Wolfgang Nagel, director of the Center for Information Services and High Performance Computing, TU Dresden, and Yann LeCun, director of AI Research at Facebook and professor at New York University.
Tesla K10 was the first dual-GPU accelerator designed for HPC. In the oil and gas industry, K10 made great strides with its single-precision performance and high memory bandwidth. It offered an easier path to increasing the ratio of GPUs to CPUs within a node. In the two years since K10, high GPU density within a node has gained momentum. With a CPU:GPU ratio of 1:4 or higher, slicing and splitting a problem set is easier, and one can save on extra cabling and interconnects.
K80 is a great successor to K10 for workloads that benefit from higher GPU density and memory bandwidth across a vast range of applications. Next year will see a large number of servers with high GPU density. The K80 will make it easier, quicker and cheaper to get results on a single system with four or more GPUs, versus hooking up many systems with one or two GPUs, users say.
A single K80 gives the most throughput ever seen for a single card.
Inside the K80 are two GK210 GPUs. GK210 is based on our Kepler GPU architecture. But it expands on-chip resources, doubling the available register file and shared memory capacities per SMX. With a higher number of registers and larger shared memory, GPUs are busy longer. This reduces the back and forth between GPU and external memory, improving efficiency and application performance.
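Here is a rough way to see why a bigger register file keeps a GPU busier, sketched in Python. It assumes a baseline of 65,536 32-bit registers per SMX before the doubling and a limit of 2,048 resident threads per SMX. Both are assumptions for illustration, as is the helper name `resident_threads`. The model ignores register allocation granularity, shared memory use and per-block limits, so it shows the trend rather than exact occupancy.

```python
# Assumptions for illustration only (not taken from the article):
MAX_THREADS_PER_SMX = 2048        # resident-thread limit per SMX
BASELINE_REGISTERS = 64 * 1024    # 32-bit registers per SMX before the doubling
DOUBLED_REGISTERS = 2 * BASELINE_REGISTERS


def resident_threads(registers_per_smx, registers_per_thread,
                     max_threads=MAX_THREADS_PER_SMX):
    """Threads that fit in the register file, ignoring allocation granularity."""
    return min(max_threads, registers_per_smx // registers_per_thread)


# Compare how many threads can stay resident for kernels of varying register use.
for regs_per_thread in (32, 64, 128):
    before = resident_threads(BASELINE_REGISTERS, regs_per_thread)
    after = resident_threads(DOUBLED_REGISTERS, regs_per_thread)
    print(f"{regs_per_thread:>3} registers/thread: "
          f"{before} -> {after} resident threads per SMX")
```

In this simplified model, light kernels are already capped by the thread limit, while register-hungry kernels can keep up to twice as many threads in flight. More resident threads give the scheduler more work to switch to while others wait on memory.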
From a developer’s point of view, these changes are largely transparent and can be harnessed via compiler flags, say early customers such as Xcelerit CEO Hicham Lahlou.
Tesla K80 is a high-performance, cost-effective way to increase GPU density and ease of use. It’s a combination that shrinks the time to discovery.
Try a Tesla K80 GPU Today in the Cloud
Publish your benchmarks. Already have a Tesla K80, or have access to one? Please share your results in the comments section below.