Six Years Later, the Supercharged Tesla GPU I Always Wanted

by Sumit Gupta

When we launched our first Tesla GPU accelerator product in 2007, our users started asking for one big change: more memory on the GPU board. In fact, they wanted more than 10GB of fast (GDDR5) memory feeding the GPU. Users needed more memory to operate on large data sets, which are very common in high-performance computing and data analytics.

Our first product, the Tesla C870, had just 1.5GB, so at the time reaching 10GB or more seemed like a dream.

Well, after six years, the dream has come true. This week we launched the Tesla K40 GPU accelerator, which is based on the Kepler architecture and has 12GB of GDDR5 memory.


Faster, Larger, Smarter

The three defining features of the K40 over the previous Tesla flagship product, the Tesla K20X, are:

  • Higher performance: 1.43 teraflops double precision and 4.29 teraflops single precision (three times the double-precision rate).
  • Twice the memory: 12GB on K40, up from 6GB on K20X.
  • GPU Boost: a terrific new performance-enhancing feature that harvests unused power headroom to give applications an extra performance boost.
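The peak figures in the first bullet come from a simple product of core count, clock, and operations per cycle. The sketch below relies on values that this post does not state and that are taken from NVIDIA's published K40 specifications: 2,880 CUDA cores, a 745 MHz base clock, boost clocks of 810 and 875 MHz, and two floating-point operations per core per cycle (one fused multiply-add). Treat it as an illustration of the arithmetic, not an official derivation.

```python
# Peak-throughput arithmetic for Tesla K40 (a sketch).
# ASSUMPTIONS (not stated in the post): 2880 CUDA cores, 745 MHz base clock,
# 810/875 MHz boost clocks, 2 FLOPs per core per cycle (one fused multiply-add),
# and double precision running at one third of the single-precision rate.

CUDA_CORES = 2880
FLOPS_PER_CORE_PER_CYCLE = 2  # an FMA counts as one multiply plus one add
DP_TO_SP_RATIO = 1 / 3        # single precision is "3x double"


def peak_tflops(clock_mhz: float) -> tuple[float, float]:
    """Return (single, double) precision peak in teraflops at a given core clock."""
    sp = CUDA_CORES * FLOPS_PER_CORE_PER_CYCLE * clock_mhz * 1e6 / 1e12
    return sp, sp * DP_TO_SP_RATIO


if __name__ == "__main__":
    for clock in (745, 810, 875):
        sp, dp = peak_tflops(clock)
        print(f"{clock} MHz: {sp:.2f} TFLOPS single, {dp:.2f} TFLOPS double")
```

Under these assumptions the base clock reproduces the 4.29 and 1.43 teraflops quoted above, which suggests those are base-clock peaks; at the 875 MHz boost clock the same formula gives 5.04 and 1.68 teraflops.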

GPU Boost

We design our GPU accelerator boards never to exceed 235 watts when running any application. To set the clocks for the GPU's CUDA cores, we run a synthetic, power-hungry mini-application and choose clocks at which even that workload stays under 235 watts. Our server manufacturer partners design their servers to cool GPUs running at this maximum power.

We found, however, that most real applications consume only 160 to 180 watts, which leaves roughly 55 to 75 watts of power headroom. We invented the GPU Boost feature to take advantage of that headroom.
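Subtracting the typical application draw from the 235-watt board limit gives the headroom range directly:

```latex
P_{\text{headroom}} = P_{\text{board}} - P_{\text{app}}, \qquad
235\,\text{W} - 180\,\text{W} = 55\,\text{W}, \qquad
235\,\text{W} - 160\,\text{W} = 75\,\text{W}
```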

GPU Boost works like this: run your application on the GPU and check how much power it consumes, using simple command line tools. If the application draws less than 235 watts, you can set the CUDA core clock to one of two higher boost clocks; all the CUDA cores then run at the new, higher clock. Check the power again, and if it is still under 235 watts, you are good to go: your application is now taking advantage of the higher boost clock.
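The check, boost, and check-again procedure above can be sketched as a small loop. This is an illustration under stated assumptions, not NVIDIA's tooling: `measure_watts` and `set_clock` are hypothetical callbacks standing in for the command line tools (on real systems, for example, `nvidia-smi` can report power draw and its `-ac` option sets application clocks), and the 745/810/875 MHz clock values come from NVIDIA's published K40 specifications rather than from this post. Trying the lower boost clock first and stepping up only while power stays under the limit is our choice of ordering; the post does not prescribe one.

```python
# Sketch of the GPU Boost procedure: measure power, raise the clock, re-measure.
# ASSUMPTIONS: measure_watts() and set_clock() are hypothetical callbacks that
# stand in for real tools; clock values are illustrative K40 figures.
from typing import Callable, Sequence

POWER_LIMIT_W = 235.0
BASE_CLOCK_MHZ = 745
BOOST_CLOCKS_MHZ = (810, 875)  # the two boost clocks, lowest first


def pick_boost_clock(
    measure_watts: Callable[[], float],
    set_clock: Callable[[int], None],
    limit_w: float = POWER_LIMIT_W,
    boost_clocks: Sequence[int] = BOOST_CLOCKS_MHZ,
    base_clock: int = BASE_CLOCK_MHZ,
) -> int:
    """Step the CUDA core clock up while measured power stays under the limit.

    Returns the clock the GPU is left running at.
    """
    set_clock(base_clock)
    if measure_watts() >= limit_w:
        return base_clock  # no headroom: stay at the base clock

    chosen = base_clock
    for clock in boost_clocks:
        set_clock(clock)
        if measure_watts() < limit_w:
            chosen = clock  # still under the limit: keep this boost clock
        else:
            break  # over the limit: stop and fall back to the last safe clock
    set_clock(chosen)
    return chosen
```

With a simulated GPU that draws 170 W at 745 MHz, 190 W at 810 MHz, and 240 W at 875 MHz, the function settles on 810 MHz, since the top boost clock would exceed the 235-watt limit.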

We found that GPU Boost gave most applications 10 to 25% extra performance, as shown in the image below. Note that in some cases the higher CUDA core clocks also lead to higher effective memory bandwidth.
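To compare your own runs against the 10 to 25% range, it helps to be explicit about how "extra performance" is computed from runtimes. The helpers below are a hypothetical illustration; they assume performance means throughput (work per unit time), which the post does not state, and show that this differs from the reduction in wall-clock time.

```python
# Converting measured runtimes into a "percent extra performance" figure.
# ASSUMPTION: performance is taken as throughput (work per unit time), so the
# gain is t_base / t_boost - 1; the post does not say which definition it uses.


def percent_gain(t_base_s: float, t_boost_s: float) -> float:
    """Throughput gain, in percent, of the boosted run over the base-clock run."""
    if t_base_s <= 0 or t_boost_s <= 0:
        raise ValueError("runtimes must be positive")
    return (t_base_s / t_boost_s - 1.0) * 100.0


def percent_time_saved(t_base_s: float, t_boost_s: float) -> float:
    """Reduction in wall-clock time, in percent (a smaller number than the gain)."""
    if t_base_s <= 0 or t_boost_s <= 0:
        raise ValueError("runtimes must be positive")
    return (1.0 - t_boost_s / t_base_s) * 100.0


if __name__ == "__main__":
    # Hypothetical timings, not measurements: 100 s at base clock, 80 s boosted.
    print(f"{percent_gain(100.0, 80.0):.1f}% more throughput")      # prints 25.0% more throughput
    print(f"{percent_time_saved(100.0, 80.0):.1f}% less wall time")  # prints 20.0% less wall time
```

With these made-up timings, a 25% throughput gain corresponds to a 20% shorter runtime, so a shared benchmark is easier to compare when it states which definition it uses.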


Real Application Performance

Overall, with GPU Boost on, the Tesla K40 is 20 to 40% faster than the K20X on most applications, as shown below.


Try a Tesla K40 GPU Today in the Cloud

K40 is available now. You can buy one today from one of NVIDIA’s system partners, or you can try a K40 in the GPU Test Drive from one of our Tesla preferred partners.

Publish Your Benchmarks Below

Already have a Tesla K40 or have access to one? Please share your results.