GeForce GTX TITAN: Why the Best Gaming GPU Is Also the Ultimate CUDA Development GPU

by Roy Kim

Reviews are in for the newly launched GeForce GTX TITAN: “Awe-inspiring performance” (AnandTech), “Beautifully engineered product” (TechRadar), “Head, shoulders and knees above its compatriots” (HardwareCanucks).

So, gamers everywhere will shove and fight to be first in line to get a hold of one. That is, unless CUDA developers beat them to it.

As it happens, GTX TITAN – based on the same Kepler chip that powers the world’s fastest supercomputer, the Titan system at the Oak Ridge National Laboratory – is the ultimate CUDA development GPU.

Basically, we’ve unleashed the best of Kepler’s compute capabilities in GTX TITAN.


1.3 Teraflops for Under $1,000

For the first time, GTX TITAN gives developers access to more than a teraflop of double-precision performance in a commercially available GPU, transforming their PCs into personal supercomputers. That’s big news: for scientists, access to computing resources is one of the biggest hurdles in advancing research. Many have to wait weeks or even months for time on a supercomputer or a campus-wide cluster.


But no longer. Now GTX TITAN can simply be added to a PC for an 8x boost in computational capability. It also delivers 5x more double-precision performance than the next-best consumer GPU, GeForce GTX 680. And it’s widely available through e-tailers, retailers and resellers everywhere.

CUDA Made Easier with Dynamic Parallelism

GTX TITAN is an ideal GPU for developers who have yet to dive into CUDA. Features such as Dynamic Parallelism let the GPU operate more autonomously from the CPU by generating new work for itself at run time. By eliminating unnecessary interaction with the CPU, Dynamic Parallelism makes GPU programming easier, particularly for algorithms traditionally considered difficult for GPUs, such as divide-and-conquer problems.


A great example of the power of Dynamic Parallelism is the Quicksort example we wrote about in a blog post a few months ago. With Dynamic Parallelism, the popular Quicksort algorithm can be implemented in half as many lines of code as before, and the resulting code looks much like the CPU version of the algorithm.
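To make the idea concrete, here is a minimal sketch of a Quicksort kernel that launches its own child kernels from the GPU. It is not the code from the blog post mentioned above. The names `quicksort_kernel`, `selection_sort`, `gpu_quicksort`, `MAX_DEPTH` and `SMALL_SEGMENT` are hypothetical, chosen for illustration. The sketch uses one thread per kernel to keep the recursion readable, so it shows the programming model rather than a tuned sort.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Illustrative limits (assumptions, not tuned values).
constexpr int MAX_DEPTH = 16;      // stop launching child kernels past this depth
constexpr int SMALL_SEGMENT = 32;  // segments this small are sorted in place

// Simple in-place sort used for small segments and at the depth limit.
__device__ void selection_sort(int *data, int left, int right) {
    for (int i = left; i <= right; ++i) {
        int min_idx = i;
        for (int j = i + 1; j <= right; ++j) {
            if (data[j] < data[min_idx]) min_idx = j;
        }
        int tmp = data[i];
        data[i] = data[min_idx];
        data[min_idx] = tmp;
    }
}

// Sorts data[left..right]. Launched with a single thread; each level of
// recursion is a new kernel launched from the GPU, not from the CPU.
__global__ void quicksort_kernel(int *data, int left, int right, int depth) {
    if (depth >= MAX_DEPTH || right - left <= SMALL_SEGMENT) {
        selection_sort(data, left, right);
        return;
    }

    // Hoare-style partition around the middle element.
    int i = left;
    int j = right;
    int pivot = data[left + (right - left) / 2];
    while (i <= j) {
        while (data[i] < pivot) ++i;
        while (data[j] > pivot) --j;
        if (i <= j) {
            int tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
            ++i;
            --j;
        }
    }

    // Each half goes to its own child kernel in its own stream, so the two
    // halves are free to run concurrently.
    if (left < j) {
        cudaStream_t s;
        cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking);
        quicksort_kernel<<<1, 1, 0, s>>>(data, left, j, depth + 1);
        cudaStreamDestroy(s);
    }
    if (i < right) {
        cudaStream_t s;
        cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking);
        quicksort_kernel<<<1, 1, 0, s>>>(data, i, right, depth + 1);
        cudaStreamDestroy(s);
    }
}

// Host helper: copies the values to the GPU, sorts them there, copies back.
// Returns false if any CUDA call reports an error.
bool gpu_quicksort(std::vector<int> &values) {
    if (values.size() < 2) return true;
    const size_t bytes = values.size() * sizeof(int);
    int *d_data = nullptr;
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return false;

    cudaError_t err =
        cudaMemcpy(d_data, values.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        quicksort_kernel<<<1, 1>>>(d_data, 0,
                                   static_cast<int>(values.size()) - 1, 0);
        // The parent grid is not complete until all of its children are,
        // so one host-side synchronize waits for the whole sort.
        err = cudaDeviceSynchronize();
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(values.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_data);
    return err == cudaSuccess;
}
```

Device-side kernel launches need a GPU of compute capability 3.5 or higher, relocatable device code, and the device runtime library, for example `nvcc -arch=sm_35 -rdc=true quicksort.cu -lcudadevrt` (the file name is a placeholder). Note that the host launches only one kernel; every later launch originates on the GPU.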

Develop with GeForce, Deploy with Tesla

While GTX TITAN is designed to be installed in PCs, NVIDIA’s Tesla K20 GPU accelerators are purpose-built for workstations, servers and large supercomputers like Oak Ridge’s Titan system. Tesla accelerators deliver the best cluster performance, with the reliability and manageability that production jobs require. Some Tesla-exclusive features include:

  • NVIDIA GPUDirect RDMA for InfiniBand performance
  • Hyper-Q for MPI (Hyper-Q for CUDA Streams is supported on GeForce GTX TITAN)
  • ECC protection for all internal and external registers and memories
  • Supported tools for GPU and cluster management, such as Bright Computing and Ganglia
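The list above notes that Hyper-Q for CUDA Streams is supported on GTX TITAN. As a rough illustration of the kind of code that can benefit, the sketch below submits independent work on several CUDA streams; on Hyper-Q hardware, kernels from different streams can be scheduled concurrently instead of queuing behind one another. The names `scale_kernel` and `scale_in_streams` are hypothetical, and the sketch uses ordinary pageable host memory for simplicity, which limits how much the copies themselves can overlap.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Multiplies every element of data[0..n) by factor.
__global__ void scale_kernel(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: scales each buffer on its own CUDA stream.
// Returns false if any CUDA call reports an error.
bool scale_in_streams(std::vector<std::vector<float>> &buffers, float factor) {
    const size_t count = buffers.size();
    std::vector<cudaStream_t> streams(count);
    std::vector<float *> d_ptrs(count, nullptr);

    // One stream per buffer; remember how many were actually created.
    size_t created = 0;
    while (created < count &&
           cudaStreamCreate(&streams[created]) == cudaSuccess) {
        ++created;
    }
    bool ok = (created == count);

    for (size_t k = 0; ok && k < count; ++k) {
        const int n = static_cast<int>(buffers[k].size());
        if (n == 0) continue;  // nothing to do for an empty buffer
        const size_t bytes = n * sizeof(float);

        ok = cudaMalloc(&d_ptrs[k], bytes) == cudaSuccess &&
             cudaMemcpyAsync(d_ptrs[k], buffers[k].data(), bytes,
                             cudaMemcpyHostToDevice,
                             streams[k]) == cudaSuccess;
        if (!ok) break;

        // Work in different streams has no ordering dependency, so the
        // hardware is free to run these kernels side by side.
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        scale_kernel<<<blocks, threads, 0, streams[k]>>>(d_ptrs[k], n, factor);

        ok = cudaMemcpyAsync(buffers[k].data(), d_ptrs[k], bytes,
                             cudaMemcpyDeviceToHost,
                             streams[k]) == cudaSuccess;
    }

    // Wait for all streams, then release resources.
    ok = (cudaDeviceSynchronize() == cudaSuccess) && ok;
    for (size_t k = 0; k < count; ++k) cudaFree(d_ptrs[k]);
    for (size_t k = 0; k < created; ++k) cudaStreamDestroy(streams[k]);
    return ok;
}
```

The same stream-based structure carries over unchanged when an application moves from a desktop GTX TITAN to a Tesla-based system.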

The great thing is that developers now have the best of both worlds. They can design and optimize their applications in an environment closely resembling future deployments, but on their desktop PCs with GTX TITAN. And later, they can deploy and scale their applications on Tesla-based systems.

Try GeForce GTX TITAN Today

GeForce GTX TITAN is a game-changer for developers. Whether you’re a CUDA novice or ninja, try GTX TITAN today. Its Kepler features will blow you away. Everything you need to develop with CUDA is freely available. For beginners, Udacity’s “Intro to Parallel Programming” open online course is a great starting point.

If you are a CUDA developer, tell us below about the potential impact of GTX TITAN for your research.

And if you want to learn more about how GPUs are revolutionizing innovation and discovery in science, engineering and industry, be sure to attend our GPU Technology Conference next week in San Jose, Calif.