NVIDIA’s Next-Gen Pascal GPU Architecture to Provide 10x Speedup for Deep Learning Apps

by Ian Buck

NVIDIA’s Pascal GPU architecture, set to debut next year, will accelerate deep learning applications 10X beyond the speed of its current-generation Maxwell processors.

NVIDIA CEO and co-founder Jen-Hsun Huang revealed details of Pascal and the company’s updated processor roadmap in front of a crowd of 4,000 during his keynote address at the GPU Technology Conference, in Silicon Valley.

“It will benefit from a billion dollars worth of refinement because of R&D done over the last three years,” he told the audience.

The rise of deep learning – the process by which computers use neural networks to teach themselves — led NVIDIA to evolve the design of Pascal, which was originally announced at last year’s GTC.

Read NVIDIA CEO Jen-Hsun Huang’s latest blog post to learn more about how deep learning, and GPUs, are changing the world

Pascal GPUs will have three key design features that will result in dramatically faster, more accurate training of richer deep neural networks – the human cortex-like data structures that serve as the foundation of deep learning research.

Along with up to 32GB of memory — 2.7X more than the newly launched NVIDIA flagship, the GeForce GTX TITAN X — Pascal will feature mixed-precision computing. It will have 3D memory, resulting in up to 5X improvement in deep learning applications.  And it will feature NVLink — NVIDIA’s high-speed interconnect, which links together two or more GPUs — that will lead to a total 10X improvement in deep learning.

Pascal will offer better performance than Maxwell on key deep-learning tasks.

Mixed-Precision Computing for Greater Accuracy

Mixed-precision computing enables Pascal architecture-based GPUs to compute at 16-bit floating point accuracy at twice the rate of 32-bit floating point accuracy.

Increased floating point performance particularly benefits classification and convolution — two key activities in deep learning — while achieving needed accuracy.

3D Memory for Faster Communication Speed and Power Efficiency

Memory bandwidth constraints limit the speed at which data can be delivered to the GPU. The introduction of 3D memory will provide 3X the bandwidth performance of Maxwell. This will let developers build even larger neural networks and accelerate the bandwidth-intensive portions of deep learning training.

Pascal will have its memory chips stacked on top of each other, and placed adjacent to the GPU, rather than further down the processor boards. This reduces from inches to millimeters the distance that bits need to travel as they traverse from memory to GPU and back. The result is dramatically accelerated communication and improved power efficiency.

NVLink – for Faster Data Movement

The addition of NVLink to Pascal will let data move between GPUs and CPUs five to 12 times faster than they can with today’s current standard, PCI-Express. This is greatly benefits applications, such as deep learning, that have high inter-GPU communication needs.

NVLink allows for double the number of GPUs in a system to work together in deep learning computations. In addition, CPUs and GPUs can connect in new ways to enable more flexibility and energy efficiency in server design compared to PCI-E.

Broadcast live streaming video on Ustream