To help developers meet the growing complexity of deep learning, NVIDIA today announced better and faster tools for our software development community. This includes a significant update to the NVIDIA SDK, which includes software libraries and tools for developers building AI-powered applications.
With each new generation of GPU architecture, we’ve continually improved the NVIDIA SDK. Keeping with that heritage, our software is Volta-ready.
Aided by developers’ requests, we’ve built tools, libraries and enhancements to the CUDA programming model to help developers accelerate and build the next generation of AI and HPC applications.
The latest SDK updates introduce new capabilities and performance optimizations for GPU-accelerated applications:
- New CUDA 9 speeds up HPC and deep learning applications with support for Volta GPUs, up to 5x faster performance for libraries, a new programming model for thread management, and updates to debugging and profiling tools.
- Developers of end-user applications such as AI-powered web services and embedded edge devices benefit from 3.5x faster deep learning inference with the new TensorRT 3. With built-in support for optimizing both Caffe and TensorFlow models, developers can take trained neural networks to production faster than ever.
- Engineers and data scientists can benefit from 2.5x faster deep learning training using Volta optimizations for frameworks such as Caffe2, Microsoft Cognitive Toolkit, MXNet, PyTorch and TensorFlow.
Here’s a detailed look at each of the software updates and the benefits they bring to developers and end users:
CUDA is the fastest software development platform for creating GPU-accelerated applications. Every new generation of GPU is accompanied by a major update of CUDA, and version 9 includes support for Volta GPUs, major updates to libraries, a new programming model, and updates to debugging and profiling tools.
NVIDIA Deep Learning SDK
With the updated Deep Learning SDK optimized for Volta, developers have access to the libraries and tools that ensure seamless development and deployment of deep neural networks on all NVIDIA platforms, from the cloud or data center to the desktop to embedded edge devices. Deep learning frameworks using the latest updates deliver up to 2.5x faster training of CNNs, 3x faster training of RNNs and 3.5x faster inference on Volta GPUs compared to Pascal GPUs.
We’ve also worked with our partners and the communities so the Caffe2, Microsoft Cognitive Toolkit, MXNet, PyTorch and TensorFlow deep learning frameworks will be updated take advantage of the latest Deep Learning SDK and Volta.
This update brings performance improvements and new features to:
NVIDIA cuDNN provides high-performance building blocks for deep learning and is used by all the leading deep learning frameworks.
cuDNN 7 delivers 2.5x faster training of Microsoft’s ResNet50 neural network on the Volta-optimized Caffe2 deep learning framework. Apache MXNet delivers 3x faster training of OpenNMT language translation LSTM RNNs.
Deep learning frameworks rely on NCCL to deliver multi-GPU scaling of deep learning workloads. NCCL 2 introduces multi-node scaling of deep learning training on up to eight GPU-accelerated servers. With the time required to train a neural network reduced from days to hours, developers can iterate and develop their products faster.
Developers of HPC applications and deep learning frameworks will have access to NCCL 2 in July. It will be available as a free download for members of the NVIDIA Developer Program. Learn more at the NCCL website.
Delivering AI services in real time poses stringent latency requirements for deep learning inference. With NVIDIA TensorRT 3, developers can now deliver 3.5x faster inference performance — under 7 ms real-time latency.
Developers can optimize models trained in TensorFlow or Caffe deep learning frameworks and deploy fast AI services to platforms running Linux,
Microsoft Windows, BlackBerry QNX or Android operating systems. [Editor’s note: Windows not currently supported.]
TensorRT 3 will be available in July as a free download for members of the NVIDIA Developer Program. Learn more at the TensorRT website.
DIGITS introduces support for the TensorFlow deep learning framework. Engineers and data scientists can improve productivity by designing TensorFlow models within DIGITS and using its interactive workflow to manage datasets, training and monitor model accuracy in real time. To decrease training time and improve accuracy, the update also provides three new pre-trained models in the DIGITS Model Store: Oxford’s VGG-16 and Microsoft’s ResNet50, for image classification tasks, and NVIDIA DetectNet for object detection tasks.
DIGITS update with TensorFlow and new models will be available for the desktop and the cloud in July as a free download for members of the NVIDIA Developer Program. Learn more at the DIGITS website.
Deep Learning Frameworks
The NVIDIA Deep Learning SDK accelerates widely used deep learning frameworks such as Caffe, Microsoft Cognitive Toolkit, TensorFlow, Theano and Torch as well as many other deep learning applications. NVIDIA is working closely with leading deep learning frameworks maintainers at Amazon, Facebook, Google, Microsoft, University of Oxford and others to integrate the latest NVIDIA Deep Learning SDK libraries and immediately take advantage of the power of Volta.
Caffe2 announced on their blog an update to the framework that brings 16-bit floating point (FP16) training to Volta, developed in collaboration with NVIDIA:
“We are working closely with NVIDIA on Caffe2 to utilize the features in NVIDIA’s upcoming Tesla V100, based on the next-generation Volta architecture. Caffe2 is excited to be one of the first frameworks that is designed from the ground up to take full advantage of Volta by integrating the latest NVIDIA Deep Learning SDK libraries — NCCL and cuDNN.”
Amazon announced how they are working together with NVIDIA to bring high-performance deep learning to AWS. As part of the announcement, they spoke about the work we’ve done together on bringing Volta support to MXNet.
“In collaboration with NVIDIA, AWS engineers and researchers have pre-optimized neural machine translation (NMT) algorithms on Apache MXNet allowing developers to train the fastest on Volta-based platforms,” wrote Joseph Spisak, manager of Product Management at Amazon AI.
Google shared the latest TensorFlow benchmarks on DGX-1 on their developers blog:
“We’d like to thank NVIDIA for sharing a DGX-1 for benchmark testing and for their technical assistance. We’re looking forward to NVIDIA’s upcoming Volta architecture, and to working closely with them to optimize TensorFlow’s performance there, and to expand support for FP16.”
Microsoft Cognitive Toolkit
At NVIDIA, we’re also working closely with Microsoft to optimize Microsoft Cognitive Toolkit on Volta.
We’re also working with Facebook AI Research (FAIR) to optimize PyTorch on Volta.
NVIDIA GPU Cloud Deep Learning Stack
We also announced today NVIDIA GPU Cloud (NGC), a GPU-accelerated cloud platform optimized for deep learning.
NGC is designed for developers of deep learning-powered applications who don’t want to assemble and maintain the latest deep learning software and GPUs. This comes with NGC Deep Learning Stack, a complete development environment that will run on PC, DGX and the cloud, and is powered by the latest deep learning frameworks, NVIDIA Deep Learning SDK and CUDA. The stack is fully managed by NVIDIA so developers and data scientists can start with a single GPU on a PC and scale up to additional compute resource on the cloud.
Updates to NVIDIA VRWorks and DesignWorks
Learn more information about the latest updates to some of our other SDKs: