Microsoft Azure Announces General Availability of NVIDIA A100 GPU VMs

by Ian Buck

Microsoft Azure has announced the general availability of the ND A100 v4 VM series, its most powerful virtual machines for supercomputer-class AI and HPC workloads, powered by NVIDIA A100 Tensor Core GPUs and NVIDIA HDR InfiniBand.

NVIDIA collaborated with Azure to architect this new scale-up and scale-out AI platform, which brings together groundbreaking NVIDIA Ampere architecture GPUs, NVIDIA networking technology and the power of Azure’s high-performance interconnect and virtual machine fabric to make AI supercomputing accessible to everyone.

When solving grand challenges in AI and HPC, scale is everything. Natural language processing, recommendation systems, healthcare research, drug discovery and energy, among other areas, have all seen tremendous progress enabled by accelerated computing.

Much of that progress has come from applications operating at massive scale. To accelerate this trend, applications need to run on an architecture that is flexible and accessible, and that can both scale up and scale out.

The ND A100 v4 VM brings together eight NVIDIA A100 GPUs in a single VM with NVIDIA HDR InfiniBand, which provides 200 Gb/s of data bandwidth per GPU. That's a massive 1.6 Tb/s of interconnect bandwidth per VM.
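The per-VM figure follows directly from the two numbers above: eight GPUs, each with its own 200 Gb/s HDR InfiniBand link. The short sketch below just does that arithmetic and also converts the result to gigabytes per second (dividing bits by 8); the helper name `vm_interconnect_gbps` is hypothetical and used only for illustration.

```python
# Aggregate InfiniBand bandwidth of one ND A100 v4 VM, using the figures
# quoted in the text (8 GPUs, 200 Gb/s HDR InfiniBand per GPU).
GPUS_PER_VM = 8
HDR_GBPS_PER_GPU = 200  # gigabits per second per GPU


def vm_interconnect_gbps(gpus: int = GPUS_PER_VM,
                         per_gpu_gbps: int = HDR_GBPS_PER_GPU) -> int:
    """Hypothetical helper: total per-VM interconnect bandwidth in Gb/s."""
    return gpus * per_gpu_gbps


total_gbps = vm_interconnect_gbps()
# 1600 Gb/s -> 1.6 Tb/s (divide by 1000) -> 200 GB/s (divide bits by 8)
print(f"{total_gbps} Gb/s = {total_gbps / 1000} Tb/s = {total_gbps / 8} GB/s")
```

Running it prints `1600 Gb/s = 1.6 Tb/s = 200.0 GB/s`, matching the 1.6 Tb/s quoted per VM.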

And for the most demanding AI and HPC workloads, these VMs can be scaled out further to thousands of NVIDIA A100 GPUs on the same low-latency InfiniBand fabric, delivering both the compute and the networking capabilities needed for multi-node distributed computing.
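When a job spans many VMs, each GPU process typically needs a unique global rank. A common layout, assumed here, is to number GPUs VM by VM, so that a GPU's global rank is its VM index times the GPUs per VM plus its local index; the exact assignment depends on the launcher you use. The helpers `global_rank` and `world_size` below are hypothetical, and the 128-VM job size is only an illustrative example.

```python
# Hypothetical rank layout for a multi-VM job on ND A100 v4 VMs,
# assuming 8 GPUs per VM and VM-by-VM (node-major) numbering.
GPUS_PER_VM = 8


def global_rank(node_rank: int, local_rank: int,
                gpus_per_node: int = GPUS_PER_VM) -> int:
    """Map (VM index, GPU index within the VM) to a global rank."""
    if not 0 <= local_rank < gpus_per_node:
        raise ValueError("local_rank must be in [0, gpus_per_node)")
    return node_rank * gpus_per_node + local_rank


def world_size(num_nodes: int, gpus_per_node: int = GPUS_PER_VM) -> int:
    """Total number of GPU processes across all VMs."""
    return num_nodes * gpus_per_node


# Illustrative job: 128 VMs gives 1024 A100 GPUs; the last GPU on the
# last VM gets global rank 1023.
print(world_size(128), global_rank(127, 7))
```

Node-major numbering keeps the eight ranks that share a VM contiguous, which is convenient when a collective library treats intra-VM and inter-VM links differently.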

Ready for Developers

Developers have multiple options for getting the most performance out of the NVIDIA A100 GPUs in the ND A100 v4 VM, both when developing their applications and when managing infrastructure after those applications are deployed.

To simplify and speed up development, the NVIDIA NGC catalog offers ready-to-use GPU-optimized application frameworks, containers, pre-trained models, libraries, SDKs and Helm charts. With the prebuilt NVIDIA GPU-optimized Image for AI and HPC on the Azure Marketplace, developers can get started with GPU-accelerated software from the NGC catalog with just a few clicks.

The ND A100 v4 VMs are also supported in the Azure Machine Learning service for interactive AI development, distributed training, batch inferencing and automation with MLOps.

Deploying machine learning pipelines in production with ND A100 v4 VMs is further simplified by the NVIDIA Triton Inference Server, open-source inference serving software that's integrated with Azure ML. It maximizes both GPU and CPU performance and utilization, helping to minimize the operational costs of deployment.

Developers and infrastructure managers will soon also be able to use Azure Kubernetes Service, a fully managed Kubernetes service, to deploy and manage containerized applications on the NVIDIA A100 GPU-powered ND A100 v4 VMs.

Learn more about the ND A100 v4 VMs on Microsoft Azure and get started with building innovative solutions on the cloud.

For more, watch the GTC21 talk "Azure: Empowering the World with High-Ambition AI and HPC," which I co-presented with Girish Bablani, corporate vice president at Microsoft.