As NVIDIA continues to collaborate with Microsoft to build state-of-the-art AI infrastructure, Microsoft is introducing additional H100-based virtual machines to Microsoft Azure to accelerate demanding AI workloads.
At its Ignite conference in Seattle today, Microsoft announced its new NC H100 v5 VM series for Azure, the industry’s first cloud instances featuring NVIDIA H100 NVL GPUs.
This offering brings together a pair of PCIe-based H100 GPUs connected via NVIDIA NVLink, with nearly 4 petaflops of AI compute and 188GB of faster HBM3 memory. The NVIDIA H100 NVL GPU can deliver up to 12x higher performance on GPT-3 175B over the previous generation and is ideal for inference and mainstream training workloads.
Additionally, Microsoft announced plans to add the NVIDIA H200 Tensor Core GPU to its Azure fleet next year to support inference on larger models with no increase in latency. This new offering is purpose-built to accelerate the largest AI workloads, including LLMs and generative AI models.
The H200 GPU brings dramatic increases in both memory capacity and bandwidth using the latest-generation HBM3e memory. Compared to the H100, this new GPU will offer 141GB of HBM3e memory (1.8x the capacity) and 4.8 TB/s of peak memory bandwidth (1.4x the bandwidth).
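The multipliers quoted for the H200 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the H100 baseline of 80GB and 3.35 TB/s is an assumption taken from the published H100 SXM specifications, not from this announcement, and the variable names are made up for the example.

```python
# H200 figures as quoted in the announcement.
H200 = {"memory_gb": 141, "bandwidth_tbs": 4.8}

# ASSUMPTION: H100 SXM baseline (80 GB HBM3, 3.35 TB/s peak bandwidth).
# These numbers do not appear in the announcement itself.
H100 = {"memory_gb": 80, "bandwidth_tbs": 3.35}

# Ratio of H200 to H100 for each metric.
memory_ratio = H200["memory_gb"] / H100["memory_gb"]
bandwidth_ratio = H200["bandwidth_tbs"] / H100["bandwidth_tbs"]

print(f"memory: {memory_ratio:.2f}x, bandwidth: {bandwidth_ratio:.2f}x")
# prints "memory: 1.76x, bandwidth: 1.43x"
```

Under that assumed baseline, the results round to the 1.8x and 1.4x figures given above.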
Cloud Computing Gets Confidential
Further expanding availability of NVIDIA-accelerated generative AI computing for Azure customers, Microsoft announced another NVIDIA-powered instance: the NCC H100 v5.
These Azure confidential VMs with NVIDIA H100 Tensor Core GPUs allow customers to protect the confidentiality and integrity of their data and applications while in use, in memory, while accessing the unsurpassed acceleration of H100 GPUs. These GPU-enhanced confidential VMs will soon be available in private preview.
To learn more about the new confidential VMs with NVIDIA H100 Tensor Core GPUs and to sign up for the preview, read the blog.
Learn more about NVIDIA-powered Azure instances on the GPU VM information page.
Explore generative AI sessions and experiences at NVIDIA GTC, the global conference on AI and accelerated computing, running March 18-21 in San Jose, Calif., and online.