How Suite It Is: NVIDIA and VMware Deliver AI-Ready Enterprise Platform

NVIDIA optimizes, certifies and supports the NVIDIA AI Enterprise suite for VMware vSphere, bringing scale-out performance and compatibility for AI and data science applications to hybrid clouds.
by Justin Boitano

As enterprises modernize their data centers to power AI-driven applications and data science, NVIDIA and VMware are making it easier to develop and deploy a wide range of AI workloads in the modern hybrid cloud.

The companies have teamed up to optimize the just-announced update to vSphere — VMware vSphere 7 Update 2 — for AI applications with the NVIDIA AI Enterprise software suite (see Figure 1 below). This combination enables scale-out, multi-node performance and compatibility for a vast set of accelerated CUDA applications, AI frameworks, models and SDKs for the hundreds of thousands of enterprises that use vSphere for server virtualization.

Through this first-of-its-kind industry collaboration, AI researchers, data scientists and developers gain the software they need to deliver successful AI projects, while IT professionals acquire the ability to support AI using the tools they’re most familiar with for managing large-scale data centers, without compromise.

Figure 1: NVIDIA AI Enterprise for VMware vSphere runs on NVIDIA-Certified Systems to make it easy for IT to deploy virtualized AI at scale.

One Suite Package for AI Enterprise

NVIDIA AI Enterprise is a comprehensive suite of enterprise-grade AI tools and frameworks that optimize business processes and boost efficiency for a broad range of key industries, including manufacturing, logistics, financial services, retail and healthcare. With NVIDIA AI Enterprise, scientists and AI researchers have easy access to NVIDIA’s leading AI tools to power AI development across projects such as advanced diagnostics, smart factories and fraud detection.

The suite reduces the complexity of deploying individual AI applications. It also reduces the failures that can result from manually provisioning and managing applications and infrastructure software that are often incompatible with one another.

With NVIDIA AI Enterprise running on vSphere, customers can avoid silos of AI-specific systems that are difficult to manage and secure. They can also mitigate the risks of shadow AI deployments, where data scientists and machine learning engineers procure resources outside of the IT ecosystem.

Licensed by NVIDIA, AI Enterprise for vSphere is supported on NVIDIA-Certified Systems, which include mainstream servers from Dell Technologies, HPE, Lenovo and Supermicro. This allows even the most modern, demanding AI applications to be supported just like traditional enterprise workloads, on a common infrastructure and with data center management tools like VMware vCenter.

IT can manage availability, optimize resource allocation and enable the security of its valuable IP and customer data for AI workloads running on premises and in the hybrid cloud.

Scalable, Multi-Node, Virtualized AI Performance

NVIDIA AI Enterprise enables virtual workloads to run at near bare-metal performance on vSphere with support for the record-breaking performance of NVIDIA A100 GPUs for AI and data science (see Chart 1 below). AI workloads can now scale across multiple nodes, allowing even the largest deep learning training models to run on VMware Cloud Foundation.

Chart 1: With NVIDIA AI Enterprise for vSphere, distributed deep learning training scales linearly, across multiple nodes, and delivers performance that is indistinguishable from bare metal.

AI workloads come in all sizes with a wide variety of data requirements. Some process images, like live traffic reporting systems or online shopping recommender systems. Others are text-based, like a customer service support system powered by conversational AI.

Training an AI model can be incredibly data intensive and requires scale-out performance across multiple GPUs in multiple nodes. Running inference on a model in deployment usually requires fewer computing resources and may not need the power of a whole GPU.

Through the collaboration between NVIDIA and VMware, vSphere is the only server virtualization software to provide hypervisor support for live migration with NVIDIA Multi-Instance GPU technology. With MIG, each A100 GPU can be partitioned into up to seven instances at the hardware level to maximize efficiency for workloads of all sizes.
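
To make the sizing idea concrete, here is a minimal Python sketch of how workloads of different sizes might be packed onto GPUs that each offer up to seven instances. It is an illustration only, not NVIDIA's or VMware's scheduler: the `Workload` type, the `pack_workloads` function and the workload names are hypothetical, and the sketch only counts slices. Real MIG configurations are restricted to specific instance profiles and placement rules, which are ignored here.

```python
from dataclasses import dataclass

# From the article: each A100 can be partitioned into up to seven instances.
MAX_SLICES_PER_GPU = 7


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload description used only for this sketch."""
    name: str
    slices: int  # number of GPU slices requested (1..7), an assumption


def pack_workloads(workloads, max_slices=MAX_SLICES_PER_GPU):
    """Assign workloads to GPUs with first-fit decreasing bin packing.

    Returns a list of GPUs, each a list of workload names. This counts
    slices only and ignores real MIG profile and placement constraints.
    """
    gpus = []  # workload names placed on each GPU
    free = []  # remaining slices on each GPU
    for w in sorted(workloads, key=lambda w: w.slices, reverse=True):
        if not 1 <= w.slices <= max_slices:
            raise ValueError(f"{w.name}: slices must be 1..{max_slices}")
        for i, room in enumerate(free):
            if w.slices <= room:
                gpus[i].append(w.name)
                free[i] -= w.slices
                break
        else:
            # No existing GPU has room, so start a new one.
            gpus.append([w.name])
            free.append(max_slices - w.slices)
    return gpus


if __name__ == "__main__":
    jobs = [
        Workload("training", 7),       # needs a whole GPU
        Workload("inference-a", 1),
        Workload("inference-b", 2),
        Workload("inference-c", 3),
        Workload("inference-d", 1),
    ]
    for index, names in enumerate(pack_workloads(jobs)):
        print(f"GPU {index}: {names}")
```

In this sketch the training job occupies one GPU on its own, while the four smaller inference jobs share a second GPU, which is the kind of consolidation the partitioning described above is meant to allow.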

Extensive Resources for AI Applications and Infrastructure

NVIDIA AI Enterprise is a certified, end-to-end suite of key NVIDIA AI technologies and applications, backed by enterprise support services, for the rapid deployment, management and scaling of AI workloads in virtualized data centers running on VMware Cloud Foundation.

Customers who would like to adopt NVIDIA AI Enterprise as they upgrade to vSphere 7 U2 can contact NVIDIA and VMware to discuss their needs.

For more information on bringing AI to VMware-based data centers, read the NVIDIA developer blog and the VMware vSphere 7 U2 blog.

To further develop AI expertise with NVIDIA and VMware, register for free for GTC 2021.