NVIDIA Zero-Touch RoCE Technology Enables Cloud Economics for Microsoft Azure Stack HCI

NVIDIA accelerates RoCE for the world’s most advanced cloud platforms; zero-touch RoCE technology supercharges new Azure Stack HCI performance.
by Kevin Deierling

Throughout the data center, RDMA over Converged Ethernet (RoCE) networking technology has emerged as an effective hyperscale cloud strategy to improve infrastructure efficiency and reduce costs.

With Microsoft announcing general availability of its Azure Stack HCI platform, which supports NVIDIA’s zero-touch RoCE (ZTR) technology, enterprises can now get the same levels of performance and efficiency as the full-scale Azure service in their own data centers.

The new platform is designed as an Azure service that has the price-performance of hyperconverged infrastructure with native Azure hybrid capabilities. Customers get the latest security, performance and hybrid cloud enhancements, with an integrated management and operations experience from the Azure portal, all while being able to take advantage of their existing skills.

The breakthrough ZTR technology embedded in NVIDIA Mellanox ConnectX SmartNICs and BlueField data processing units (DPUs) removes the barrier to deploying RoCE in on-premises enterprise data centers.

NVIDIA ZTR enables seamless, pervasive deployment and operation of RoCE network transport capabilities. It eliminates the need for special network configurations such as flow control or congestion notification, so RoCE can be deployed with zero network changes in new and existing environments. With ZTR, RoCE network transport services operate side by side with non-RoCE communications in ordinary TCP/IP environments.

Paired with NVIDIA EGX-certified servers and NVIDIA Mellanox end-to-end networking, Azure Stack HCI enables enterprises to benefit from hyperscale efficiency from the cloud to core data centers to the edge.

RoCE Accelerates Compute-Intensive Workloads

Initially, RoCE was deployed in small data center silos for accelerating data storage platforms. But the exponential growth in data analytics, machine learning and AI has fueled the adoption of RoCE networking everywhere for accelerating diverse compute-intensive workloads.

Now deployed in the world’s most advanced data centers, RoCE has broken out of its traditional silos and become ubiquitous in cloud and web-scale data centers to accelerate a broad range of compute and storage workloads.

Microsoft was in the vanguard of the cloud titans in demonstrating a RoCE-everywhere mindset, deploying RDMA across massive computing clusters to accelerate software-defined storage, AI and HPC customer workloads.

NVIDIA is one of the leading providers of RDMA/RoCE network transport technologies, which it first delivered for the high performance computing industry and then extended to storage systems, AI and data science.

Announced at GTC earlier this year, ConnectX-6 Lx SmartNICs belong to the 11th generation of NVIDIA’s RoCE-capable products, which include ConnectX SmartNICs and BlueField DPUs and deliver unmatched performance and usability. Adding NVIDIA Spectrum Ethernet switches and LinkX cables creates an end-to-end, scalable networking solution that provides high bandwidth, low latency and simplified management.

RoCE is integrated into the mainstream code of popular ML/AI and data analytics frameworks, including TensorFlow, Apache Spark and PyTorch. RoCE support in these open-source frameworks enables ML/AI-powered applications to benefit from the predictable and scalable performance that RDMA delivers.

GPUDirect RDMA technology is key to unlocking unparalleled performance of ML/AI workloads in data-center-scale computing clusters. The hardware acceleration engines in the networking ASIC enable GPUDirect RDMA to perform efficient zero-copy data transfer between nodes, keeping GPUs constantly fed with the data needed to perform AI computing.

The rise of Kubernetes for ML/AI multi-node training and edge inferencing has also spurred the growing adoption of RoCE for accelerating compute-intensive workloads.

NGC, NVIDIA’s hub for GPU-optimized containerized software, hosts a wide range of data science frameworks with native support for UCX, a production-grade communication framework powered by RDMA/RoCE. NGC helps accelerate productivity with easy-to-deploy frameworks and applications, so users can focus on building their solutions.

The pioneering ZTR technology makes RoCE the easy-to-use network transport technology of choice for accelerating any cloud or enterprise computing workload.

Azure Stack HCI Nabs NVIDIA ZTR to Extend Azure Cloud Economics

By facilitating high-throughput, low-latency, node-to-node connectivity, NVIDIA ZTR accelerates Azure Stack HCI performance in bringing Azure services to on-premises enterprise environments.

NVIDIA ZTR helps ensure consistent application performance within Azure Stack HCI at every scale and physical location — from small deployments at branch offices to entire data centers.

“Microsoft Azure Stack HCI builds on Azure with an enterprise-grade, cost-optimized design,” said Talal Alqinawi, senior director of Azure Marketing at Microsoft Corp. “NVIDIA Mellanox ConnectX SmartNICs complement our vision to bring full-scale Azure cloud economics to enterprise data centers for hybrid cloud architecture.”

Get Started with NVIDIA ZTR Technology and Microsoft Azure Stack HCI

NVIDIA ZTR is available now for ConnectX-4 Lx, ConnectX-5, ConnectX-6, ConnectX-6 Dx and ConnectX-6 Lx SmartNIC devices running the latest firmware and software. NVIDIA ZTR for the recently announced family of BlueField-2 DPUs and the EGX converged accelerator will be available next year.

NVIDIA and leading hardware providers offer a broad range of validated solutions featuring Azure Stack HCI together with NVIDIA Mellanox networking. These validated solutions are based on standardized reference architectures that are supported by Microsoft, NVIDIA and our hardware partners.

To learn more, check out these NVIDIA Mellanox Zero-Touch RoCE resources: