NVIDIA and Google share a long-standing relationship rooted in advancing AI innovation and empowering the global developer community. This partnership goes beyond infrastructure, encompassing deep engineering collaboration to optimize the computing stack.
The latest innovations stemming from this partnership include significant contributions to community software efforts like JAX, OpenXLA, MaxText and llm-d. These foundational optimizations directly support serving Google’s cutting-edge Gemini models and the Gemma family of open models.
Additionally, performance-optimized NVIDIA AI software — including NVIDIA NeMo, NVIDIA TensorRT-LLM, NVIDIA Dynamo and NVIDIA NIM microservices — is tightly integrated across Google Cloud, including Vertex AI, Google Kubernetes Engine (GKE) and Cloud Run, to accelerate performance and simplify AI deployments.
NVIDIA Blackwell in Production on Google Cloud
Google Cloud was the first cloud service provider to offer both NVIDIA HGX B200 and NVIDIA GB200 NVL72 with its A4 and A4X virtual machines (VMs).
These new VMs, built on Google Cloud’s AI Hypercomputer architecture, are accessible through managed services like Vertex AI and GKE, enabling organizations to choose the right path to develop and deploy agentic AI applications at scale. Google Cloud’s A4 VMs, accelerated by NVIDIA HGX B200, are now generally available.
Google Cloud’s A4X VMs deliver over one exaflop of compute per rack and support seamless scaling to tens of thousands of GPUs, enabled by Google’s Jupiter network fabric and advanced networking with NVIDIA ConnectX-7 NICs. Google’s third-generation liquid cooling infrastructure delivers sustained, efficient performance even for the largest AI workloads.
Google Gemini Can Now Be Deployed On-Premises With NVIDIA Blackwell on Google Distributed Cloud
Gemini’s advanced reasoning capabilities are already powering cloud-based agentic AI applications — however, some customers in public sector, healthcare and financial services with strict data residency, regulatory or security requirements have so far been unable to tap into the technology.
With NVIDIA Blackwell platforms coming to Google Distributed Cloud — Google Cloud’s fully managed solution for on-premises, air-gapped environments and edge — organizations will now be able to deploy Gemini models securely within their own data centers, unlocking agentic AI for these customers.
NVIDIA Blackwell’s unique combination of breakthrough performance and confidential computing capabilities makes this possible — ensuring that user prompts and fine-tuning data remain protected. This enables customers to innovate with Gemini while maintaining full control over their information, meeting the highest standards of privacy and compliance. Google Distributed Cloud expands the reach of Gemini, empowering more organizations than ever to tap into next-generation agentic AI.
Optimizing AI Inference Performance for Google Gemini and Gemma
Designed for the agentic era, the Gemini family of models represents Google’s most advanced and versatile AI models to date, excelling at complex reasoning, coding and multimodal understanding.
NVIDIA and Google have worked on performance optimizations to ensure that Gemini-based inference workloads run efficiently on NVIDIA GPUs, particularly within Google Cloud’s Vertex AI platform. This enables Google to serve a significant share of user queries for Gemini models on NVIDIA-accelerated infrastructure across Vertex AI and Google Distributed Cloud.
In addition, the Gemma family of lightweight, open models has been optimized for inference using the NVIDIA TensorRT-LLM library and is expected to be offered as easy-to-deploy NVIDIA NIM microservices. These optimizations maximize performance and make advanced AI more accessible to developers, letting them run their workloads across deployment architectures ranging from data centers to local NVIDIA RTX-powered PCs and workstations.
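NIM microservices expose an OpenAI-compatible HTTP API, so a deployed Gemma NIM can be queried like any chat-completions endpoint. The sketch below only constructs such a request payload; the endpoint URL and model identifier are illustrative assumptions, not confirmed product details.

```python
import json

# Hypothetical values for illustration — an actual NIM deployment defines
# its own base URL and advertises its model id via the /v1/models route.
BASE_URL = "http://localhost:8000/v1/chat/completions"  # assumed local NIM
MODEL_ID = "google/gemma-7b"  # example id, check your deployment's catalog

# OpenAI-style chat-completions payload accepted by NIM endpoints.
payload = {
    "model": MODEL_ID,
    "messages": [
        {"role": "user", "content": "Summarize the benefits of open models."}
    ],
    "max_tokens": 64,
    "temperature": 0.2,
}

# Serialize for an HTTP POST (sending is omitted; no server is assumed here).
body = json.dumps(payload)
print(body)
```

From here, any HTTP client can POST `body` to the endpoint with a `Content-Type: application/json` header; the response follows the familiar chat-completions schema.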
Building a Strong Developer Community and Ecosystem
NVIDIA and Google Cloud are also supporting the developer community by optimizing open-source frameworks like JAX for seamless scaling and breakthrough performance on Blackwell GPUs — enabling AI workloads to run efficiently across tens of thousands of nodes.
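Because JAX compiles Python functions through OpenXLA, the same code can target CPUs, TPUs or NVIDIA GPUs without changes. A minimal sketch of this portability — a generic jitted function, not any Gemini or Gemma kernel:

```python
import jax
import jax.numpy as jnp

@jax.jit  # traced once, then compiled by XLA for whatever backend is present
def attention_scores(q, k):
    # Scaled dot-product scores, the core primitive of transformer attention.
    return jnp.dot(q, k.T) / jnp.sqrt(q.shape[-1])

q = jnp.ones((2, 4))
k = jnp.ones((2, 4))
scores = attention_scores(q, k)  # 2x2 matrix, every entry 4 / sqrt(4) = 2.0
print(scores)
```

On a Blackwell-equipped VM the identical call dispatches to the GPU backend; scaling further is a matter of applying JAX's sharding APIs rather than rewriting the function.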
The collaboration extends beyond technology, with the launch of a new joint Google Cloud and NVIDIA developer community that brings experts and peers together to accelerate cross-skilling and innovation.
By combining engineering excellence, open-source leadership and a vibrant developer ecosystem, the companies are making it easier than ever for developers to build, scale and deploy the next generation of AI applications.