Start Up Your Engines: NVIDIA and Google Cloud Collaborate to Accelerate AI Development

Thousands of startups to get help to rapidly build generative AI applications and services.
by Greg Estes

NVIDIA and Google Cloud have announced a new collaboration to help startups around the world accelerate the creation of generative AI applications and services.

The announcement, made today at Google Cloud Next ‘24 in Las Vegas, brings together the NVIDIA Inception program for startups and the Google for Startups Cloud Program to widen access to cloud credits, go-to-market support and technical expertise to help startups deliver value to customers faster.

Qualified members of NVIDIA Inception, a global program supporting more than 18,000 startups, will have an accelerated path to using Google Cloud infrastructure with access to Google Cloud credits — up to $350,000 for those focused on AI.

Google for Startups Cloud Program members can join NVIDIA Inception and gain access to technical expertise, NVIDIA Deep Learning Institute course credits, NVIDIA hardware and software, and more. Eligible members of the Google for Startups Cloud Program also can participate in NVIDIA Inception Capital Connect, a platform that gives startups exposure to venture capital firms interested in the space.

High-growth software makers in both programs can also gain fast-tracked onboarding to Google Cloud Marketplace, along with co-marketing and product acceleration support.

This collaboration is the latest in a series of announcements the two companies have made to help ease the costs and barriers associated with developing generative AI applications for enterprises of all sizes. Startups in particular are constrained by the high costs associated with AI investments.

It Takes a Full-Stack AI Platform

In February, Google DeepMind unveiled Gemma, a family of state-of-the-art open models. NVIDIA, in collaboration with Google, recently launched optimizations across all NVIDIA AI platforms for Gemma, helping to reduce customer costs and speed up innovative work for domain-specific use cases.

Gemma is built from the same research and technology used to create Gemini, Google DeepMind’s most capable model yet. Teams from the two companies worked closely to accelerate its performance with NVIDIA TensorRT-LLM, an open-source library for optimizing large language model inference on NVIDIA GPUs.

NVIDIA NIM microservices, part of the NVIDIA AI Enterprise software platform, together with Google Kubernetes Engine (GKE) provide a streamlined path for developing AI-powered apps and deploying optimized AI models into production. Built on inference engines including NVIDIA Triton Inference Server and TensorRT-LLM, NIM supports a wide range of leading AI models and delivers seamless, scalable AI inferencing to accelerate generative AI deployment in enterprises.
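As a rough illustration of that path, a NIM microservice runs on GKE as an ordinary GPU-backed container. The minimal manifest below is a hypothetical sketch, not NVIDIA's official chart: the image tag, port and resource request are illustrative assumptions, and real deployments typically use NVIDIA's published NIM images and Helm charts.

```python
import json

# Minimal, hypothetical Kubernetes Deployment for a NIM container on GKE.
# The image name, port and GPU count are illustrative assumptions.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "nim-llm"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "nim-llm"}},
        "template": {
            "metadata": {"labels": {"app": "nim-llm"}},
            "spec": {
                "containers": [{
                    "name": "nim",
                    "image": "nvcr.io/nim/your-model:latest",  # hypothetical tag
                    "ports": [{"containerPort": 8000}],
                    # Request one GPU from GKE's NVIDIA device plugin.
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }]
            },
        },
    },
}

# kubectl accepts JSON manifests, e.g.: kubectl apply -f nim-deployment.json
manifest = json.dumps(deployment, indent=2)
print(manifest)
```

The key detail is the `nvidia.com/gpu` resource limit, which tells GKE's scheduler to place the pod on a node with an attached NVIDIA GPU.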

The Gemma family of models, including Gemma 7B, RecurrentGemma and CodeGemma, is available from the NVIDIA API catalog, where users can try the models from a browser, prototype with the API endpoints and self-host them with NIM.
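Prototyping against those API endpoints looks roughly like the sketch below. The endpoint URL and model identifier are assumptions based on the API catalog's OpenAI-style conventions, not details stated in this announcement; an API key from the catalog is required to actually send the request.

```python
import json
import os
import urllib.request

# Assumed OpenAI-style chat-completions endpoint and model id from the
# NVIDIA API catalog; both are illustrative, not taken from this article.
URL = "https://integrate.api.nvidia.com/v1/chat/completions"

payload = {
    "model": "google/gemma-7b",
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    "max_tokens": 128,
    "temperature": 0.5,
}

api_key = os.environ.get("NVIDIA_API_KEY")  # obtained from the API catalog
if api_key:
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
        print(reply["choices"][0]["message"]["content"])
else:
    # Without a key, just show the request body that would be sent.
    print(json.dumps(payload, indent=2))
```

Because the endpoints follow the familiar chat-completions shape, the same request body works whether the model is hosted in the catalog or self-hosted behind a NIM container.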

Google Cloud has made it easier to deploy the NVIDIA NeMo framework across its platform via GKE and Google Cloud HPC Toolkit. This enables developers to automate and scale the training and serving of generative AI models, allowing them to rapidly deploy turnkey environments through customizable blueprints that jump-start the development process.

NVIDIA NeMo, part of NVIDIA AI Enterprise, is also available in Google Cloud Marketplace, providing customers another way to easily access NeMo and other frameworks to accelerate AI development.

Further widening the availability of NVIDIA-accelerated generative AI computing, Google Cloud also announced that A3 Mega instances will be generally available next month. The instances expand its A3 virtual machine family, powered by NVIDIA H100 Tensor Core GPUs, and double the GPU-to-GPU network bandwidth of A3 VMs.

Google Cloud’s new Confidential VMs on A3 will also support confidential computing, helping customers protect the confidentiality and integrity of sensitive data and secure applications and AI workloads during training and inference, with no code changes required to access H100 GPU acceleration. These GPU-powered Confidential VMs will be available in Preview this year.

Next Up: NVIDIA Blackwell-Based GPUs

NVIDIA’s newest GPUs based on the NVIDIA Blackwell platform will be coming to Google Cloud early next year in two variations: the NVIDIA HGX B200 and the NVIDIA GB200 NVL72.

The HGX B200 is designed for the most demanding AI, data analytics and high performance computing workloads, while the GB200 NVL72 is designed for next-frontier, massive-scale, trillion-parameter model training and real-time inferencing.

The NVIDIA GB200 NVL72 connects 36 Grace Blackwell Superchips, each with two NVIDIA Blackwell GPUs combined with an NVIDIA Grace CPU over a 900GB/s chip-to-chip interconnect, supporting up to 72 Blackwell GPUs in one NVIDIA NVLink domain and 130TB/s of bandwidth. It overcomes communication bottlenecks and acts as a single GPU, delivering 30x faster real-time LLM inference and 4x faster training compared to the prior generation.
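The quoted figures hang together as a back-of-envelope check. One assumption below is not stated in the article: that each Blackwell GPU contributes 1.8 TB/s of NVLink bandwidth to the domain.

```python
# Back-of-envelope check of the GB200 NVL72 figures quoted above.
superchips = 36
gpus_per_superchip = 2
gpu_count = superchips * gpus_per_superchip  # 72 Blackwell GPUs

# Assumption (not from the article): 1.8 TB/s of NVLink bandwidth per GPU.
nvlink_per_gpu_tbps = 1.8
total_tbps = gpu_count * nvlink_per_gpu_tbps

print(gpu_count)             # 72
print(round(total_tbps, 1))  # 129.6, i.e. roughly the 130 TB/s quoted
```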

NVIDIA GB200 NVL72 is a multi-node rack-scale system that will be combined with Google Cloud’s fourth generation of advanced liquid-cooling systems.

NVIDIA announced last month that NVIDIA DGX Cloud, an AI platform for enterprise developers that’s optimized for the demands of generative AI, is generally available on A3 VMs powered by H100 GPUs. DGX Cloud with GB200 NVL72 will also be available on Google Cloud in 2025.