AI Factories: The New Infrastructure of Intelligence

AI factories are a new class of infrastructure built to manufacture intelligence continuously and in real time. In the industrial age, power plants converted energy into electricity. In the AI age, AI factories convert energy into tokens, the unit of production for reasoning models, agents and intelligent systems.

Their economics are defined by what they produce: tokens per second, tokens per watt, cost per token, utilization and uptime. In this model, performance per watt translates directly into revenue. Cost per token impacts the economics of every AI factory.
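As a rough illustration of how these five measures relate, the sketch below computes them for one measurement window. The `FactoryWindow` type, its field names and every number are hypothetical and chosen only to keep the arithmetic easy to follow. They are not figures from any real deployment.

```python
from dataclasses import dataclass


@dataclass
class FactoryWindow:
    """One measurement window for a hypothetical AI factory (illustrative values only)."""
    tokens_produced: float   # tokens generated during the window
    seconds: float           # wall-clock length of the window
    avg_power_watts: float   # average power draw during the window
    total_cost_usd: float    # amortized capital plus operating cost for the window
    busy_seconds: float      # time spent doing useful work
    up_seconds: float        # time the system was available


def metrics(w: FactoryWindow) -> dict:
    tokens_per_second = w.tokens_produced / w.seconds
    return {
        "tokens_per_second": tokens_per_second,
        # tokens/s divided by watts (joules/s) is equivalent to tokens per joule
        "tokens_per_second_per_watt": tokens_per_second / w.avg_power_watts,
        "cost_per_million_tokens_usd": w.total_cost_usd / w.tokens_produced * 1e6,
        "utilization": w.busy_seconds / w.up_seconds,
        "uptime": w.up_seconds / w.seconds,
    }


# Assumed example: a 1 MW system producing 3.6 billion tokens in one hour.
window = FactoryWindow(
    tokens_produced=3.6e9, seconds=3600, avg_power_watts=1.0e6,
    total_cost_usd=1800.0, busy_seconds=2880, up_seconds=3600,
)
result = metrics(window)
for name, value in result.items():
    print(f"{name}: {value:g}")
```

In this toy window, raising utilization or tokens per watt at a fixed cost lowers cost per token, which is the link between efficiency and economics that the paragraph above describes.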

AI is no longer simply software. It’s essential infrastructure.

AI factories turn massive-scale infrastructure into continuous intelligence production.

AI factories synchronize massive compute resources while serving billions of requests. Software-orchestrated and composed of autonomous multi-agent systems that run continuously, they produce intelligence around the clock. Agentic systems reason and plan with the best-performing AI models, proprietary and open, including NVIDIA Nemotron. Open models can be customized for enterprises’ domain-specific needs, optimized and securely deployed, all on AI factories.

Operating in production today, AI factories are optimized across the entire stack (models, compute, networking, memory, software, storage, power and cooling) to keep intelligence production running continuously.

Agentic AI generates synthetic training data, creating scenarios that help autonomous systems learn from the next edge case.

Agentic AI Changes the Workload

AI factories are built for a new kind of workload: always-on inference that does more than answer a prompt. Autonomous agents reason, plan, search, use tools, retrieve data, write code and take action. They create their own sub-agents that learn how to use domain-specific tools and develop their own AI skills. These multi-agent systems make AI workloads longer, deeper and far more compute-intensive. This also changes what the infrastructure must do. Performance depends on keeping the entire workflow moving efficiently so intelligence stays in production for the next step, the next action and the next decision.
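One way to see why multi-step agent workloads are more compute-intensive than a single prompt is a toy model in which every step re-reads the context accumulated so far. The function below is a sketch under that assumption. It ignores cache reuse and other optimizations, and the step counts and token sizes are hypothetical.

```python
def tokens_processed(steps: int, prompt_tokens: int, tokens_added_per_step: int) -> int:
    """Total input tokens read across an agent run.

    Simplifying assumption: each step re-reads the full context so far,
    with no reuse of earlier computation.
    """
    total = 0
    context = prompt_tokens
    for _ in range(steps):
        total += context                  # the model reads the whole context at this step
        context += tokens_added_per_step  # tool output and generated text are appended
    return total


single_turn = tokens_processed(steps=1, prompt_tokens=1000, tokens_added_per_step=500)
agent_run = tokens_processed(steps=20, prompt_tokens=1000, tokens_added_per_step=500)
print(single_turn, agent_run, agent_run / single_turn)  # 1000 115000 115.0
```

Under these assumptions, a 20-step run reads 115 times as many tokens as the single-turn request, even though it takes only 20 times as many steps, because the context grows at every step.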

Autonomous Agents Reshape the Architecture

Autonomous agents depend on accelerated compute paired with fast memory, storage for context, networking for coordination, software for orchestration and CPUs for execution. The workload moves across the stack, often with tight latency requirements at every step. AI factories comprise full-stack systems designed to keep those workflows moving continuously with the throughput, responsiveness and utilization needed to produce tokens efficiently at scale.

AI Factories Rely on Extreme Codesign

Hardware, networking, memory, storage and software are architected together, with continuous optimization at every layer, to increase utilization, lower cost per token and raise output. The resulting systems balance responsiveness for always-on, interactive AI workloads with the throughput needed to maximize production.

Inference Is a Real-Time Orchestration Challenge

As AI workflows grow longer and more interactive, the factory has to run in real time. That means routing requests, managing memory, coordinating services, balancing latency and throughput, and keeping utilization high across the stack. In AI factories, the software layer is critical because the ability to run the factory efficiently determines how much intelligence it produces and how much value it creates. Inference has become a live orchestration challenge that spans the full machine.
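The tension between latency and throughput can be sketched with a deliberately simple batching model. Everything here is an assumption made for illustration: the linear latency model, the parameter values and the function names are hypothetical and do not describe any particular inference stack.

```python
def step_latency_ms(batch_size: int, base_ms: float = 20.0, per_request_ms: float = 0.5) -> float:
    """Assumed linear cost model: fixed overhead plus a per-request increment."""
    return base_ms + per_request_ms * batch_size


def throughput_tokens_per_s(batch_size: int) -> float:
    """Each decode step is assumed to emit one token per request in the batch."""
    return batch_size * 1000.0 / step_latency_ms(batch_size)


def best_batch_size(latency_budget_ms: float, max_batch: int = 256) -> int:
    """Highest-throughput batch size whose step latency stays within the budget."""
    feasible = [b for b in range(1, max_batch + 1) if step_latency_ms(b) <= latency_budget_ms]
    if not feasible:
        raise ValueError("no batch size meets the latency budget")
    return max(feasible, key=throughput_tokens_per_s)


for budget in (25.0, 50.0, 100.0):
    b = best_batch_size(budget)
    print(budget, b, round(throughput_tokens_per_s(b)))
# 25.0 10 400
# 50.0 60 1200
# 100.0 160 1600
```

In this model, loosening the per-token latency budget allows larger batches and higher total throughput, while each individual request waits longer. A real scheduler weighs many more factors, but the same trade-off sits underneath.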

But operating an AI factory efficiently starts long before the system goes live. The same full-stack codesign required for inference also changes how AI factories are planned, validated and brought online.

In AI compute, performance per watt has become the ultimate measure of competitiveness for AI factories. Data centers once stored files. Now, AI factories produce tokens. For producers of AI, that output directly affects revenue. For enterprises, cost per token determines whether they can profitably scale AI.

SemiAnalysis InferenceX benchmarks quantify this shift in real-world terms. The NVIDIA Blackwell Ultra GPU delivers the lowest cost per token, allowing AI factories to produce more intelligence from the same power envelope at a lower unit cost. More tokens per watt means greater throughput per unit of infrastructure cost, space or power. Lower cost per token improves the economics of inference at scale.

NVIDIA GB300 NVL72 systems generate up to 50x more tokens per megawatt, resulting in 35x lower cost per token, compared with the NVIDIA Hopper platform.

AI factories built with NVIDIA Blackwell Ultra deliver up to 50x higher throughput per megawatt, leading to 35x lower cost per token — balancing performance, responsiveness and energy efficiency at scale. The NVIDIA Dynamo framework helps orchestrate long-context reasoning and massive inference throughput, keeping utilization high as workloads become more interactive and complex. Together, they show how AI factory performance is now measured: by how efficiently a factory can produce intelligence in real time.
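The two headline figures can be related with simple arithmetic. The sketch below assumes both numbers apply to the same workload and the same baseline, and that cost per token equals cost per megawatt-hour divided by tokens per megawatt-hour. The text does not spell out either assumption, so the result is only a rough reading of the figures.

```python
def implied_cost_ratio(throughput_gain: float, cost_per_token_reduction: float) -> float:
    """Ratio of new to baseline cost per megawatt-hour implied by the two figures.

    cost_per_token = cost_per_megawatt_hour / tokens_per_megawatt_hour, so
    (new cost per MWh) / (old cost per MWh) = throughput_gain / cost_per_token_reduction.
    """
    return throughput_gain / cost_per_token_reduction


# 50x throughput per megawatt and 35x lower cost per token, as quoted in the text.
ratio = implied_cost_ratio(throughput_gain=50.0, cost_per_token_reduction=35.0)
print(f"{ratio:.2f}")  # 1.43
```

Read this way, each megawatt-hour of the newer system would cost roughly 1.4 times as much to deliver while producing 50 times the tokens, which is why the cost-per-token improvement is smaller than the throughput improvement.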

The NVIDIA Vera Rubin platform extends that curve again. As reasoning and agentic AI continue to scale, Vera Rubin-based systems are designed to push performance per watt up to 35x higher with LPX and drive token cost lower through deeper full-stack optimization. The result is more efficient intelligence production at the factory level.

The NVIDIA Vera Rubin platform.

From Chips to Full-Stack AI Factories

What began with GPUs has expanded into full-stack AI factories comprising accelerated compute, high-speed interconnects, liquid-cooled systems, inference software, autonomous agents, reference architectures and the ecosystem needed to build and operate them at scale.

Full-stack AI factories are part of the broader ecosystem that NVIDIA is helping define and build. NVIDIA collaborates closely with global system partners such as Cisco, Dell, HPE, Lenovo and Supermicro to bring AI infrastructure to enterprise data centers. NVIDIA also relies on a curated ecosystem of AI software partners to build AI solutions for each enterprise’s use cases. This ecosystem supports a choice of proprietary and open models.

These AI factories can be deployed for a wide range of use cases, from agentic AI workloads to physical AI and robotics. Every organization in every industry — from financial services and life sciences to manufacturing and the public sector — will need to build or rent an AI factory.

NVIDIA runs its own enterprise AI factory to accelerate development across the company, with hundreds of autonomous AI agents assisting engineering, software and operations teams. It’s a practical proof point: AI factories can transform how companies build, design and operate. They can increase productivity inside the enterprise, turning AI from an occasional tool into a capability woven directly into daily work.

AI factories can start small to support one business unit or workload, or they may be built from the ground up to support high-performance AI inference and training at massive scale. NVIDIA DSX reference designs unify design, simulation, operations and ecosystem technologies to build gigawatt-scale AI factories at the lowest cost per token.

Building these gigawatt-scale AI factories requires a lot more than optimized compute. It requires a shared digital environment where facility design, hardware systems, power, cooling and operations can be modeled together before build-out and continuously improved after deployment. The NVIDIA Omniverse DSX Blueprint supports this workflow with digital twins that connect facilities, hardware and software, using Omniverse, OpenUSD and SimReady assets to help partners validate designs and optimize operations across the AI factory lifecycle.

A full-stack approach helps organizations extract more intelligence from every system, turning AI infrastructure into an autonomous, always-on engine of reasoning, action and insight. The last industrial revolution converted energy into work. This one converts energy into intelligence. AI factories are the infrastructure of this new era, built to power the next wave of economic growth.

Learn more about how AI factories are the industrial infrastructure of the AI era. Watch NVIDIA founder and CEO Jensen Huang’s keynote at NVIDIA GTC Taipei at COMPUTEX — Monday, June 1, at 11 a.m. Taipei time.