At CVPR, NVIDIA is unveiling new physical AI agent skills that help researchers and developers speed the development of autonomous vehicles, robots and vision AI systems.
The core challenge in physical AI research isn’t simply developing stronger models. It’s building a full workflow around them — reconstructing real-world scenes, generating edge-case scenarios, training policies, evaluating behavior and rapidly iterating. Today, these steps are fragmented across separate tools, slowing the pace of experimentation as researchers struggle to piece them together.
Earlier this week, NVIDIA announced NVIDIA Cosmos 3, the open frontier model for physical AI and the world’s first full omnimodel unifying vision reasoning, world and action generation. Leading across the open model public leaderboards central to physical AI, the world foundation model provides core capabilities for physical AI development. NVIDIA physical AI skills pair with Cosmos, NVIDIA libraries and simulation frameworks to help researchers move from model capabilities to scalable end-to-end workflows faster than ever.
Advancing Autonomous Vehicle Research Beyond Recorded Miles
For AV researchers, the problem is the “long tail” of driving — rare interactions, unusual road geometry, lighting changes and edge-case behaviors that are difficult to repeatedly collect, but critical for training and validation.
Neural Reconstruction skill demo in OpenClaw, showing a video re-rendered from an elevated virtual sensor viewpoint.
With NVIDIA autonomous vehicle skills, researchers and developers can task AI agents to automate workflows for scene reconstruction from fleet data and generate synthetic scenarios. Neural Reconstruction skills help AI agents turn fleet-captured data into editable 3D scenes for simulation and synthetic data generation, while technologies including NVIDIA Omniverse NuRec, InstantNuRec, Harmonizer and HiGS accelerated renderer help accelerate reconstruction, improve scene realism and generate new views.
InstantNuRec enables fast 3D Gaussian road-scene reconstruction from images without per-scene optimization.
For AV researchers, repeatable simulation helps vary conditions, compare system responses and uncover failure modes across scenarios beyond what can be captured in real-world data.
NVIDIA AlpaGym, an open source closed-loop reinforcement learning framework, extends that approach by connecting policy rollouts and high-fidelity simulation with agent skills, scaling across thousands of GPUs, to help researchers move through setup, rollout and evaluation. NVIDIA OmniDreams, an action-conditioned generative world model, adds photorealistic rendering to the simulation loop, generating camera frames that respond directly to policy actions in real time.
NVIDIA is also advancing AV research with its most powerful open driving foundation model to date: NVIDIA Alpamayo 2 Super, an open 32-billion-parameter reasoning vision language action (VLA) model that reasons, plans and acts across the full driving stack for safer, scalable level 4 development and deployment.
Advancing Vision AI Systems for the Real World
For vision AI research, the bottleneck is creating enough controlled examples to study how models behave when visual conditions, object states or temporal events change. Work in zero-shot anomaly detection, synthetic anomaly generation and few-shot defect recognition all run into the same data wall.
New skills for visual inspection generates multiple rare defects on different surfaces.
New NVIDIA Metropolis skills are helping researchers and developers use AI agents to generate synthetic visual scenarios, including anomalies, augment data and support pseudo-labeling. These skills benefit from Cosmos 3’s mixture-of-transformers architecture, which uses a reasoning transformer to analyze observations and feed instructions to a generation tower, helping scale physically grounded virtual worlds.
Researchers building high-accuracy visual inspection models can use the Defect Image Generation skill to create examples of different defects across different surfaces using real images. The workflow combines NVIDIA Isaac Sim for simulation, Cosmos 3 and NVIDIA OSMO for orchestration and vision language reasoning — letting researchers create rare visual cases and assess whether models respond correctly.
New NVIDIA Metropolis VSS Blueprint skills extract insights from massive volumes of video data.
For video AI agents, the NVIDIA Metropolis Blueprint for video search and summarization (VSS), NVIDIA TAO and Video Augmentation skills help extract insights from massive volumes of video data, fine-tune models and automate the build-and-evaluate loop. This gives researchers a more repeatable way to develop reasoning vision AI agents that can detect events, reason over complex scenes, summarize activity and send alerts.
Scaling Robot Learning With Agent-Ready Simulation Workflows
Teaching robots skills like navigating or manipulating comes down to iteration. For researchers, the bottleneck is building enough controlled environments and policy rollouts to understand how robot behavior changes across tasks, settings and embodiments — work that typically means stitching together simulation environments, task variations, policy training and evaluation by hand.
NVIDIA Isaac Sim 6.0 includes agent-friendly skills and connectors to help automate workflows.
With NVIDIA robotics skills, researchers can task AI agents to automate most common development steps across scene preparation, simulation and robot learning with NVIDIA Omniverse libraries, Isaac Sim and Isaac Lab frameworks. Agents can help launch simulation sessions, author scenes, control simulation, capture data and validate environments in Isaac Sim, while Isaac Lab skills support reinforcement learning setup, training, evaluation and custom environment development.
New NVIDIA Isaac mobility skills automate navigation workflows.
Specialized skills extend that workflow to mobility and manipulation. Isaac mobility skills support navigation workflows spanning scene search, USD conversion, environment registration, residual reinforcement learning and policy evaluation, while specialized Isaac Lab agentic workflows help with sim-to-sim and sim-to-real tasks such as environment building, physics tuning, debugging and profiling.
For healthcare robotics, Cosmos-H-Surgical-Simulator advances research by generating realistic surgical robotics data for policy training and evaluation. By learning directly from real surgical data rather than hand-engineered physics models, it helps reduce the sim-to-real gap, supporting the development of autonomous surgical tasks.
Cosmos 3 can further help generate synthetic data and scene variations, then support post-training with embodiment-specific behavior and environment data for tasks ranging from pick-and-place to dexterous manipulation.
NVIDIA Research at CVPR
NVIDIA technologies — including GPUs, open models, simulation frameworks and CUDA-accelerated libraries — were referenced in the majority of accepted CVPR 2026 papers, with adoption across leading global research labs and institutions including Carnegie Mellon University, Stanford University, UC Berkeley, Tsinghua University and Peking University.
NVIDIA researchers are presenting work across computer vision, physical AI, autonomous systems, neural rendering, generative AI and robotics at CVPR, running June 3-7 in Denver.
NVIDIA’s CVPR presence also includes open research challenges that help benchmark progress in physical AI:
- The AI City Challenge, a premier computer vision competition for smart city applications now in its tenth year.
- The PAI-AV Reasoning Challenge, a new open benchmark evaluating how well VLA models explain driving decisions using chain-of-causation labels.
- The AlpaSim Closed-Loop End-to-End Driving Challenge, a new open benchmark testing autonomous driving policies in closed-loop simulation on real-world reconstructed scenarios.
Grid of samples videos from new Robot Sim Dataset as a part of Cosmos 3 dataset release.
NVIDIA is also expanding the research infrastructure behind physical AI with datasets for training, fine-tuning and evaluation. The NVIDIA Physical AI Dataset has surpassed 15 million+ downloads on Hugging Face, while NVIDIA Isaac GR00T X Embodiment Sim has become one of the most-downloaded robotics datasets. New dataset releases include GRAIL, including roughly 50 hours of humanoid-object interaction data, and six synthetic video datasets used to train Cosmos 3 across robotics, physics, digital humans, autonomous driving, warehouse safety and spatial reasoning.
Availability
NVIDIA physical AI agent tools and skills are now openly available through GitHub.
Agent skills and tools for synthetic data generation — Neural Reconstruction, Video Augmentation, Defect Image Generation — are also available to try instantly on NVIDIA Brev as Physical AI Launchables, preconfigured environments that bundle agent skills and tools for faster synthetic data generation and evaluation. Launchables run on hosted NVIDIA H100 Tensor Core GPUs and include free trial credits for researchers.
Learn more about NVIDIA at CVPR and explore NVIDIA Research’s work in physical AI, computer vision and autonomous systems. Get started with Isaac GR00T and NVIDIA robotics tools.



