ChatGPT marked the big bang moment of generative AI. Answers can be generated in response to nearly any query, helping transform digital work such as content creation, customer service, software development and business operations for knowledge workers.
Physical AI, the embodiment of artificial intelligence in humanoids, factories and other devices within industrial systems, has yet to experience its breakthrough moment.
This has held back industries such as transportation and mobility, manufacturing, logistics and robotics. But that’s about to change thanks to three computers bringing together advanced training, simulation and inference.
The Rise of Multimodal, Physical AI
For 60 years, “Software 1.0” — serial code written by human programmers — ran on general-purpose computers powered by CPUs.
Then, in 2012, Alex Krizhevsky, mentored by Ilya Sutskever and Geoffrey Hinton, won the ImageNet computer image recognition competition with AlexNet, a revolutionary deep learning model for image classification.
This marked the industry’s first contact with AI. The breakthrough of machine learning — neural networks running on GPUs — jump-started the era of Software 2.0.
Today, software writes software. The world’s computing workloads are shifting from general-purpose computing on CPUs to accelerated computing on GPUs, leaving Moore’s law far behind.
With generative AI, multimodal transformer and diffusion models have been trained to generate responses.
Large language models are one-dimensional: they predict the next token, whether that token is a letter or a word. Image- and video-generation models are two-dimensional, able to predict the next pixel.
None of these models can understand or interpret the three-dimensional world. And that’s where physical AI comes in.
Physical AI models can perceive, understand, interact with and navigate the physical world with generative AI. With accelerated computing, multimodal physical AI breakthroughs and large-scale physically based simulations are allowing the world to realize the value of physical AI through robots.
A robot is a system that can perceive, reason, plan, act and learn. Robots are often thought of as autonomous mobile robots (AMRs), manipulator arms or humanoids. But there are many more types of robotic embodiments.
In the near future, everything that moves, or that monitors things that move, will be an autonomous robotic system. These systems will be capable of sensing and responding to their environments.
Everything from surgical rooms to data centers, warehouses to factories, even traffic control systems or entire smart cities will transform from static, manually operated systems to autonomous, interactive systems embodied by physical AI.
The Next Frontier: Humanoid Robots
Humanoid robots are an ideal general-purpose robotic manifestation because they can operate efficiently in environments built for humans, while requiring minimal adjustments for deployment and operation.
The global market for humanoid robots is expected to reach $38 billion by 2035, according to Goldman Sachs. That is a more than sixfold increase from the roughly $6 billion it forecast for the same period nearly two years ago.
Researchers and developers around the world are racing to build this next wave of robots.
Three Computers to Develop Physical AI
To develop humanoid robots, three accelerated computer systems are required to handle physical AI and robot training, simulation and runtime. Two computing advancements are accelerating humanoid robot development: multimodal foundation models and scalable, physically based simulations of robots and their worlds.
Breakthroughs in generative AI are bringing 3D perception, control, skill planning and intelligence to robots. Robot simulation at scale lets developers refine, test and optimize robot skills in a virtual world that mimics the laws of physics. This helps reduce real-world data acquisition costs and lets robots be trained and tested in safe, controlled settings.
NVIDIA has built three computers and accelerated development platforms to enable developers to create physical AI.
First, models are trained on a supercomputer. Developers can use NVIDIA NeMo on the NVIDIA DGX platform to train and fine-tune powerful foundation and generative AI models. They can also tap into NVIDIA Project GR00T, an initiative to develop general-purpose foundation models for humanoid robots to enable them to understand natural language and emulate movements by observing human actions.
Second, NVIDIA Omniverse, running on NVIDIA OVX servers, provides the development platform and simulation environment for testing and optimizing physical AI with application programming interfaces and frameworks like NVIDIA Isaac Sim.
Developers can use Isaac Sim to simulate and validate robot models, or generate massive amounts of physically based synthetic data to bootstrap robot model training. Researchers and developers can also use NVIDIA Isaac Lab, an open-source robot learning framework that powers robot reinforcement learning and imitation learning, to help accelerate robot policy training and refinement.
Lastly, trained AI models are deployed to a runtime computer. NVIDIA Jetson Thor robotics computers are specifically designed for compact, on-board computing needs. An ensemble of models consisting of control policy, vision and language models composes the robot brain and is deployed on a power-efficient, on-board edge computing system.
Depending on their workflows and challenge areas, robot makers and foundation model developers can use as many of the accelerated computing platforms and systems as needed.
Building the Next Wave of Autonomous Facilities
Robotic facilities are the culmination of all of these technologies.
Manufacturers like Foxconn or logistics companies like Amazon Robotics can orchestrate teams of autonomous robots to work alongside human workers and monitor factory operations through hundreds or thousands of sensors.
These autonomous warehouses, plants and factories will have digital twins. The digital twins are used for layout planning and optimization, operations simulation and, most importantly, robot fleet software-in-the-loop testing.
Built on Omniverse, “Mega” is a blueprint for factory digital twins that enables industrial enterprises to test and optimize their robot fleets in simulation before deploying them to physical factories. This helps ensure seamless integration, optimal performance and minimal disruption.
Mega lets developers populate their factory digital twins with virtual robots and their AI models, or the brains of the robots. Robots in the digital twin execute tasks by perceiving their environment, reasoning, planning their next motion and, finally, completing planned actions.
These actions are simulated in the digital environment by the world simulator in Omniverse, and the results are perceived by the robot brains through Omniverse sensor simulation.
With sensor simulations, the robot brains decide the next action, and the loop continues, all while Mega meticulously tracks the state and position of every element within the factory digital twin.
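The loop described above (sense, decide, act, simulate, track) can be illustrated with a toy sketch in plain Python. This is not Mega or the Omniverse API: every name here (WorldState, simulate_sensors, robot_brain, step_world, run_loop) is hypothetical, and the "factory" is reduced to robots moving along a one-dimensional aisle toward goal positions. It only shows the shape of a software-in-the-loop cycle under those assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorldState:
    """Hypothetical stand-in for the state a digital twin tracks.

    Robots live on a 1-D aisle; positions and goals are integer cells.
    """
    positions: dict[str, int]
    goals: dict[str, int]
    history: list[dict[str, int]] = field(default_factory=list)


def simulate_sensors(state: WorldState, robot_id: str) -> int:
    """Toy sensor simulation: signed distance from the robot to its goal."""
    return state.goals[robot_id] - state.positions[robot_id]


def robot_brain(observation: int) -> int:
    """Toy policy: move one cell toward the goal, or stay put once there."""
    if observation > 0:
        return 1
    if observation < 0:
        return -1
    return 0


def step_world(state: WorldState, actions: dict[str, int]) -> None:
    """Toy world simulator: apply every action, then record the new positions."""
    for robot_id, action in actions.items():
        state.positions[robot_id] += action
    state.history.append(dict(state.positions))


def run_loop(state: WorldState, max_steps: int = 100) -> int:
    """Repeat sense -> decide -> simulate until all goals are met.

    Returns the number of simulation steps taken (at most max_steps).
    """
    for step in range(max_steps):
        if state.positions == state.goals:
            return step
        actions = {
            robot_id: robot_brain(simulate_sensors(state, robot_id))
            for robot_id in state.positions
        }
        step_world(state, actions)
    return max_steps
```

For example, calling run_loop on a WorldState with two robots starting at cells 0 and 5 and goals at cells 3 and 2 runs the cycle until both robots arrive, and state.history holds the tracked positions after each step. In a real deployment the policy, the physics and the sensor models would each be far richer, but the control flow between them has the same structure.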
This advanced software-in-the-loop testing methodology enables industrial enterprises to simulate and validate changes within the safe confines of the Omniverse digital twin, helping them anticipate and mitigate potential issues to reduce risk and costs during real-world deployment.
Empowering the Developer Ecosystem With NVIDIA Technology
NVIDIA accelerates the work of the global ecosystem of robotics developers and robot foundation model builders with three computers.
Universal Robots, a Teradyne Robotics company, used NVIDIA Isaac Manipulator, Isaac accelerated libraries and AI models, and NVIDIA Jetson Orin to build UR AI Accelerator, a ready-to-use hardware and software toolkit that enables cobot developers to build applications, accelerate development and reduce the time to market of AI products.
RGo Robotics used NVIDIA Isaac Perceptor to help its wheel.me AMRs work everywhere, all the time, and make intelligent decisions by giving them human-like perception and visual-spatial information.
Humanoid robot makers including 1X Technologies, Agility Robotics, Apptronik, Boston Dynamics, Fourier, Galbot, Mentee, Sanctuary AI, Unitree Robotics and XPENG Robotics are adopting NVIDIA’s robotics development platform.
Boston Dynamics is using Isaac Sim and Isaac Lab to build quadrupeds and humanoid robots to augment human productivity, tackle labor shortages and prioritize safety in warehouses.
Fourier is tapping into Isaac Sim to train humanoid robots to operate in fields that demand high levels of interaction and adaptability, such as scientific research, healthcare and manufacturing.
Using Isaac Lab and Isaac Sim, Galbot advanced the development of a large-scale robotic dexterous grasp dataset called DexGraspNet that can be applied to different dexterous robotic hands, as well as a simulation environment for evaluating dexterous grasping models.
Field AI developed risk-bounded multitask and multipurpose foundation models for robots to safely operate in outdoor field environments, using the Isaac platform and Isaac Lab.
The era of physical AI is here — and it’s transforming the world’s heavy industries and robotics.
Get started with NVIDIA Robotics.