Fasten your seatbelts. NVIDIA Research is revving up a new deep learning engine that creates 3D object models from standard 2D images — and can bring iconic cars like the Knight Rider’s AI-powered KITT to life — in NVIDIA Omniverse.
Developed by the NVIDIA AI Research Lab in Toronto, the GANverse3D application inflates flat images into realistic 3D models that can be visualized and controlled in virtual environments. This capability could help architects, creators, game developers and designers easily add new objects to their mockups without needing expertise in 3D modeling, or a large budget to spend on renderings.
A single photo of a car, for example, could be turned into a 3D model that can drive around a virtual scene, complete with realistic headlights, tail lights and blinkers.
To generate a dataset for training, the researchers harnessed a generative adversarial network, or GAN, to synthesize images depicting the same object from multiple viewpoints — like a photographer who walks around a parked vehicle, taking shots from different angles. These multi-view images were plugged into a rendering framework for inverse graphics, the process of inferring 3D mesh models from 2D images.
Once trained on multi-view images, GANverse3D needs only a single 2D image to predict a 3D mesh model. This model can be used with a 3D neural renderer that gives developers control to customize objects and swap out backgrounds.
When imported as an extension in the NVIDIA Omniverse platform and run on NVIDIA RTX GPUs, GANverse3D can be used to recreate any 2D image into 3D — like the beloved crime-fighting car KITT, from the popular 1980s Knight Rider TV show.
Previous models for inverse graphics have relied on 3D shapes as training data.
Instead, with no aid from 3D assets, “We turned a GAN model into a very efficient data generator so we can create 3D objects from any 2D image on the web,” said Wenzheng Chen, research scientist at NVIDIA and lead author on the project.
“Because we trained on real images instead of the typical pipeline, which relies on synthetic data, the AI model generalizes better to real-world applications,” said NVIDIA researcher Jun Gao, an author on the project.
The research behind GANverse3D will be presented at two upcoming conferences: the International Conference on Learning Representations in May, and the Conference on Computer Vision and Pattern Recognition, in June.
From Flat Tire to Racing KITT
Creators in gaming, architecture and design rely on virtual environments like the NVIDIA Omniverse simulation and collaboration platform to test out new ideas and visualize prototypes before creating their final products. With Omniverse Connectors, developers can use their preferred 3D applications in Omniverse to simulate complex virtual worlds with real-time ray tracing.
But not every creator has the time and resources to create 3D models of every object they sketch. The cost of capturing the number of multi-view images necessary to render a showroom’s worth of cars, or a street’s worth of buildings, can be prohibitive.
That’s where a trained GANverse3D application can be used to convert standard images of a car, a building or even a horse into a 3D figure that can be customized and animated in Omniverse.
To recreate KITT, the researchers simply fed the trained model an image of the car, letting GANverse3D predict a corresponding 3D textured mesh, as well as different parts of the vehicle such as wheels and headlights. They then used NVIDIA Omniverse Kit and NVIDIA PhysX tools to convert the predicted texture into high-quality materials that give KITT a more realistic look and feel, and placed it in a dynamic driving sequence.
“Omniverse allows researchers to bring exciting, cutting-edge research directly to creators and end users,” said Jean-Francois Lafleche, deep learning engineer at NVIDIA. “Offering GANverse3D as an extension in Omniverse will help artists create richer virtual worlds for game development, city planning or even training new machine learning models.”
GANs Power a Dimensional Shift
Because real-world datasets that capture the same object from different angles are rare, most AI tools that convert images from 2D to 3D are trained using synthetic 3D datasets like ShapeNet.
To obtain multi-view images from real-world data — like images of cars available publicly on the web — the NVIDIA researchers instead turned to a GAN model, manipulating its neural network layers to turn it into a data generator.
The team found that opening the first four layers of the neural network and freezing the remaining 12 caused the GAN to render images of the same object from different viewpoints.
Keeping the first four layers frozen and the other 12 layers variable caused the neural network to generate different images from the same viewpoint. By manually assigning standard viewpoints, with vehicles pictured at a specific elevation and camera distance, the researchers could rapidly generate a multi-view dataset from individual 2D images.
The final model, trained on 55,000 car images generated by the GAN, outperformed an inverse graphics network trained on the popular Pascal3D dataset.
Read the full ICLR paper, authored by Wenzheng Chen, fellow NVIDIA researchers Jun Gao and Huan Ling, Sanja Fidler, director of NVIDIA’s Toronto research lab, University of Waterloo student Yuxuan Zhang, Stanford student Yinan Zhang and MIT professor Antonio Torralba. Additional collaborators on the CVPR paper include Jean-Francois Lafleche, NVIDIA researcher Kangxue Yin and Adela Barriuso.
The NVIDIA Research team consists of more than 200 scientists around the globe, focusing on areas such as AI, computer vision, self-driving cars, robotics and graphics. Learn more about the company’s latest research and industry breakthroughs in NVIDIA CEO Jensen Huang’s keynote address at this week’s GPU Technology Conference.
GTC registration is free, and open through April 23. Attendees will have access to on-demand content through May 11.
Knight Rider content courtesy of Universal Studios Licensing LLC.