NVIDIA Research Model Enables Dynamic Scene Reconstruction

Content streaming and engagement are entering a new dimension with QUEEN, an AI model by NVIDIA Research and the University of Maryland that makes it possible to stream free-viewpoint video, which lets viewers experience a 3D scene from any angle.

QUEEN could be used to build immersive streaming applications that teach skills like cooking, put sports fans on the field to watch their favorite teams play from any angle, or bring an extra level of depth to video conferencing in the workplace. It could also be used in industrial environments to help teleoperate robots in a warehouse or a manufacturing plant.

The model will be presented at NeurIPS, the annual conference for AI research that begins Tuesday, Dec. 10, in Vancouver.

“To stream free-viewpoint videos in near real time, we must simultaneously reconstruct and compress the 3D scene,” said Shalini De Mello, director of research and a distinguished research scientist at NVIDIA. “QUEEN balances factors including compression rate, visual quality, encoding time and rendering time to create an optimized pipeline that sets a new standard for visual quality and streamability.”

Reduce, Reuse and Recycle for Efficient Streaming

Free-viewpoint videos are typically created using video footage captured from different camera angles, like a multicamera film studio setup, a set of security cameras in a warehouse or a system of videoconferencing cameras in an office.

Prior AI methods for generating free-viewpoint videos either took too much memory for livestreaming or sacrificed visual quality for smaller file sizes. QUEEN balances both to deliver high-quality visuals — even in dynamic scenes featuring sparks, flames or furry animals — that can be easily transmitted from a host server to a client’s device. It also renders visuals faster than previous methods, supporting streaming use cases.

In most real-world environments, many elements of a scene stay static. In a video, that means a large share of pixels don’t change from one frame to another. To save computation time, QUEEN tracks and reuses renders of these static regions — focusing instead on reconstructing the content that changes over time.
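The idea of reusing static content can be illustrated with a deliberately simplified 2D analogy. The sketch below is not QUEEN's actual algorithm, which reconstructs and compresses a 3D scene. It only shows the general pattern of detecting which pixels changed between two frames and re-rendering just those. The function names, the threshold value and the stand-in `render_pixel` step are all hypothetical.

```python
# Illustrative sketch only: a 2D, per-pixel analogy for reusing static content.
# This is NOT QUEEN's method; all names and the threshold are assumptions.

def changed_mask(prev_frame, next_frame, threshold=8):
    """Return a 2D list of booleans: True where a pixel changed by more than threshold."""
    return [
        [abs(b - a) > threshold for a, b in zip(prev_row, next_row)]
        for prev_row, next_row in zip(prev_frame, next_frame)
    ]

def update_render(cached_render, next_frame, mask, render_pixel):
    """Reuse cached pixels where mask is False; re-render only where it is True."""
    updated = [row[:] for row in cached_render]  # copy so the cache is not mutated
    recomputed = 0
    for y, mask_row in enumerate(mask):
        for x, changed in enumerate(mask_row):
            if changed:
                updated[y][x] = render_pixel(next_frame[y][x])
                recomputed += 1
    return updated, recomputed

def render_pixel(value):
    # Stand-in for an expensive rendering step (hypothetical).
    return value

# Two tiny grayscale frames: only one pixel changes by more than the threshold.
prev_frame = [[10, 10, 10], [10, 10, 10]]
next_frame = [[10, 10, 200], [10, 12, 10]]
cached = [[render_pixel(v) for v in row] for row in prev_frame]

mask = changed_mask(prev_frame, next_frame)
render, recomputed = update_render(cached, next_frame, mask, render_pixel)
print(recomputed, "of", sum(len(r) for r in next_frame), "pixels re-rendered")
```

In this toy case only one of six pixels is recomputed. The small change from 10 to 12 falls under the threshold, so that pixel's cached value is reused.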

Using an NVIDIA Tensor Core GPU, the researchers evaluated QUEEN’s performance on several benchmarks and found the model outperformed state-of-the-art methods for online free-viewpoint video on a range of metrics. Given 2D videos of the same scene captured from different angles, QUEEN typically needs under five seconds of training time and then renders free-viewpoint video at around 350 frames per second.

This combination of speed and visual quality can support media broadcasts of concerts and sports games by offering immersive virtual reality experiences or instant replays of key moments in a competition.

In warehouse settings, robot operators could use QUEEN to better gauge depth when maneuvering physical objects. And in a videoconferencing application — such as the 3D videoconferencing demo shown at SIGGRAPH and NVIDIA GTC — it could help presenters demonstrate tasks like cooking or origami while letting viewers pick the visual angle that best supports their learning.

The code for QUEEN will soon be released as open source and shared on the project page.

QUEEN is one of over 50 NVIDIA-authored NeurIPS posters and papers that feature groundbreaking AI research with potential applications in fields including simulation, robotics and healthcare. Among them is a best paper runner-up, titled “Guiding a Diffusion Model With a Bad Version of Itself” — one of just five papers recognized among the over 4,500 accepted this year.

“Generative Adversarial Nets,” the paper that first introduced GAN models, won the NeurIPS 2024 Test of Time Award. Cited more than 85,000 times, the paper was coauthored by Bing Xu, distinguished engineer at NVIDIA. Hear more from its lead author, Ian Goodfellow, research scientist at DeepMind, on the AI Podcast, Ep. 25: “Google’s Ian Goodfellow on How an Argument in a Bar Led to Generative Adversarial Networks.”

Learn more about NVIDIA Research at NeurIPS.

See the latest work from NVIDIA Research, which has hundreds of scientists and engineers worldwide, with teams focused on topics including AI, computer graphics, computer vision, self-driving cars and robotics.

Academic researchers working on large language models, simulation and modeling, edge AI and more can apply to the NVIDIA Academic Grant Program.
