NVIDIA Maxine Reinvents Real-Time Communication With AI

Everyone wants to be heard. And with more people than ever in video calls or live streaming from their home offices, rich audio free from echo hiccups and background noises like barking dogs is key to better-sounding online experiences.

NVIDIA Maxine offers GPU-accelerated, AI-enabled software development kits to help developers build scalable, low-latency audio and video effects pipelines that improve call quality and user experience.

Today, NVIDIA announced at GTC that Maxine is adding acoustic echo cancellation and AI-based upsampling for better sound quality.

Acoustic Echo Cancellation eliminates acoustic echo from the audio stream in real time, preserving speech quality even during double-talk. With AI-based technology, Maxine achieves more effective echo cancellation than that achieved via traditional digital signal processing algorithms.
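For context on the "traditional digital signal processing" baseline mentioned above, here is a minimal sketch of a classic normalized least-mean-squares (NLMS) adaptive filter, a textbook echo cancellation technique. It is not Maxine's method or API. The function names, the simulated echo path and the parameter values are all assumptions chosen for illustration.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, step=0.5, eps=1e-8):
    """Classic NLMS echo canceller (illustrative baseline, not Maxine).

    far_end: samples played through the loudspeaker.
    mic:     samples captured by the microphone (contains the echo).
    Returns the echo-reduced signal and the learned filter weights.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        err = d - estimate  # what remains after removing the estimated echo
        norm = sum(h * h for h in history) + eps
        # Normalized update: step size is scaled by far-end signal energy.
        weights = [w + step * err * h / norm for w, h in zip(weights, history)]
        residual.append(err)
    return residual, weights


def simulate_echo(n=4000, seed=0):
    """Build a toy far-end signal and a mic signal that is pure echo."""
    rng = random.Random(seed)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    echo_path = [0.0, 0.6, 0.0, -0.3]  # hypothetical room response
    mic = [
        sum(g * far_end[i - k] for k, g in enumerate(echo_path) if i - k >= 0)
        for i in range(n)
    ]
    return far_end, mic


def mean_power(samples):
    return sum(s * s for s in samples) / len(samples)


if __name__ == "__main__":
    far_end, mic = simulate_echo()
    residual, weights = nlms_echo_cancel(far_end, mic)
    print("echo power before:", mean_power(mic[-500:]))
    print("echo power after: ", mean_power(residual[-500:]))
```

This toy setup has no near-end speaker at all. During double-talk, a plain NLMS filter like this one tends to mis-adapt unless extra control logic is added, which is one of the cases the paragraph above says the AI-based approach is meant to handle.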

Audio Super Resolution improves the quality of a low-bandwidth audio signal by restoring the energy lost in higher frequency bands using AI-based techniques. Maxine Audio Super Resolution supports upsampling audio from 8 kHz (narrowband) to 16 kHz (wideband), from 16 kHz to 48 kHz (ultra-wideband) and from 8 kHz to 48 kHz. Lower sampling rates such as 8 kHz often result in muffled voices, emphasize artifacts such as sibilance and make speech difficult to understand.
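The sampling rates above map directly onto audio bandwidth: a signal sampled at a given rate can only represent frequencies up to half that rate (the Nyquist limit). The short sketch below works through that arithmetic for the three upsampling paths named in the text. The helper names are hypothetical and this is not part of any Maxine SDK.

```python
# Upsampling paths named in the text: (input rate in Hz, output rate in Hz).
SUPPORTED_PATHS = [(8_000, 16_000), (16_000, 48_000), (8_000, 48_000)]


def nyquist_hz(sample_rate_hz):
    """Highest frequency a signal sampled at this rate can represent."""
    return sample_rate_hz / 2


def describe_path(in_hz, out_hz):
    """Summarize what an upsampler has to reconstruct for one path."""
    return {
        "factor": out_hz // in_hz,
        "input_band_hz": nyquist_hz(in_hz),
        "output_band_hz": nyquist_hz(out_hz),
        # Band that plain interpolation leaves empty and that a
        # super-resolution model would have to fill in.
        "missing_band_hz": (nyquist_hz(in_hz), nyquist_hz(out_hz)),
    }


if __name__ == "__main__":
    for in_hz, out_hz in SUPPORTED_PATHS:
        print(in_hz, "->", out_hz, describe_path(in_hz, out_hz))
```

For the 8 kHz to 16 kHz path, for instance, the input carries content only up to 4 kHz, so everything between 4 kHz and 8 kHz has to be estimated rather than copied. Conventional resampling raises the sample count but leaves that band empty.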

Modern film and television studios often record audio at a sampling rate of 48 kHz or higher to maintain the fidelity of the original signal and preserve clarity. Audio Super Resolution can help restore the fidelity of old audio recordings derived from magnetic tapes or other low-bandwidth media.

Bridging the Sound Gap

Most modern telecommunication takes place using wideband or ultra-wideband audio. Since NVIDIA Audio Super Resolution can upsample and restore narrowband audio in real time, the technology can effectively be used to bridge the quality gap between traditional copper wire phone lines and modern VoIP-based wideband communication systems.

Real-time communication — whether for conference calls, call centers or live streaming of all kinds — is taking a big leap forward with Maxine.

Since its initial release, Maxine has been adopted by many of the world’s leading providers for video communications, content creation and live streaming.

The worldwide market for video conferencing is expected to increase to nearly $13 billion in 2028, up from about $6.3 billion in 2021, according to Fortune Business Insights.

WFH: A Way of Life

The move to work from home, or WFH, has become an accepted norm across companies, and organizations are adapting to the new expectations.

Analyst firm Gartner estimates that only a quarter of meetings for enterprises will be in person in 2024, a decline from 60 percent pre-pandemic.

Virtual collaboration in the U.S. has played an important role as people have taken on hybrid and remote positions in the past two years amid the pandemic.

But as organizations seek to maintain company culture and workplace experience, the stakes have risen for higher-quality media interactivity.

Solving the Cocktail Party Problem

But sometimes work and home life collide. As a result, meetings are often filled with background noises from kids, construction work outside or emergency vehicle sirens, causing brief interruptions in the flow of conference calls.

Maxine helps solve an age-old audio issue known as the cocktail party problem. With AI, it can filter out unwanted background noises, allowing users to be better heard, whether they’re in a home office or on the road.

The Maxine GPU-accelerated platform provides an end-to-end deep learning pipeline that integrates with customizable state-of-the-art models, enabling high-quality features with a standard microphone and camera.

Sound Like Your Best Self

In addition to being impacted by background noise, audio quality in virtual activities can sometimes sound thin, missing low- and mid-level frequencies, or even be barely audible.

Maxine enables upsampling of audio in real time so that voices sound fuller, deeper and more audible.

Logitech: Better Audio for Headsets and Blue Yeti Microphones

Logitech, a leading maker of peripherals, is implementing Maxine for better interactions with its popular headsets and microphones.

Tapping into AI libraries, Logitech has integrated Maxine directly inside G HUB audio drivers to enhance communications with its devices without the need for additional software. Maxine takes advantage of the powerful Tensor Cores in NVIDIA RTX GPUs so consumers can enjoy real-time processing of their mic signal.

Logitech is now using Maxine’s state-of-the-art denoising in its G HUB software. That has allowed it to remove echoes and background noises, such as fans and keyboard and mouse clicks, that can distract from video conferences or live-streaming sessions.

“NVIDIA Maxine makes it fast and easy for Logitech G gamers to clean up their mic signal and eliminate unwanted background noises in a single click,” said Ujesh Desai, GM of Logitech G. “You can even use G HUB to test your mic signal to make sure you have your Maxine settings dialed in.”

Tencent Cloud Boosts Content Creators

Tencent Cloud is helping content creators with their productions by offering technology from NVIDIA Maxine that makes it quick and easy to add creative backgrounds.

NVIDIA Maxine’s AI Green Screen feature enables users to create a more immersive presence with high-quality foreground and background separation — without the need for a traditional green screen. Once the real background is separated, it can easily be replaced with a virtual background, or blurred to create a depth-of-field effect. Tencent Cloud is offering this new capability as a software-as-a-service package for content creators.
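Once a foreground mask is available, background replacement and background blur both reduce to the same per-pixel blend. The sketch below shows that compositing step on tiny grayscale images stored as nested lists. It assumes the mask (alpha) has already been produced by some segmentation model, and the function names are hypothetical rather than taken from the Maxine SDK.

```python
def composite(foreground, background, alpha):
    """Per-pixel blend: alpha=1 keeps the foreground, alpha=0 shows background."""
    return [
        [a * f + (1.0 - a) * b for f, b, a in zip(f_row, b_row, a_row)]
        for f_row, b_row, a_row in zip(foreground, background, alpha)
    ]


def box_blur(image, radius=1):
    """Simple box blur; each pixel becomes the mean of its neighborhood."""
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


def blur_background(frame, alpha, radius=1):
    """Depth-of-field style effect: sharp subject over a blurred copy."""
    return composite(frame, box_blur(frame, radius), alpha)


if __name__ == "__main__":
    frame = [[10.0, 20.0], [30.0, 40.0]]        # camera frame
    virtual = [[100.0, 100.0], [100.0, 100.0]]  # replacement background
    mask = [[1.0, 0.0], [0.5, 0.0]]             # assumed segmentation output
    print(composite(frame, virtual, mask))
    print(blur_background(frame, mask))
```

Fractional mask values, such as the 0.5 in the usage example, give soft edges around hair and shoulders instead of a hard cutout.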

“NVIDIA Maxine’s AI Green Screen technology helps content creators with their productions by enabling more immersive, high-quality experiences without the need for specialized equipment and lighting,” said Vulture Li, director of the Product Center for the Tencent Cloud audio and video platform.

Making Virtual Experiences Better

NVIDIA Maxine provides state-of-the-art real-time AI audio, video and augmented reality features that can be built into customizable, end-to-end deep learning pipelines.

The AI-powered SDKs from Maxine help developers create applications that include audio and image denoising, super resolution, gaze correction, 3D body pose estimation and translation features.

Maxine also enables real-time voice-to-text translation for a growing number of languages. At GTC, NVIDIA demonstrated Maxine translating between English, French, German and Spanish.

These effects will allow millions of people to enjoy high-quality and engaging live-streaming video across any device.