NVIDIA Speech AI Breakthrough Enables Enterprises to Create Unique Voices for Every Brand

NVIDIA Riva Custom Voice software to power virtual assistants, call-center voices and other speech-based applications; Riva Enterprise available for large-scale deployments.
by Kristin Uchiyama

NVIDIA announced a tool that enables the creation of custom, human-like voices in just a day, with only 30 minutes of audio data. 

NVIDIA Riva Custom Voice, a feature in NVIDIA Riva speech AI software, makes it practical for millions of companies to develop an expressive custom voice in hours rather than weeks, using a small amount of data.

Companies can use Riva Custom Voice to create a virtual assistant with a unique voice. Call centers can use it to quickly develop a recognizable brand voice for serving customers. And developers can use it to create a wide variety of applications to support those with speech and language deficits.

“Human-like interactions have long been one of the greatest challenges of artificial intelligence, especially for companies with industry-specific jargon,” said Kari Briski, vice president of product management for AI software at NVIDIA. “Now these companies can use speech AI to listen and respond to customers with an expressive voice that’s unique to their brand and that drives more engaging and delightful interactions.”


Riva Custom Voice is available in the latest version of the NVIDIA Riva speech AI software development kit. The Riva SDK includes world-class automatic speech recognition and text-to-speech capabilities that are customizable to different accents and domains. It also provides the ability to scale speech services to hundreds of thousands of streams in the cloud, in the data center or at the edge.

Voice of the Ecosystem

In less than three years, NVIDIA’s conversational AI software has been downloaded more than 250,000 times, with broad adoption across a variety of industries. 

RingCentral, a leading provider of global enterprise cloud communications, video meetings, collaboration, and contact center solutions, is using Riva automatic speech recognition for its video conferencing live-captioning feature to create more engaging meeting experiences. 

“Our goal is to make meetings smarter and with NVIDIA Riva it’s now possible to train live transcription models on NVIDIA GPUs for accuracy against varied accents,” said Nat Natarajan, executive vice president and general manager of products and engineering at RingCentral. “In the future we expect there to be several concurrent streams and Riva can easily scale, running these streams in real time in under 300 milliseconds. We are excited to partner with NVIDIA and for the future.”

Ping An, one of the world’s largest financial services companies, is improving customer experiences by reducing wait times through its virtual agents. Riva allows it to build real-time speech applications that continually improve in accuracy.

“Ping An addresses millions of customer queries per day using chatbot agents,” said Jing Xiao, chief scientist at Ping An. “Using NVIDIA’s pretrained models for automatic speech recognition, further fine-tuned on our data, our system has achieved a 5 percent improvement in accuracy, enabling us to provide more engaging and authentic services.”

Dozens of software makers are also using NVIDIA conversational AI in production. Gosoft Contact Center is working with CP All, which has more than 20 business domains. Its retail domain serves more than 10,000 7-Eleven convenience stores in Thailand. In total, 240,000 calls are handled per day with the help of highly accurate AI voicebots trained on the Thai language.

And Plabook Education and Data Monsters are working with school districts across the U.S. to help children learn to read through their AI-powered digital avatar reading assistant, which helps identify mispronounced words and measure reading accuracy.

Availability and Pricing

For small-scale research and development, NVIDIA Riva is available at no cost on the NVIDIA NGC container registry. Developers can join the Riva open beta program to try the software today and receive notifications about upcoming features.

For customers with large-scale deployments who want technical support from NVIDIA experts, NVIDIA also announced the NVIDIA Riva Enterprise program, which is expected to be available early next year.

Riva at GTC

In his GTC keynote, NVIDIA founder and CEO Jensen Huang showcased Riva’s speech AI capabilities, including a demo of Riva Custom Voice that highlighted how new, human-like voices can be created with just 30 minutes of data.

Riva was also shown in Omniverse Avatar — a platform for creating interactive avatars — through Project Tokkio, DRIVE Concierge and Project Maxine. Project Tokkio and DRIVE Concierge showcased avatars in customer service and in-vehicle environments, while Project Maxine highlighted real-time translation and transcription into multiple languages.

At GTC, there are more than two dozen talks focused on conversational AI, including talks by Hugging Face, Snap and T-Mobile. Topics include state-of-the-art algorithms and tools, as well as the challenges and impact of developing and integrating GPU-accelerated speech and language AI applications.

Register for free to learn more about NVIDIA Riva during NVIDIA GTC, taking place online through Nov. 11.