Tongues Untied: Dataset Starts Global Dialogue in Conversational AI

A startup is automating COVID-19 information services in East Africa thanks to Mozilla Common Voice, a free public dataset supported by NVIDIA.
by Jane Polak Scowcroft

A startup in East Africa is harnessing conversational AI to get the word out about a third wave of COVID-19 passing through the region. It hopes its Mbaza AI Chatbot will lead to partnerships that use the technology to tackle other concerns across the continent’s many languages.

“COVID is here to stay, unfortunately, and it’s a volatile topic with measures that tighten and loosen from week to week, so it’s important for people to have access to the latest information,” said Audace Niyonkuru, founder and CEO of Digital Umuganda, the startup developing the software.

Based in Rwanda’s capital of Kigali, his team aims to deploy a basic voice service in August. It will follow up with a version by year’s end that can interpret and respond to spoken questions.

Conversational AI Gets the Word Out

“Ours is a more oral culture where there are still barriers to access because it’s easier for people to talk than write,” Niyonkuru said of the primarily rural country, where three-quarters of its 12 million people are literate.

It’s a challenge shared widely across Africa, home to more than 2,000 languages and dialects. But Niyonkuru, a lifelong entrepreneur, prefers to see the glass as half full.

“There’s a huge opportunity globally because conversational AI is a bridge over barriers to access — people can use their phones to get all sorts of medical or legal information,” he said.

Giving AI a Common Voice

To train a conversational AI model, you need an extremely large dataset of voice samples, which takes lots of time to build or lots of money to buy. The startup trained its models on Mozilla Common Voice, a free, publicly available multilingual platform and dataset created by Mozilla and supported by NVIDIA. The Common Voice dataset was built from voice samples contributed by thousands of people around the world.

Digital Umuganda is Africa’s largest contributor to the platform. To date, it has organized contributors to create 2,200 hours of Kinyarwanda voice data. Kinyarwanda is spoken by 40 million people in and around Rwanda, and it is now the largest language dataset in Common Voice after English.

To create the dataset, the startup tapped into a Rwandan tradition in which neighbors gather on the last Saturday of each month to work on a community project. The startup embraced and extended this practice, called umuganda.

“The spirit of open source software is embedded in Rwanda’s culture, so we just applied it to the digital world and datasets,” he said.

Donations Shared with All

Digital Umuganda started collecting data with student gatherings at universities, then went to the countryside to make sure the dataset represented people of all ages.

“The beautiful thing is that because it’s in the open, we see researchers around the world working with it,” said Niyonkuru.

Two branches of the Rwandan government have expressed interest in using the startup’s technology, and at least one third party has already created a conversational AI model using the dataset.

The COVID project got its start last spring when government call centers were overwhelmed by peaks of more than 10,000 calls for information about the pandemic. The Mbaza chatbot will be deployed on existing government healthcare lines as a 24/7 information service.

It’s one example of how Common Voice is democratizing access to conversational AI around the globe, both for companies that develop the technology and consumers who use it.

Giving More Languages a Voice

First launched in 2017, the Common Voice dataset gets an updated release twice a year. It focuses on expanding support in underrepresented languages, filling wide gaps left by commercial voice projects that typically focus on a handful of the most popular American, Asian and European languages.

Common Voice currently packs more than 10,000 hours of recorded voice samples, collected and validated by volunteers. It’s a treasure trove for startups, researchers and small- to medium-sized developers who don’t have the time or money to collect or purchase datasets of their own.

The next release, coming at the end of July, will provide data from 75 languages, 15 of them debuting in Common Voice for the first time. They include Urdu, spoken by 70 million people in South Asia; Hausa, the language of 60 million Africans; and Azerbaijani, Armenian, Serbian and Uighur. None of these are supported by major commercial AI services.

It will be the first release since NVIDIA became a partner with Mozilla in April 2021, supporting Common Voice as part of a shared vision of making conversational AI available for everyone.

How You Can Help

We created the NVIDIA Riva framework to give developers state-of-the-art pre-trained deep learning models and software tools to create interactive conversational AI services. Now we’re helping make this rich, open dataset available, too.

Everyone is invited to join the global effort to make this technology available to all developers in all languages by going to Common Voice and contributing or validating voice samples as part of a dataset anyone can use freely.

Above: Digital Umuganda co-founder Ali Nyiringabo (right) with volunteers at an event in Kigali collecting and validating samples for Common Voice.