Alexa, play @%!^&!!
Voice assistants still have a long way to go, but that isn’t slowing one AI speech recognition startup’s ambition to become the de facto meeting notes assistant, turning spoken conversations into text.
Silicon Valley-based AISense has launched Otter, a GPU-powered app that records speech and quickly returns audio files and transcriptions of conversations among multiple people. Otter is available now for free on iOS, Android and the web.
Founded in 2016, the startup has focused on speech recognition technologies for long-form conversations among multiple speakers, as well as on a language processing area known as speaker diarization, which enables machines to differentiate voices.
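Speaker diarization is usually framed as clustering: each short audio segment is mapped to a speaker-embedding vector, and segments whose embeddings are similar are attributed to the same voice. The article doesn’t describe AISense’s actual method, so the following is only a toy illustration of the idea, using a greedy cosine-similarity clustering over made-up 2-D "embeddings" (real systems use learned, high-dimensional ones):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def diarize(segments, threshold=0.8):
    """Assign each segment's embedding to a speaker.

    A segment joins the existing speaker whose centroid it is most
    similar to, if that similarity clears the threshold; otherwise
    it starts a new speaker.
    """
    centroids = []   # running mean embedding per speaker
    counts = []      # segments folded into each centroid
    labels = []
    for emb in segments:
        best, best_sim = None, threshold
        for i, c in enumerate(centroids):
            sim = cosine(emb, c)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            n = counts[best]
            centroids[best] = [(c * n + e) / (n + 1)
                               for c, e in zip(centroids[best], emb)]
            counts[best] = n + 1
            labels.append(best)
    return labels

# Toy embeddings from two distinct "voices"
segs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.95, 0.05]]
print(diarize(segs))  # → [0, 0, 1, 1, 0]
```

The threshold trades off splitting one speaker into several (too high) against merging distinct speakers (too low).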
Two years in the making, AISense’s proprietary Ambient Voice Intelligence technology allows people to store, search, share and analyze voice conversations.
Otter lets you scroll through a transcript in which each speaker’s words are clearly labeled, with the option to listen to the audio as well. The app delivers better than 90 percent accuracy in text dictation, according to the company.
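Transcription accuracy like this is conventionally measured as 1 minus the word error rate (WER), computed from the word-level edit distance between a reference transcript and the system’s hypothesis. The company doesn’t say how it measures its figure; as an illustration, here is the standard WER-style calculation:

```python
def word_accuracy(reference, hypothesis):
    """1 - word error rate, via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return 1 - dp[len(ref)][len(hyp)] / len(ref)

# One substitution in four words: 75 percent accuracy
print(word_accuracy("play the meeting notes", "play a meeting notes"))  # → 0.75
```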
Human-to-human interactions are much more difficult to capture than human-to-machine interactions such as simple commands between people and Amazon’s Alexa, Apple’s Siri or Google Assistant, according to AISense co-founder and CEO Sam Liang. That’s what makes Otter different from traditional voice products, which only handle short queries or commands from a single speaker.
AISense’s technology had to be enhanced to handle the complicated interplay of people and the nuances of conversations, and it can still get tripped up by accents, Liang said.
“This is a pretty deep technology. It’s extremely difficult,” he said. “We had to do pretty sophisticated supervised learning, and we had to get a lot of labeled data, with hundreds of thousands of hours of recordings.”
Liang is a well-known Silicon Valley tech figure. At Google Maps, he was responsible for the blue dot as the tech lead of location services. In 2013, he sold his startup Alohar Mobile to Alibaba.
His latest startup is also building a semi-supervised learning system that learns from large quantities of meeting data without requiring human transcription.
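One common semi-supervised recipe is self-training: a model trained on labeled data pseudo-labels the unlabeled examples it is confident about, then retrains on the enlarged set. The article doesn’t say which technique AISense uses, so this is just a generic sketch with a toy nearest-centroid classifier standing in for the real model:

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroids(points, labels):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        s = sums.setdefault(y, [0.0] * len(p))
        for i, x in enumerate(p):
            s[i] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [x / counts[y] for x in s] for y, s in sums.items()}

def self_train(labeled, labels, unlabeled, margin=0.5, rounds=5):
    """Pseudo-label points the model is confident about, then retrain."""
    X, y = list(labeled), list(labels)
    pool = list(unlabeled)
    for _ in range(rounds):
        cents = centroids(X, y)
        keep = []
        for p in pool:
            ds = sorted((dist(p, c), lab) for lab, c in cents.items())
            # confident only if the best class is clearly closer than the runner-up
            if len(ds) > 1 and ds[1][0] - ds[0][0] >= margin:
                X.append(p)
                y.append(ds[0][1])
            else:
                keep.append(p)
        if len(keep) == len(pool):
            break  # nothing new was confident; stop
        pool = keep
    return centroids(X, y)

# Two labeled points, three unlabeled; the ambiguous midpoint stays unlabeled
cents = self_train([[0.0, 0.0], [4.0, 4.0]], ["a", "b"],
                   [[0.2, 0.1], [3.9, 4.2], [2.0, 2.0]])
```

The margin threshold is what keeps low-confidence guesses from polluting the training set, which is the main failure mode of self-training.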
Luckily, there’s a ton of such training data available online. The 15-person team at AISense was able to draw on freely available archives of NPR radio programs and Supreme Court proceedings from the Library of Congress.
Then it used terabytes of audio data and transcripts to train its algorithms for Otter, relying on 50 NVIDIA Tesla GPUs. Said Liang, “It’s a startup and we have to spend money very frugally. But we have to spend some resources on GPUs — it’s just a must.”
The company is targeting Otter at enterprise customers who might use it in meetings. AISense plans to release a premium version that will require a subscription, and it already licenses some of its technology to enterprise customers.
AISense recently partnered with Zoom Video Communications to handle gobs of voice data, which AISense’s technology transcribes automatically.
AISense has raised more than $13 million in funding to date. Investors include Horizon Ventures, Draper Associates, Slow Ventures, SV Tech Ventures, Bridgewater Associates, 500 Startups and billionaire Stanford professor David Cheriton.