SoundHound Digs Deeper Into Voice AI Market

by Scott Martin

SoundHound is learning some new AI tricks.

The Silicon Valley startup, which creates AI-based voice services, has fetched $100 million in strategic investment capital as it expands its offerings.

In addition to its eponymous music recognition app, SoundHound offers its Hound voice search app and Houndify voice platform for companies to create AI-powered voice services. The company’s tech has become the de facto alternative for voice search in a market crowded with the industry’s biggest players.

SoundHound is the underdog choice versus the likes of Amazon, Apple, Google and Microsoft.

The company is pushing out its voice domains, or topics for natural language processing fluency, at a rapid pace. In the span of two years, it has gone from 50 domains to 200, outpacing the advances of Apple’s Siri.

NVIDIA GPU Ventures, which backs startups working on deep learning, is an early investor in SoundHound.

Join in the Collective

Meanwhile, SoundHound continues to push for interoperability, or the ability for domains to speak to one another, as a way to get a leg up in providing better search capabilities for consumers. The company, which calls this effort Collective AI, says this makes products using the architecture smarter and more capable.

Collective AI is intended to let people ask complicated queries and get answers. For example: Find the best Italian restaurant in San Francisco that has more than 4 stars, is good for kids, isn’t a chain and is open after 9 p.m. on Wednesdays.
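The article doesn't say how Collective AI combines domains internally. As a rough illustration only, assume each domain (cuisine, location, reviews, business info, hours) contributes one constraint parsed from the example query, and the answer is the highest-rated candidate that satisfies all of them. The `Restaurant` record, the `CONSTRAINTS` table, `best_match` and the sample data are all hypothetical placeholders, not SoundHound's design or real listings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Restaurant:
    name: str
    cuisine: str
    city: str
    stars: float
    kid_friendly: bool
    is_chain: bool
    # Wednesday closing time on a 24-hour clock (22.5 = 10:30 p.m.);
    # values past 24 mean the place closes after midnight.
    wed_close_hour: float

# Hypothetical: each domain contributes one constraint parsed from the query.
CONSTRAINTS: Dict[str, Callable[[Restaurant], bool]] = {
    "cuisine": lambda r: r.cuisine == "Italian",
    "location": lambda r: r.city == "San Francisco",
    "reviews": lambda r: r.stars > 4.0 and r.kid_friendly,
    "business": lambda r: not r.is_chain,
    "hours": lambda r: r.wed_close_hour > 21,  # still open after 9 p.m.
}

def best_match(candidates: List[Restaurant]) -> Optional[Restaurant]:
    """Keep candidates that satisfy every domain's constraint, then rank by stars."""
    hits = [r for r in candidates if all(check(r) for check in CONSTRAINTS.values())]
    return max(hits, key=lambda r: r.stars, default=None)

# Placeholder data, invented purely for illustration.
SAMPLE = [
    Restaurant("Trattoria A", "Italian", "San Francisco", 4.5, True, False, 22),
    Restaurant("Chain B", "Italian", "San Francisco", 4.8, True, True, 23),
    Restaurant("Osteria C", "Italian", "San Francisco", 4.7, False, False, 23),
    Restaurant("Pizzeria D", "Italian", "San Francisco", 4.2, True, False, 21),
]

print(best_match(SAMPLE).name)  # Trattoria A
```

The point of the sketch is that no single domain can answer the query alone: the higher-rated "Chain B" is ruled out only because the business domain knows it is a chain, and "Pizzeria D" only because the hours domain knows it closes at 9 p.m.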

The company’s Collective AI alliance includes NVIDIA, Yelp, Sportstrader, Xignite, FlightStats, Onkyo, Sharp, Uber and Samsung ARTIK.

SoundHound also aims to stand out from the pack with Houndify. The white-label licensed service allows companies to give the voice assistants in their products their own names and to keep the customer data that’s generated. This lets companies build their own voice search brands and tap into other business opportunities that can emerge from customer data.

Amazon, by contrast, licenses Alexa, but users must call it “Alexa” in queries, and the delivery giant owns the customer data. Apple doesn’t license its Siri voice assistant, and Google doesn’t allow licensees to rename Google Assistant or to own the data created by their customers.

Houndify Developers Triple

Developers are biting on Houndify. Early last year, SoundHound had more than 20,000 developers registered to use Houndify, a number that has now swelled to more than 60,000.

SoundHound is retrieving customers for Houndify, too. The company is now working with 11 automakers, as well as with companies using Houndify in robotics, connected speakers, appliances, augmented reality and smart home devices.

Hyundai is implementing Houndify for next-generation voice in future cars. The automaker’s proactive assistant is designed to anticipate drivers’ information needs, for example by providing meeting reminders. It also enables hands-free phone calls, texting, destination and music search, as well as the ability to check the weather and manage a calendar. Voice will also extend to control of air conditioning, door locks and other vehicle functions.

The NVIDIA DRIVE and Jetson TX2 platforms help make SoundHound’s speech-to-meaning technology possible in automotive and robotics applications, respectively.

The Jetson TX2 developer kit for robotics

Dual Approach to Speech Recognition

SoundHound has taken a novel approach to delivering speech recognition on the fly. It has been granted a patent for a hybrid system in which both its local recognition model and its remote recognition engine perform speech recognition. This hybrid design takes advantage of NVIDIA GPUs on the NVIDIA DRIVE platform to process voice queries faster.
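The patent details aren't described here, so the following is only a sketch of one common way to combine an on-device recognizer with a server recognizer: run both in parallel, wait a bounded time for the server, and fall back to the local answer. Everything in it (`Hypothesis`, `recognize_hybrid`, the 0.3-second deadline and the "prefer the more confident result" rule) is a hypothetical assumption, and the recognizers are stand-in functions rather than SoundHound or NVIDIA APIs.

```python
import concurrent.futures as cf
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class Hypothesis:
    text: str
    confidence: float  # 0.0 to 1.0
    source: str        # "local" or "remote"

Recognizer = Callable[[bytes], Hypothesis]

def recognize_hybrid(audio: bytes, local: Recognizer, remote: Recognizer,
                     deadline_s: float = 0.3) -> Hypothesis:
    """Run the on-device and server recognizers in parallel and arbitrate.

    Prefers the remote result if it arrives within deadline_s (measured from
    submission) and is at least as confident; otherwise falls back to local.
    """
    pool = cf.ThreadPoolExecutor(max_workers=2)
    start = time.monotonic()
    try:
        local_future = pool.submit(local, audio)
        remote_future = pool.submit(remote, audio)
        local_hyp = local_future.result()  # assumed fast and reliable on-device
        remaining = max(0.0, deadline_s - (time.monotonic() - start))
        try:
            remote_hyp = remote_future.result(timeout=remaining)
        except cf.TimeoutError:
            return local_hyp  # network too slow: answer with the local result
        except Exception:
            return local_hyp  # remote failed (e.g. no connectivity)
        if remote_hyp.confidence >= local_hyp.confidence:
            return remote_hyp
        return local_hyp
    finally:
        # Don't block on a slow remote call; it finishes in the background.
        pool.shutdown(wait=False)

# Stand-in recognizers for demonstration only.
def fake_local(audio: bytes) -> Hypothesis:
    return Hypothesis("call mom", 0.7, "local")

def fast_remote(audio: bytes) -> Hypothesis:
    return Hypothesis("call Tom", 0.9, "remote")

def slow_remote(audio: bytes) -> Hypothesis:
    time.sleep(1.0)  # simulate a slow network round trip
    return Hypothesis("call Tom", 0.9, "remote")

print(recognize_hybrid(b"", fake_local, fast_remote).source)  # remote
print(recognize_hybrid(b"", fake_local, slow_remote).source)  # local
```

Bounding the wait on the network is what keeps such a design responsive: as long as the local recognizer itself is fast, the user never waits much longer than the deadline for an answer.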

The dual approach from SoundHound has enabled real-time responses to voice queries in vehicles, a game changer in an industry whose legacy voice systems are frustratingly slow.

This type of ingenuity is what can bring AI to the edge of the network. Historically, embedded systems have been able to recognize only a small vocabulary, and at lower speed and accuracy. SoundHound, however, harnessed NVIDIA GPUs to run large-vocabulary speech recognition and natural language understanding at high speed and accuracy.

“We use the NVIDIA DRIVE platform to create an embedded version of our system that can scale to more than a million words in natural language,” said SoundHound co-founder and CEO Keyvan Mohajer. “It’s very fast and scalable.”

In robotics, Mayfield Robotics is developing its Kuri robot for use with Houndify for voice interactions, allowing people to interact with and guide the robot.

For appliances, Bunn has shown a reference model using Houndify on its Sure Immersion Coffee Machine, which is brought to life with the prompt, “OK, barista.” Customers can use voice commands to operate the coffee-making part of the machine as well as to search for weather, sports and other information while waiting for coffee to brew.

SoundHound uses NVIDIA GPUs for deep learning, including training neural networks, and it operates its own data centers running GPUs. The company’s natural language processing runs on thousands of servers, and the company works with terabytes of data.

“Something that might take many months, now takes days, and that’s thanks to the GPUs,” Mohajer said. “The industry can’t move without GPUs.”