5 Shockingly Advanced AI Apps Showcased at CVPR

From a flying cameraman to a mobile app that helps the blind see, Inception startups showcase NVIDIA-powered technologies at a recent top AI conference.
by Kristin Bryson

A personal, flying cameraman. A computer that can read your mood. Facial recognition software powerful enough to catch criminals. Custom maps smart enough to help you find what you really want. An app that can help the blind see.

None of this is sci-fi. All these apps are real. And they’re just five of the technologies on display by the startups in our Inception Program at this month’s Conference on Computer Vision and Pattern Recognition, in Honolulu.

In not much more than a year, NVIDIA’s Inception Program has built a roster of over 1,300 startups working on such AI applications. Some 30 were at CVPR, nearly two dozen with exhibition booths, including these five:

Skydio and the “Awesome Power of Flight”

Anyone who’s “flown” a drone knows how tricky it is. It gets trickier still if the goal is to shoot video of someone in motion.

To solve this problem, Skydio is developing an autonomous drone that gets directions from a mobile application and then flies behind its subject, taking footage from the best angles. In this way, outdoor buffs can inject the videos they shoot of themselves with “the awesome power of flight,” said Abraham Bachrach, co-founder and CTO.

“With the technology that we’re bringing to the table, it allows the vehicle to understand the world,” Bachrach said. “Until the vehicle can really understand and perceive the world around it, you’re stuck being the pilot.”

Founded three years ago in Silicon Valley, Skydio employs 50 people and has received $28 million in venture funding, led by Andreessen Horowitz. The company, which is still developing its product and go-to-market strategy, plans to focus first on the consumer market. But Bachrach said Skydio will eventually move into selling its drones to inspect hard-to-access infrastructure.

WRNCH: All But a Wet Nose – Bringing Canine Qualities to Computers

Paul Kruszewski, CEO of Inception partner WRNCH, has a simple objective: He wants computers to act more like dogs.

His rationale? With human communication dominated by non-verbal cues, Kruszewski looked to dogs as his model. That’s because they’re so skilled at reading human body language.

“Ultimately, if we can give machines these eyes, and we can get them to understand our intentionality, we’re going to build this very interesting world,” he said.

Kruszewski and his team have been building their Body Slam product, which extracts 3D representations of people from 2D video by tracking 23 key features and articulation points on the body. To do this, the Montreal-based startup uses a range of NVIDIA technology, including GPUs, CUDA and cuDNN. He sees Body Slam being useful in a number of settings, from intelligent assistants for the elderly to in-vehicle monitoring and entertainment applications.

“Wherever people are using GPUs, that’s where our opportunities are,” he said.

Sensetime: Catching Bad Guys With Computer Vision

Sensetime’s facial recognition software has huge potential, sure. But it’s already got a track record that would put many ambitious crime fighters to shame.

Over the past six months, it helped Chinese law enforcement catch 40 criminals by matching faces from public surveillance cameras against criminal databases. And that’s with deployment in only two precincts.

The two-year-old company, which sells its software to police departments and public transit companies, has now signed up 40 precincts. And Junjie Yan, R&D director and principal scientist, said he expects that number to grow.

Sensetime sells a complete package of algorithms, hardware and software. It uses GPUs for training and inference of its deep learning models. Yan said the Inception team has helped Sensetime apply GPUs more effectively to its work, and has even jumped in to help with bug fixes.

This year marked Yan’s sixth time at CVPR, and he estimates that he’s submitted 15 research papers to the show over the years.

AIPoly: The Humanitarian Side of AI

San Francisco startup AIPoly wants to help the blind “see” via their smartphone cameras.

AIPoly has classified 2 billion images so far. The knowledge gleaned from this deep learning effort powers a mobile application that lets vision-impaired users point their smartphones at objects and then tells them what they’re “seeing.” Whether that means identifying a sandwich or matching the line number on the front of a bus with a route that will take the user home, the app seeks to serve as surrogate eyes.

Co-founder Alberto Rizzoli noted that 90 percent of the world’s blind people live in relative poverty. So most can’t afford guide dogs, which can cost more than $60,000 to acquire and train.

“AI can be used to democratize this,” said Rizzoli.

Mapillary: Crowdsourcing More Accurate Local Maps

Maps are great. But maps filled with exactly the things you want to find are even better.

The Swedish venture-backed startup Mapillary collects image data from a huge range of sources to build a mapping dataset that delivers a level of detail and specificity that can’t be found elsewhere.

So, if an NGO wants to create a map geared toward accessibility for disabled people, or if a biking group wants a map for bikers, they can get that. And they’re likely to play a role in building those maps.

“The ones who contribute a lot, we solve some sort of problem for them,” said Jan Erik Solem, CEO and co-founder.

Mapillary, which was founded in 2013 and employs 32 people, gets a few hundred thousand images a day from around the globe and has recognized over 10 billion objects from those images thus far. It relies on GPU-powered Amazon P2 instances for processing and an in-office cluster of Titan XPs for training and experimentation to create its datasets in the form of APIs.

It then sells those APIs to customers, such as mapping companies, automakers and municipalities, either as standalone datasets or as a subscription to all of its data. The data is free for individuals.