Freeing Our Fingers: Handing Over VR’s Toughest Challenge to GPUs

by Tony Kontzer

In the real world, our hands are our guides. We feel with them, we manipulate with them, we explore with them. We use them to eat, dress and primp ourselves, make a living, and connect with others.

And yet, in the virtual world, we’re lucky if we can use them at all.

A team of researchers at Purdue University hopes to change that with DeepHand, a deep learning-powered system for interpreting hand movements in virtual environments.

By combining depth-sensing cameras and a convolutional neural network trained on GPUs to interpret 2.5 million hand poses and configurations, the team has taken us a large step closer to being able to use our dexterity while interacting with 3D virtual objects.

Natural Interface

DeepHand fulfills the long-time vision of its lead researcher, Karthik Ramani, the Donald W. Feddersen Professor of Mechanical Engineering at Purdue.

“I’ve always wanted to design and develop our hands as a key part of a user interface element, because we do so much in the real world with our hands so naturally,” Ramani said. “The use of hand gestures offers smart and intuitive communication with 3D objects.”

Ramani said that the emergence of more affordable depth-sensing cameras has broadened the possibilities for hand-movement recognition, and has raised expectations for more natural use of hands in human-computer interfaces.

GPUs are helping the cause by speeding up the training of convolutional neural networks such as the one created for DeepHand. Ramani and his two graduate student researchers, Ayan Sinha and Chiho Choi, used NVIDIA GPUs to train their network, and Ramani said they were able to complete the process two to three times faster than if they’d used CPUs.

Working Out the Kinks

Despite the team’s clear progress, numerous challenges remain. Parts of the fingers and hands often block the camera’s view, sometimes making it impossible to interpret hand motions. The hand has numerous joints, and the range of motions they allow is almost limitless. What’s more, parts of the hand look so similar to one another that the system can sometimes struggle to identify which part it’s looking at.

“Figuring out the exact hand location and angles of all the joints through vision is not as easy as fitting a line through a bunch of points,” Ramani said. “It is a much harder problem.”

Fortunately for Ramani, the project has support in the form of National Science Foundation funding routed through his affiliated startup company, ZeroUI, which is focused on the development of the hands as a user interface. (The company has gained some attention for its Ziro modular construction kit for building hand-controlled robotic toys.)

Big Plans

Ramani’s team plans to eventually commercialize DeepHand through ZeroUI. But he says the team has more work to do in reducing the “noise” that interferes with interpreting hand motions before it starts developing augmented and virtual reality applications.

“The hand model has to be made robust and utilitarian for real-world AR and VR use,” he said. He and his team plan to keep pushing toward just that.

The team presented its research paper at the 2016 IEEE Conference on Computer Vision and Pattern Recognition held in Las Vegas earlier this summer.