How NVIDIA AI Labs Are Driving the Future of Computer Vision

by Anushree Saxena

How is sports strategy like self-driving cars and brain tumor diagnosis?

They’re all the work of world-leading universities that are breaking new ground in artificial intelligence at the NVIDIA AI Labs. And they’ll all be on deck in the next few days at IEEE’s Computer Vision and Pattern Recognition (CVPR) conference, the premier annual computer vision event.

Stanford University has a better way to plan sports strategy. University of Oxford researchers are teaming up with NEC Labs America and others to solve one of the thorniest problems for self-driving cars. And the National Taiwan University (NTU) is working on a better way to diagnose brain tumors.

Our NVAIL program helps AI pioneers like these stay ahead of the curve with support for students, assistance from our researchers and engineers, and access to the industry’s most advanced GPU computing power, the DGX-1 AI supercomputer. NVAIL includes 20 universities from around the world.

Read on for more information about the research.

Treating Brain Tumors

Winston Hsu, a professor at NTU, thinks there should be a better way to diagnose brain cancer.

When doctors diagnose the disease, they’re not just looking for malignant tissue. They need to know where liquids around the tumor could cause brain swelling. They need to find out if the cancer has killed any tissue. And they need to know the size, shape and location of everything they find. All this information helps them determine the best way to treat their patients, Hsu said.

The National Taiwan University team developed a more efficient way to “segment” MRI images to distinguish the tumor and tissue types around it. Image courtesy of the National Taiwan University.

For this complex problem, a simple MRI won’t cut it. To accurately detect each tissue type, Hsu said physicians must process MRIs four different ways and examine all of this data.

Hsu and his team used NVIDIA DGX-1 to train a deep neural network to analyze all four image types at once. The researchers also used the DGX-1 to deploy their deep learning model, a process known as inference.
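To give a rough sense of what analyzing all four image types at once can mean in practice, here is a minimal sketch that stacks several same-sized scans into one multi-channel image, so that a single model sees every image type for each pixel together. This is an illustration only, not the NTU team's code: the function name `stack_modalities` and the tiny 2x2 example scans are assumptions made for this sketch.

```python
def stack_modalities(scans):
    """Combine same-sized 2D scans into one image whose pixels are channel tuples.

    `scans` is a list of 2D lists (rows of intensity values), one per image type.
    Hypothetical helper for illustration; not taken from the researchers' work.
    """
    if not scans:
        raise ValueError("need at least one scan")
    height, width = len(scans[0]), len(scans[0][0])
    for scan in scans:
        if len(scan) != height or any(len(row) != width for row in scan):
            raise ValueError("all scans must share the same shape")
    # Each output pixel holds one value per input scan, in the order given.
    return [
        [tuple(scan[r][c] for scan in scans) for c in range(width)]
        for r in range(height)
    ]


# Four made-up 2x2 "scans" standing in for the four image types.
example_scans = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
    [[13, 14], [15, 16]],
]
stacked = stack_modalities(example_scans)
```

A network that takes this stacked input can label each pixel using all four readings together, rather than inspecting each image type separately.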

Hsu is not the first to apply deep learning to examining brain tissue images, but he is believed to be the first to combine all the image types into one algorithm. Hsu and the other researchers will present a paper on their research on July 23 at CVPR.

Increasing Safety in Autonomous Vehicles

The complexity and diversity of driving environments make autonomous driving difficult. At a busy intersection, a car must interpret stationary elements like traffic lights and lanes, and respond to moving objects like pedestrians, cyclists and other cars.

A research team led by NEC Labs America and the University of Oxford aims to make driving safer by training vehicles to predict what will happen in these complicated situations.

Using deep learning, the researchers developed a framework to predict how stationary and moving elements will interact. Unlike many existing solutions, this work goes beyond estimating how an object — like a car or pedestrian — will move from one point to another.

Instead, the framework assumes a moving object could go anywhere, and makes a series of hypotheses about what is most likely to happen. It does this by evaluating both the context of the scene — perhaps a busy traffic intersection or a pedestrian crosswalk — as well as interactions between neighboring objects.

For example, the car could anticipate several different trajectories for a cyclist, or hypothesize that a child playing alongside the road might throw their ball into the street or run out after it.

After establishing several hypotheses, the framework makes a strategic prediction about what will happen. These predictions have proven highly accurate when compared to real-world behavior.
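The propose-then-rank idea described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the researchers' framework: it replaces the deep learning model with straight-line candidate paths and a hand-written score, and every name here (`propose_trajectories`, `score`, `rank_hypotheses`, the 1.0 distance threshold, the penalty of 10.0) is hypothetical.

```python
import math


def propose_trajectories(start, speed, headings, steps):
    """Build one straight-line candidate path per heading (in radians)."""
    x0, y0 = start
    return [
        [
            (x0 + speed * t * math.cos(h), y0 + speed * t * math.sin(h))
            for t in range(1, steps + 1)
        ]
        for h in headings
    ]


def score(trajectory, neighbors, goal):
    """Higher is better: end near the goal and keep clear of neighbors."""
    goal_term = -math.dist(trajectory[-1], goal)
    if neighbors:
        closest = min(math.dist(p, n) for p in trajectory for n in neighbors)
    else:
        closest = float("inf")
    # Assumed hand-tuned penalty for passing within 1.0 unit of a neighbor.
    penalty = 10.0 if closest < 1.0 else 0.0
    return goal_term - penalty


def rank_hypotheses(trajectories, neighbors, goal):
    """Sort candidate paths from most to least plausible under `score`."""
    return sorted(
        trajectories, key=lambda tr: score(tr, neighbors, goal), reverse=True
    )


candidates = propose_trajectories((0.0, 0.0), 1.0, [0.0, math.pi / 2], 3)
# With another road user sitting on the direct path, the detour ranks first.
ranked_blocked = rank_hypotheses(candidates, [(2.0, 0.0)], (3.0, 0.0))
# With nothing in the way, the direct path ranks first.
ranked_clear = rank_hypotheses(candidates, [], (3.0, 0.0))
```

The point of the sketch is the structure: several futures are kept open, each is scored against both the scene and nearby objects, and only then is one chosen.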

Learning curve: The framework’s predictions (shown in red) get closer to the ground truth (shown in blue) through multiple iterations of deep learning. Image courtesy of DESIRE research team.

According to Namhoon Lee, a Ph.D. student from Oxford, “This framework offers a safer way to predict future interactions, because it predicts various future outcomes rather than limiting the possibilities for what might happen, and because the most likely predictions scored by the framework are more accurate than other systems.”

On the road, where almost anything can happen, this combination of flexibility and accuracy could make all the difference.

Lee will present a paper on this research on July 23 at CVPR.

Stanford AI technology identifies where players are, what they’re doing and interprets team behavior. Image courtesy of Stanford University and École Polytechnique Fédérale de Lausanne.

How AI Interprets What Sports Teams Do

Sports teams are always looking for a competitive edge, so it’s no wonder some are turning to AI to improve player performance and craft strategy.

Winning takes more than boosting individual players. Whether it’s on the field or on the court, teamwork is what makes the game. Stanford University Professor Silvio Savarese and his team are tackling this problem by using deep learning to analyze game tapes.

“When more than one person is in a scene, they’re not acting alone — they interact,” said Savarese.

The team’s research focused on volleyball, but it could apply to other sports, as well as to robotics and self-driving cars, according to Alexandre Alahi, a research scientist at Stanford. By understanding group dynamics, robots might be able to behave more like humans. The technology could also be used in self-driving cars to understand what pedestrians are doing — say, crossing the street while distracted by a mobile phone, Alahi said.

Existing efforts to understand social interactions detect which scenes include a specific person, track that person over time and determine what the person is doing, said Timur Bagautdinov, a doctoral student at École Polytechnique Fédérale de Lausanne. That has to be repeated for every player. Finally, researchers stitch it all together to try to make sense of what they have.

The team developed a framework that does everything other approaches do, but in just a single pass through a neural network. For more technical details, see the paper on social scene understanding they’ll discuss at CVPR on July 23.

Feature image: MRI brain image segmentation. Credit: National Taiwan University.