Microsoft Uses GPUs to Build Record-Breaking Image Recognition System

by Jay White

Microsoft researchers became the latest to achieve record results on ImageNet, a prestigious image recognition benchmark, thanks to the use of GPUs.

Compared with last year’s result, Microsoft’s system cuts the top-5 error rate in half, correctly classifying images across 1,000 predefined categories more than 96 percent of the time. The system is a 152-layer neural network, nearly five times deeper than previous state-of-the-art networks.
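
For readers unfamiliar with the metric, top-5 error counts a prediction as correct when the true label appears among a model's five highest-scoring guesses. The short Python sketch below illustrates the calculation on made-up scores. The function name `top_k_error` and the toy data are assumptions for illustration only; they are not taken from Microsoft's system or from the ImageNet evaluation code.

```python
def top_k_error(scores, labels, k=5):
    """Return the fraction of examples whose true label is NOT among
    the k highest-scoring classes (hypothetical helper for illustration)."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy data: 4 images, 7 classes (ImageNet uses 1,000 classes).
descending = [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02]
ascending = [0.02, 0.05, 0.08, 0.10, 0.20, 0.25, 0.30]
scores = [descending, descending, ascending, ascending]
labels = [4, 6, 2, 5]  # made-up ground-truth class indices

print(top_k_error(scores, labels))       # top-5 error on the toy data
print(top_k_error(scores, labels, k=1))  # top-1 error on the same data
```

Under this definition, a system that is right "more than 96 percent of the time" has a top-5 error rate below 4 percent.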

The breakthrough comes amidst an artificial intelligence renaissance sparked by the use of GPUs to create powerful neural networks. Until recently, asking a computer to tackle simple image recognition tasks — such as recognizing a bird in a picture — caused the most advanced systems to stumble.

Going Deep

No longer. New neural network algorithms, access to vast troves of data and powerful GPUs have converged. The result is a revolution called “deep learning.” Researchers are now building systems that recognize photos and even video more accurately than humans can.

With GPUs, deep learning training runs much faster and on fewer servers. This helps users build and optimize new models quickly and, ultimately, create new, highly accurate deep learning applications.

Record Results

Researchers from corporations, government and academia are now racing to create systems with ever better performance on a number of widely followed benchmarks.

The latest breakthrough comes from Microsoft. Researchers at its Beijing-based research center created a record-breaking 152-layer neural network, achieving top scores on two key ImageNet benchmarks: localization and detection.

On another key benchmark, the Microsoft Common Objects in Context challenge, known as MS COCO, the Microsoft team grabbed the top spot for image detection and segmentation. (Started by Microsoft, MS COCO is now overseen by an independent group of academics.)

Microsoft Research is also experimenting with improving its ImageNet deep learning results by using its recently open-sourced CNTK deep learning framework. CNTK with Azure GPU Lab integration has accelerated Microsoft’s internal speech recognition task by 10X over previous systems.

Better Than Human

Image recognition is one of the highest-profile applications for GPU-powered deep learning. For years researchers had been chasing the Holy Grail of topping human ability to accurately identify images.

They got there earlier this year when Microsoft Research announced image-recognition systems that surpassed human accuracy.

Image recognition, however, is just one of a number of machine learning applications. GPUs are also essential to speech recognition, which Microsoft used as the basis for real-time translation in Skype Translator.