Image recognition systems are growing increasingly sophisticated – but they don’t come close to matching the efficiency of the ones we carry around with us inside our skulls.
As part of an effort to close that gap, our Jetson TX1 embedded computing module swept both tracks of the recent Low Power Image Recognition Challenge, held in Austin, Texas, at the IEEE Rebooting Computing event.
We’ve invested substantial resources in the power efficiency of Jetson’s GPU compute architecture. In gaming and professional design, this means fluid framerates on a frugal power budget. But in the realm of computer vision, performance per watt enables rapid control loops and near real-time responsiveness from an autonomous machine, such as a drone or a robot.
The Low Power Image Recognition Challenge began when NVIDIA's David Kirk, Yung-Hsiang Lu of Purdue University and Alex Berg of the University of North Carolina at Chapel Hill decided that image recognition on a power budget was a worthy challenge. The first two years presented modest challenges and drew smaller groups of researchers, Yung-Hsiang notes. He plans to expand the competition over the coming years, including larger prizes.
Power efficiency is essential to making sophisticated computer vision applications possible, from smart drones to head-mounted displays to object recognition on cell phones. People can identify objects (and do so much more) with a brain that consumes about 20 watts. By contrast, the best classifiers in the world run on supercomputers, in data centers and on workstations, which draw thousands of watts.
Maximizing Accuracy, Minimizing Power, with Jetson
On the day of the competition, contestants brought their hardware and logged into a server using a reference Python script. The server then delivered up to 20,000 images for each system to recognize within 10 minutes. The contest's organizers connected each team's hardware to a power meter.
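A team's on-device loop under those rules might look something like this minimal sketch. The `fetch_image`, `classify` and `submit` hooks are hypothetical stand-ins, not the contest's actual API:

```python
import time

TIME_BUDGET_S = 10 * 60   # the contest's 10-minute window
MAX_IMAGES = 20_000       # up to 20,000 images served per run

def run_session(fetch_image, classify, submit):
    """Hypothetical contest loop: classify as many served images
    as possible before the time budget expires."""
    deadline = time.monotonic() + TIME_BUDGET_S
    processed = 0
    while processed < MAX_IMAGES and time.monotonic() < deadline:
        image = fetch_image()
        if image is None:          # server has no more images
            break
        submit(classify(image))
        processed += 1
    return processed
```

The bounded loop matters: every extra second of runtime adds to the energy bill the power meter is recording, so finishing early is as valuable as finishing accurately.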
The goal was to classify images with the greatest accuracy while using the least power. The server calculated scores by dividing the classifier's accuracy by the average power the device consumed.
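The metric rewards frugality as much as accuracy, as a quick sketch of the scoring rule shows:

```python
def contest_score(accuracy, avg_power_watts):
    """Score = accuracy divided by average power draw,
    so a system wins by being accurate *and* frugal."""
    return accuracy / avg_power_watts

# A 60%-accurate classifier drawing 10 W outscores an
# 80%-accurate one drawing 20 W under this metric:
print(contest_score(0.60, 10.0) > contest_score(0.80, 20.0))  # True
```

This is why an embedded module like the Jetson TX1 can beat far more accurate classifiers running on workstation-class hardware: doubling accuracy doesn't help if it quadruples power draw.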
This year’s winning team used a Jetson TX1 running cuDNN 4.0, the latest release. The team implemented BING + Fast R-CNN for Track 1 and Faster R-CNN within Caffe for Track 3.
“TX1 has all that we want from a mobile device: throughput, low power and flexibility to choose precision mode,” says Wang Ying, the principal investigator and advisor for the winning team. “There are a lot of CNN-based recognition frameworks emerging: fast-rcnn, yolo, ssd, etc. They provide enough options for us to find the most suitable one for both the challenge and the TX1 hardware.”
Winning Strategy: Keeping Jetson’s CPU and GPU Busy
Wang, a professor at the Chinese Academy of Sciences, says the key to success is balancing the workload between the CPU and the GPU, keeping both fully occupied at all times. Starting with NVIDIA Tesla K40 GPU accelerators, the team did a “design space exploration” to determine the best models to use on both desktop GPUs and the Jetson TX1 embedded system.
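One way to picture that CPU/GPU balance is a bounded producer-consumer pipeline: CPU threads preprocess upcoming images while inference drains the queue. This is a generic sketch of the pattern, not the team's code, with plain Python threads standing in for the real CPU cores and GPU:

```python
import queue
import threading

def pipeline(images, preprocess, infer):
    """Overlap CPU-side preprocessing with inference so neither
    side sits idle waiting for the other."""
    q = queue.Queue(maxsize=8)     # bounded: CPU stays just ahead

    def producer():
        for img in images:
            q.put(preprocess(img)) # CPU work happens here
        q.put(None)                # sentinel: no more work

    threading.Thread(target=producer, daemon=True).start()

    results = []
    while (item := q.get()) is not None:
        results.append(infer(item))  # GPU work would happen here
    return results
```

The bounded queue is the key design choice: it lets preprocessing run ahead of inference without unbounded memory growth, keeping both sides busy at once.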
Through many iterations, they discovered that model pruning and singular value decomposition reduced the size of their CNN models. The team also tried using cuFFT and cuSPARSE to optimize their pipeline, but found that these approaches didn’t improve speed.
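Singular value decomposition shrinks a layer by replacing one large weight matrix with two low-rank factors. A minimal NumPy sketch of the idea; the layer shape and rank here are illustrative, not the team's actual settings:

```python
import numpy as np

# Illustrative fully connected layer weight matrix (1024 x 512)
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 512))

# Truncated SVD: keep only the top-k singular values
k = 64
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W1 = U[:, :k] * s[:k]   # 1024 x k factor
W2 = Vt[:k, :]          # k x 512 factor

# One dense layer becomes two smaller ones:
# x @ W is approximated by (x @ W1) @ W2 when W is near rank k
print(W.size)            # 524288 parameters before
print(W1.size + W2.size) # 98304 parameters after
```

The compressed layer stores roughly a fifth of the original parameters, which cuts both memory traffic and multiply-accumulate work, exactly the costs that dominate power draw on an embedded module.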
Smart. But researchers will have to exercise their minds a little more if we’re going to create image recognition systems that match the brain’s own efficiency, making this a contest to watch for years to come.