Faster AI, Lower Costs: We’re Showing How to Put AI to Work 100x Faster

by Shashank Prasanna

Speeding up inference — which helps trained neural networks make decisions and predictions faster — is a hot topic in AI research.

No surprise, since quick, solid judgment calls can pay off handsomely. Humans who are good at this often rise to the top of their professions.

Consider a financial trader who makes winning trades with good “gut” instincts, or a tennis player able to read an opponent’s every move. They’re not just knowledgeable, they act fast.

Now, imagine giving these experts superhuman speed.

At this week’s Computer Vision and Pattern Recognition conference, NVIDIA is demonstrating how an NVIDIA DGX Station running NVIDIA TensorRT can perform a common inferencing task 100X faster than a system without GPUs. The demo uses only one of the four Tesla V100 GPUs that DGX Station is equipped with.

In this video, the CPU-only Intel Skylake-based system (on the left) can classify five flower images per second with a trained ResNet-152 classification network. That’s a speed that comfortably outpaces human capability.

By contrast, a single V100 GPU (on the right) can classify a dizzying 527 flower images per second, returning results with less than 7 milliseconds of latency — a superhuman feat.

While a 100X speedup is impressive, that’s only half the equation. What are the costs associated with moving as fast as possible, which we here at NVIDIA call “speed of light”?

Remarkably, moving faster means lower costs. One NVIDIA GPU-enabled system doing the same work as 100 CPU-only systems means 100 times fewer cloud servers to rent or buy.
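To make the arithmetic behind that claim concrete, here is a minimal Python sketch. It uses only the two throughput figures quoted above (5 and 527 images per second). The target workload of 10,000 images per second and the function names are hypothetical, chosen purely for illustration, and the sketch assumes throughput scales linearly with the number of servers.

```python
import math

# Throughput figures quoted in the article (images per second).
CPU_IMAGES_PER_SEC = 5
GPU_IMAGES_PER_SEC = 527


def speedup(fast_rate: float, slow_rate: float) -> float:
    """Return how many times faster fast_rate is than slow_rate."""
    return fast_rate / slow_rate


def servers_needed(target_rate: float, per_server_rate: float) -> int:
    """Return the number of servers needed to sustain target_rate.

    Assumes throughput scales linearly with server count, which is a
    simplification; real deployments also pay networking and
    load-balancing overhead.
    """
    return math.ceil(target_rate / per_server_rate)


if __name__ == "__main__":
    ratio = speedup(GPU_IMAGES_PER_SEC, CPU_IMAGES_PER_SEC)
    print(f"Speedup: {ratio:.1f}x")

    # Hypothetical workload, not a figure from the demo.
    target = 10_000
    cpu_servers = servers_needed(target, CPU_IMAGES_PER_SEC)
    gpu_servers = servers_needed(target, GPU_IMAGES_PER_SEC)
    print(f"CPU-only servers needed: {cpu_servers}")
    print(f"Single-V100 servers needed: {gpu_servers}")
```

Under these assumptions, the ratio of the two server counts tracks the roughly 100X throughput gap, which is where the savings in servers to rent or buy come from.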

The Hidden Cost: The Cost of Latency

Latency is another important cost to factor in when considering CPUs versus GPUs for inference, according to Paul Kruszeski, the CEO and founder of Wrnch, a Mark Cuban-backed startup in NVIDIA’s Inception program.

Wrnch uses NVIDIA GPUs and our NVIDIA TensorRT inference optimizer and runtime as a foundation for its newly launched BodySLAM AI engine, which reads body language in real time for fun applications such as interactive toys for kids.

“If I used CPUs only for my application, kids would have to wait a minute and a half for three seconds of fun,” said Kruszeski. “A minute and a half is infinity for a kid.”

To learn more about NVIDIA DGX Station with Tesla V100 GPU accelerators, go to

NVIDIA TensorRT is available to members of the NVIDIA Developer Program as a free download to speed up AI inference on NVIDIA GPUs in the data center, in automobiles and in robots, drones and other devices at the edge. To learn more, go to

Want to learn more about AI, machine learning and deep learning? Check out our cheat sheet to the top courses in AI