How AI Brings a Critical Eye to Video, and Keeps Companies Out of Hot Water

NVIDIA AI inference platform helps JD achieve 40x increase in video detection efficiency.
by Dave Salvator

Streaming video traffic is growing exponentially and shows no signs of slowing down. Internet traffic from streaming video in 2021 will quadruple today’s traffic, according to estimates by the Cisco Visual Networking Index.

That’s great for people using sites like Facebook, Snapchat or YouTube, the last of which now has over 1 billion users, or nearly one-third of all internet users. But it’s a daunting task for enterprises that need to screen out inappropriate or illegal content uploaded to their platforms.

JD is the second-largest business-to-consumer e-commerce company in China. Using NVIDIA Tesla GPUs and a deep learning platform for smart video analytics built on our DeepStream software development kit (SDK), JD achieved a 40x increase in its video detection efficiency.

Keeping Up When Your Business Goes POP

JD, which has over 265 million active users, is on the Fortune Global 500 with a trading volume that reached nearly RMB1 trillion ($154 billion) in 2016. The rapid growth of its business is fueled in part by POP, its open e-commerce platform that allows independent stores to upload photos and videos of products.

Stores drop terabytes of videos, pictures and text onto POP each day. Pictures alone account for 100 million uploaded items a day. It’s JD’s responsibility to ensure that the uploaded pictures and videos don’t contain inappropriate content.

Previously, JD needed one CPU to process each video that was uploaded onto POP. To process 1,000 videos simultaneously, it had to deploy 1,000 CPUs in the cloud, which was a huge investment and an extremely complex process.

To meet its video processing needs, JD adopted NVIDIA’s AI inference platform powered by Tesla P40 GPUs, the DeepStream SDK and TensorRT, a programmable inference accelerator. This platform allowed the company to identify and filter 1,000 full-HD videos in real time, which translated into a 20x increase in throughput when executing inference-based video content filtering.

The NVIDIA Tesla P40, developed for high-throughput deep learning inference, provided a 40x performance increase for JD compared with CPUs. A single server with four Tesla P40 GPUs can replace servers with over 50 CPUs.

Simultaneous Decode and Analysis

The DeepStream SDK simplifies the development of scalable, intelligent video analytics applications for smart cities and hyperscale data centers. It brings together TensorRT for inference, Video Codec SDK for transcoding and all the required preprocessing and data curation into a single optimized API.

DeepStream allowed JD to decode and analyze video streams simultaneously during detection, and to accelerate inference with TensorRT while reducing energy consumption. It also allowed JD to understand the content of large batches of videos, which improves video analysis efficiency.
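To make the idea of simultaneous decode and analysis concrete, here is a minimal sketch of a two-stage pipeline in plain Python. It does not use DeepStream or TensorRT, and it is not JD's implementation. The names `decode`, `analyze` and `run_pipeline` are hypothetical, and the "decoding" and "analysis" steps are trivial stand-ins. It only illustrates the general pattern of one stage consuming frames while another is still producing them.

```python
import queue
import threading

END_OF_STREAM = object()  # unique marker telling the analyzer to stop


def decode(stream, frames):
    """Stand-in decoder: hands over each 'frame' as soon as it is ready."""
    for packet in stream:
        frames.put(packet.upper())  # placeholder for real video decoding
    frames.put(END_OF_STREAM)


def analyze(frames, results):
    """Stand-in analyzer: consumes frames while decoding is still running."""
    while True:
        frame = frames.get()
        if frame is END_OF_STREAM:
            break
        # Placeholder for real inference on the decoded frame.
        label = "flagged" if "BAD" in frame else "ok"
        results.append((frame, label))


def run_pipeline(stream, max_buffered=4):
    """Run both stages concurrently, linked by a small bounded queue."""
    frames = queue.Queue(maxsize=max_buffered)
    results = []
    decoder = threading.Thread(target=decode, args=(stream, frames))
    analyzer = threading.Thread(target=analyze, args=(frames, results))
    decoder.start()
    analyzer.start()
    decoder.join()
    analyzer.join()
    return results


if __name__ == "__main__":
    for frame, label in run_pipeline(["frame-1", "bad-frame", "frame-3"]):
        print(frame, label)
```

The bounded queue keeps the decoder from running far ahead of the analyzer, so neither stage sits idle waiting for a whole video to finish the other stage first.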

Now, every server based on NVIDIA’s AI platform can process 20 videos at the same time, enabling JD to achieve real-time inference for 1,000 HD video streams simultaneously. And by reducing the number of servers needed by 83 percent, the platform has saved the company a considerable amount of equipment and manpower.
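The capacity figures quoted above imply a simple calculation: at 20 videos per server, 1,000 simultaneous streams call for 50 servers. The short sketch below works that out. The function name `servers_needed` is hypothetical and introduced only for illustration. It assumes every server handles the same number of streams, which the article does not state explicitly.

```python
def servers_needed(total_streams: int, streams_per_server: int) -> int:
    """Smallest number of servers able to carry total_streams at once."""
    if total_streams < 0 or streams_per_server <= 0:
        raise ValueError("stream counts must be non-negative and capacity positive")
    # Integer ceiling division: a partly filled server still counts as one.
    return (total_streams + streams_per_server - 1) // streams_per_server


if __name__ == "__main__":
    # Figures from the article: 1,000 HD streams, 20 videos per server.
    print(servers_needed(1000, 20))  # 50
```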

“NVIDIA GPUs and its DeepStream SDK for smart video analytics are huge steps forward for video stream processing,” said Chen Yu, senior director of the AI and Big Data Recognition/Identification R&D Department at JD. “We’re grateful to NVIDIA for continuously improving its GPU performance. Plus, we can now inference video streams on any framework.”

JD’s improved detection has also shortened the time it takes stores to upload product data to POP and gain approval for sales, which has created a better service experience.