Like Magic: NVIDIA Merlin Gains Adoption for Training and Inference

The internet is becoming more personalized, powered by NVIDIA AI running on GPUs — boosting Snap, Postmates and Tencent.
by Even Oldridge

Recommenders personalize the internet. They suggest videos, foods, sneakers and advertisements that seem magically clairvoyant in knowing your tastes and interests.

It’s an AI that makes online experiences more enjoyable and efficient, quickly taking you to the things you want to see. While delivering content you like, it also targets tempting ads for jeans, or recommends comfort dishes that fit those midnight cravings.

But not all recommender systems can handle the massive datasets needed to make smarter suggestions. The result is slower training and a less intuitive experience for internet users.

NVIDIA Merlin is turbocharging recommenders, boosting training and inference. Leaders in media, entertainment and on-demand delivery use the open source recommender framework for running accelerated deep learning on GPUs. Improving recommendations increases clicks, purchases — and satisfaction.

Merlin-Accelerated Recommenders 

NVIDIA Merlin enables businesses of all types to build recommenders accelerated by NVIDIA GPUs.

Its collection of libraries includes tools for building deep learning-based systems that provide better predictions than traditional methods and increase clicks. Each stage of the pipeline is optimized to support hundreds of terabytes of data, all accessible through easy-to-use APIs.

Merlin is in testing with hundreds of companies worldwide. Social media and video services are evaluating it for suggestions on next views and ads. And major on-demand apps and retailers are looking at it for suggestions on new items to purchase.

Videos with Snap

With Merlin, Snap is improving the customer experience with better load times, ranking content and ads 60 percent faster while reducing its infrastructure costs. Using GPUs and Merlin gives Snap additional compute capacity to explore more complex and accurate ranking models. These improvements allow Snap to deliver even more engaging experiences at a lower cost.

Tencent: Ads that Click

Tencent, China’s leading online video media platform, uses Merlin HugeCTR to help connect more than 500 million monthly active users with ads that are relevant and engaging. At that scale, training speed matters: it determines how quickly the recommender model can be refreshed and how well it performs. By deploying its real-time training with Merlin, Tencent achieved more than a 7x speedup over its original TensorFlow solution on the same GPU platform. Tencent dives into this further in its GTC presentation.

Postmates Food Picks

Merlin was designed to streamline and support recommender workflows. Postmates uses recommenders to help people decide what’s for dinner, and it uses Merlin NVTabular to cut training time from one hour on CPUs to just five minutes on GPUs.

Using NVTabular for feature engineering, the company reduced training costs by 95 percent and is exploring more advanced deep learning models. Postmates delves more into this in its GTC presentation.

Merlin Streamlines Recommender Workflows at Scale

Because Merlin is interoperable, it offers the flexibility to accelerate recommender pipelines end to end.

The open beta release of the Merlin recommendation engine delivers leaps in data loading and training of deep learning systems.

NVTabular reduces data preparation time by GPU-accelerating feature transformations and preprocessing, making it easier to load massive data lakes into training pipelines. The latest release adds multi-GPU support and improved interoperability with TensorFlow and PyTorch.
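As a rough conceptual illustration, the heart of a categorical preprocessing transform like the one NVTabular GPU-accelerates is mapping high-cardinality categorical values to contiguous integer IDs that can index embedding tables. The sketch below is plain Python, not the NVTabular API, and the function name and frequency-threshold convention are illustrative assumptions:

```python
from collections import Counter

def categorify(values, freq_threshold=0):
    """Map categorical values to contiguous integer IDs.

    Illustrative sketch only (not NVTabular). IDs start at 1; 0 is
    reserved for unseen or rare out-of-vocabulary values, a common
    convention in recommender preprocessing.
    """
    counts = Counter(values)
    # Build a vocabulary from values that clear the frequency threshold.
    vocab = {v: i + 1 for i, v in enumerate(
        sorted(v for v, c in counts.items() if c > freq_threshold))}
    return [vocab.get(v, 0) for v in values], vocab

# Example: encode a toy "item_id" column.
ids, vocab = categorify(["shoe", "hat", "shoe", "bag", "hat", "shoe"])
```

In a real pipeline this step runs on GPUs over terabytes of tabular data, which is where the acceleration matters.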

Merlin’s Magic for Training

Merlin HugeCTR is the main training component: a deep neural network training framework purpose-built for recommender workflows. It comes with its own optimized data loader that vastly outperforms generic deep learning frameworks, provides a Parquet data reader to ingest NVTabular-preprocessed data, and supports distributed training across multiple GPUs and nodes for maximum performance.
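The pattern HugeCTR accelerates can be sketched conceptually: sparse categorical IDs are looked up in embedding tables, the resulting dense vectors are concatenated, and a dense network produces a click probability. The plain-Python sketch below is not HugeCTR's API; the table sizes, dimensions and single dense layer are toy assumptions to show the shape of the computation:

```python
import math
import random

random.seed(0)

EMB_DIM = 4  # toy embedding width

def make_table(cardinality):
    # One embedding vector per categorical ID.
    return [[random.uniform(-0.1, 0.1) for _ in range(EMB_DIM)]
            for _ in range(cardinality)]

user_table = make_table(100)
item_table = make_table(500)

def forward(user_id, item_id, weights, bias):
    # Embedding lookup: sparse ID -> dense vector. This memory-bound step
    # is what frameworks like HugeCTR distribute across GPUs when the
    # embedding tables are too large for a single device.
    x = user_table[user_id] + item_table[item_id]  # concatenation
    # One dense layer + sigmoid, standing in for the MLP "click" head.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

weights = [random.uniform(-0.5, 0.5) for _ in range(2 * EMB_DIM)]
score = forward(user_id=7, item_id=42, weights=weights, bias=0.0)
```

At production scale the embedding tables hold hundreds of gigabytes, which is why an optimized data loader and multi-GPU, multi-node training matter.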

NVIDIA Triton Inference Server accelerates production inference on GPUs for feature transforms and neural network execution.

Learn more about the technology advances behind Merlin since its initial launch, including its support for NVTabular, HugeCTR and NVIDIA Triton Inference Server.