How Amazon and NVIDIA Help Sellers Create Better Product Listings With AI

Amazon doubles inference speeds for new AI capabilities using NVIDIA TensorRT-LLM and GPUs to help sellers optimize product listings faster.
by Fred Oh

It’s hard to imagine an industry more competitive — or fast-paced — than online retail.

Sellers need to create product listings that are attractive and informative, engaging enough to capture attention and build trust.

Amazon uses optimized containers on Amazon Elastic Compute Cloud (Amazon EC2) with NVIDIA Tensor Core GPUs to power a generative AI tool that finds this balance at the speed of modern retail.

Amazon’s new generative AI capabilities help sellers seamlessly create compelling titles, bullet points, descriptions, and product attributes.

To get started, Amazon identifies listings where content could be improved and uses generative AI to produce high-quality content automatically. Sellers review the generated content and can either provide feedback or accept the changes to the Amazon catalog.
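The generate-review-accept loop described above can be sketched as a few lines of code. This is a purely illustrative sketch: the function names, fields, and "accept" decision value are hypothetical and are not Amazon's actual APIs.

```python
# Hypothetical sketch of the listing-improvement workflow described above.
# None of these names are Amazon APIs; they only illustrate the flow.

def improve_listing(listing, generate_content, seller_review):
    """Generate draft content for a listing and apply the seller's decision."""
    draft = generate_content(listing)   # AI-generated title, bullets, etc.
    decision = seller_review(draft)     # seller accepts or provides feedback
    if decision == "accept":
        listing.update(draft)           # apply the changes to the catalog entry
    return listing                      # feedback path keeps the original content

# Toy usage: a stub generator and a seller who accepts the draft.
listing = {"title": "mouse"}
draft_fn = lambda l: {"title": "Ergonomic Wireless Mouse",
                      "bullets": ["Long battery life"]}
improved = improve_listing(dict(listing), draft_fn, lambda d: "accept")
```

The seller stays in the loop: the generated draft only reaches the catalog after an explicit accept decision.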

Previously, creating detailed product listings required significant time and effort for sellers, but this simplified process gives them more time to focus on other tasks.

The NVIDIA TensorRT-LLM software is available today on GitHub and can be accessed through NVIDIA AI Enterprise, which offers enterprise-grade security, support, and reliability for production AI.

TensorRT-LLM open-source software makes AI inference faster and smarter. It works with large language models, such as Amazon’s models for the above capabilities, which are trained on vast amounts of text.

On NVIDIA H100 Tensor Core GPUs, TensorRT-LLM enables up to an 8x speedup on foundation LLMs such as Llama 1 and 2, Falcon, Mistral, MPT, ChatGLM, Starcoder and more.

It also supports multi-GPU and multi-node inference, in-flight batching, paged attention, and the Hopper Transformer Engine with FP8 precision, all of which improve latency and efficiency for the seller experience.
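Of these features, in-flight batching is easy to illustrate with a toy scheduler: whenever a sequence in the batch finishes, its slot is refilled from the queue immediately, so the batch never idles waiting for its slowest member. The simulation below is a conceptual sketch, not TensorRT-LLM's actual scheduler.

```python
from collections import deque

def run_inflight(requests, batch_size):
    """Toy simulation of in-flight (continuous) batching.

    `requests` maps request id -> number of decode steps it needs.
    Returns the total number of decode steps executed."""
    queue = deque(requests.items())
    active = {}                     # request id -> remaining steps
    steps = 0
    while queue or active:
        # Refill free slots immediately: the key idea of in-flight batching.
        while queue and len(active) < batch_size:
            rid, need = queue.popleft()
            active[rid] = need
        steps += 1                  # one decode step for every active sequence
        active = {rid: n - 1 for rid, n in active.items() if n - 1 > 0}
    return steps

# Three short requests share a 2-wide batch with one long request.
# Static batching of (a, b) then (c, d) would take max(2, 8) + max(2, 2) = 10
# steps; in-flight batching finishes in 8 because c and d slot in as a frees up.
print(run_inflight({"a": 2, "b": 8, "c": 2, "d": 2}, batch_size=2))
```

The same principle applies per token in a real LLM server, where requests have unpredictable output lengths and static batching would leave GPU slots idle.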

By using TensorRT-LLM and NVIDIA GPUs, Amazon improved its generative AI tool’s inference efficiency by 2x, halving the cost and number of GPUs needed, and reduced inference latency by 3x compared with an earlier implementation without TensorRT-LLM.
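As a back-of-envelope illustration of what those multipliers mean in practice, consider the arithmetic below. The baseline figures are invented purely to show the calculation; only the 2x and 3x factors come from the reported results.

```python
# Hypothetical baseline numbers, chosen only to illustrate the arithmetic.
baseline_gpus = 8             # GPUs needed before TensorRT-LLM (made up)
baseline_latency_ms = 900     # per-request latency before (made up)

optimized_gpus = baseline_gpus / 2               # 2x efficiency: half the GPUs
optimized_latency_ms = baseline_latency_ms / 3   # 3x lower latency

print(optimized_gpus, optimized_latency_ms)      # 4 GPUs, 300 ms
```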

The efficiency gains make it more environmentally friendly, and the 3x latency improvement makes Amazon Catalog’s generative capabilities more responsive.

The generative AI capabilities can save sellers time and provide richer information with less effort. For example, they can enrich a listing for a wireless mouse with details about its ergonomic design, long battery life, adjustable cursor settings, and compatibility with various devices. They can also generate product attributes such as color, size, weight, and material. These details help customers make informed decisions and can reduce returns.
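One way systems like this keep generated attributes usable is to parse the model's output into a structured form and check it against a schema before it reaches the catalog. The sketch below is an assumption about how such a step might look; the schema, field names, and parsing are illustrative only, not Amazon's implementation.

```python
# Hypothetical sketch of validating model-generated product attributes.
# The schema and function names are illustrative, not a real catalog API.
import json

REQUIRED_ATTRIBUTES = {"color", "size", "weight", "material"}

def parse_attributes(model_output: str) -> dict:
    """Parse a JSON attribute blob from the model and verify that the
    attributes customers rely on are all present."""
    attrs = json.loads(model_output)
    missing = REQUIRED_ATTRIBUTES - attrs.keys()
    if missing:
        raise ValueError(f"model output missing attributes: {sorted(missing)}")
    return attrs

# Toy usage with a well-formed model response.
output = ('{"color": "black", "size": "standard", '
          '"weight": "90 g", "material": "ABS plastic"}')
attrs = parse_attributes(output)
```

Validating structured output like this is what lets generated attributes flow into downstream features, such as search filters, without manual cleanup.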

With generative AI, Amazon’s sellers can quickly and easily create more engaging listings while using less energy, making it possible to reach more customers and grow their business faster.

Developers can start with TensorRT-LLM today, with enterprise support available through NVIDIA AI Enterprise.

Explore generative AI sessions and experiences at NVIDIA GTC, the global conference on AI and accelerated computing, running March 18-21 in San Jose, Calif., and online.