NVIDIA Advances Accelerated Computing, Generative AI at AWS re:Invent

Learn more about the latest NVIDIA technologies at the AWS-hosted conference, running Nov. 27-Dec. 1 in Las Vegas.
by Rohil Bhargava

The rapid development of generative AI and large language models (LLMs) brings growing computational demands, along with a need for software expertise to develop, customize and deploy AI-powered applications.

Building on 13 years of collaboration, NVIDIA and Amazon Web Services (AWS) are working to help organizations meet these demands by providing cloud developers with full-stack accelerated computing solutions for challenging workloads in generative AI, visual computing, high performance computing and more.

Learn more at AWS re:Invent 2023, taking place Nov. 27-Dec. 1 in Las Vegas, and join NVIDIA on the show floor at the AWS Generative AI Pavilion.

NVIDIA experts at the following sessions will share how organizations can accelerate their AI capabilities using NVIDIA accelerated computing on AWS:

  • Simplifying the Adoption of Generative AI for Enterprises: Peter Dykas and Jiahong Liu, senior solutions architects at NVIDIA, will discuss NVIDIA’s full-stack platform for generative AI, which includes accelerated infrastructure, AI frameworks, and enterprise services and tools that simplify building custom LLMs and bring generative AI solutions to market faster. Join on Tuesday, Nov. 28, at 2 p.m. PT.
  • How to Accelerate Apache Spark Pipelines on Amazon EMR with RAPIDS: Sameer Raheja, senior director of software engineering at NVIDIA, will share insights at this lightning theater talk on Wednesday, Nov. 29, at 4 p.m. PT. Learn how the NVIDIA RAPIDS Accelerator for Apache Spark on the Amazon EMR platform boosts Spark pipelines for data processing by up to 30% compared to industry-standard benchmarks.

NVIDIA is sponsoring the AWS Generative AI Pavilion (booth 372), which will showcase several startups using NVIDIA AI on AWS, including:

  • Perplexity, which offers pplx-api, an efficient application programming interface for open-source LLMs optimized for fast inference. With the help of NVIDIA TensorRT-LLM served on Amazon EC2 P4d instances, Perplexity achieved up to 2x lower latency compared with other solutions.
  • Baseten, which provides all the infrastructure needed to deploy and serve models performantly, scalably and cost-efficiently. Using NVIDIA TensorRT-LLM’s tensor parallelism served on AWS, Baseten boosted inference performance for a customer’s LLM deployment by 2x through its open-source Truss framework.
  • LILT, an interactive and adaptive contextual AI platform that automates content generation and translation for the enterprise. Using NVIDIA GPUs and NVIDIA NeMo deployed on AWS for highly sensitive data privacy applications, LILT can meet strict latency and quality requirements and deliver results to customers in real time.

They’re all members of NVIDIA Inception, a program that nurtures startups revolutionizing industries with technological advancements, and the NVIDIA Developer Program, which empowers developers with technical resources and AI expertise.

AWS and NVIDIA will also host a joint reception at the International Smoke House in the MGM Grand on Wednesday, Nov. 29, from 6-8 p.m. PT.

From NVIDIA GPU-accelerated Amazon EC2 P5 instances, powered by NVIDIA H100 Tensor Core GPUs, to NVIDIA AI Enterprise, an open platform for production AI available on AWS Marketplace, enterprises can harness NVIDIA’s infrastructure on AWS to support the most challenging AI use cases. Learn more by connecting with NVIDIA and its partners at AWS re:Invent.

Explore generative AI sessions and experiences at NVIDIA GTC, the global conference on AI and accelerated computing, running March 18-21 in San Jose, Calif., and online.