NVIDIA BioNeMo Enables Generative AI for Drug Discovery on AWS

Pharma and techbio companies can now access the NVIDIA Clara healthcare suite, including BioNeMo, via Amazon SageMaker and AWS ParallelCluster, with NVIDIA DGX Cloud on AWS to follow.
by Kimberly Powell

Researchers and developers at leading pharmaceutical and techbio companies can now easily deploy NVIDIA Clara software and services for accelerated healthcare through Amazon Web Services.

Announced today at AWS re:Invent, the initiative gives healthcare and life sciences developers using AWS cloud resources the flexibility to integrate NVIDIA-accelerated offerings such as NVIDIA BioNeMo, a generative AI platform for drug discovery. BioNeMo is currently available through AWS ParallelCluster, a cluster management tool for high performance computing, and through the Amazon SageMaker machine learning service. It is also coming to NVIDIA DGX Cloud on AWS.

Thousands of healthcare and life sciences companies globally use AWS. They will now be able to access BioNeMo to build or customize digital biology foundation models with proprietary data, scaling up model training and deployment using NVIDIA GPU-accelerated cloud servers on AWS.

Techbio innovators including Alchemab Therapeutics, Basecamp Research, Character Biosciences, Evozyne, Etcembly and LabGenius are among the AWS customers already using BioNeMo for generative AI-accelerated drug discovery and development. This collaboration gives them more ways to rapidly scale up cloud computing resources for developing generative AI models trained on biomolecular data.

This announcement extends NVIDIA’s existing healthcare-focused offerings on AWS: NVIDIA MONAI for medical imaging workflows and NVIDIA Parabricks for accelerated genomics.

New to AWS: NVIDIA BioNeMo Advances Generative AI for Drug Discovery

BioNeMo is a domain-specific framework for digital biology generative AI, including pretrained large language models (LLMs), data loaders and optimized training recipes that can help advance computer-aided drug discovery by speeding target identification, protein structure prediction and drug candidate screening.

Drug discovery teams can use their proprietary data to build or optimize models with BioNeMo and run them on cloud-based high performance computing clusters.

One of these models, ESM-2, is a powerful LLM that supports protein structure prediction. It achieves almost linear scaling on 256 NVIDIA H100 Tensor Core GPUs. By scaling to 512 H100 GPUs, researchers can complete training in a few days instead of the month reported in the original paper.

Developers can train ESM-2 at scale using checkpoints of 650 million or 3 billion parameters. Additional AI models supported in the BioNeMo training framework include the small-molecule generative model MegaMolBART and the protein sequence generation model ProtT5.

BioNeMo’s pretrained models and optimized training recipes are available through self-managed services such as AWS ParallelCluster and Amazon ECS, as well as through integrated, managed services via NVIDIA DGX Cloud and Amazon SageMaker. They can help R&D teams build foundation models that explore more drug candidates, optimize wet lab experimentation and find promising clinical candidates faster.

Also Available on AWS: NVIDIA Clara for Medical Imaging and Genomics

Project MONAI, cofounded and enterprise-supported by NVIDIA to support medical imaging workflows, has been downloaded more than 1.8 million times and is available for deployment on AWS. Developers can harness their proprietary healthcare datasets already stored on AWS cloud resources to rapidly annotate and build AI models for medical imaging.

These models, trained on NVIDIA GPU-powered Amazon EC2 instances, can be used for interactive annotation and fine-tuning for segmentation, classification, registration and detection tasks in medical imaging. Developers can also harness MRI image synthesis models available in MONAI to augment training datasets.

To accelerate genomics pipelines, Parabricks enables variant calling on a whole human genome in around 15 minutes, compared with about a day on a CPU-only system. On AWS, developers can quickly scale up to process large amounts of genomic data across multiple GPU nodes.

More than a dozen Parabricks workflows are available on AWS HealthOmics as Ready2Run workflows, which enable customers to easily run pre-built pipelines.

Get started with NVIDIA Clara on AWS to accelerate AI workflows for drug discovery, genomics and medical imaging.
