Scientists everywhere can now access Evo 2, a powerful new foundation model that understands the genetic code for all domains of life. Unveiled today as the largest publicly available AI model for genomic data, it was built on the NVIDIA DGX Cloud platform in a collaboration led by nonprofit biomedical research organization Arc Institute and Stanford University.
Evo 2 is available to global developers on the NVIDIA BioNeMo platform, including as an NVIDIA NIM microservice for easy, secure AI deployment.
Trained on an enormous dataset of nearly 9 trillion nucleotides — the building blocks of DNA and RNA — Evo 2 can be applied to biomolecular research applications including predicting the form and function of proteins based on their genetic sequence, identifying novel molecules for healthcare and industrial applications, and evaluating how gene mutations affect their function.
“Evo 2 represents a major milestone for generative genomics,” said Patrick Hsu, Arc Institute cofounder and core investigator, and an assistant professor of bioengineering at the University of California, Berkeley. “By advancing our understanding of these fundamental building blocks of life, we can pursue solutions in healthcare and environmental science that are unimaginable today.”
The NVIDIA NIM microservice for Evo 2 enables users to generate a variety of biological sequences, with settings to adjust model parameters. Developers interested in fine-tuning Evo 2 on their proprietary datasets can download the model through the open-source NVIDIA BioNeMo Framework, a collection of accelerated computing tools for biomolecular research.
“Designing new biology has traditionally been a laborious, unpredictable and artisanal process,” said Brian Hie, assistant professor of chemical engineering at Stanford University, the Dieter Schwarz Foundation Stanford Data Science Faculty Fellow and an Arc Institute innovation investigator. “With Evo 2, we make biological design of complex systems more accessible to researchers, enabling the creation of new and beneficial advances in a fraction of the time it would previously have taken.”
Enabling Complex Scientific Research
Established in 2021 with $650 million from its founding donors, Arc Institute gives researchers multiyear funding so they can tackle long-term scientific challenges and focus on innovative research instead of grant writing.
Its core investigators receive state-of-the-art lab space and funding for eight-year, renewable terms that can be held concurrently with faculty appointments with one of the institute’s university partners, which include Stanford University, the University of California, Berkeley, and the University of California, San Francisco.
By combining this unique research environment with accelerated computing expertise and resources from NVIDIA, Arc Institute’s researchers can pursue more complex projects, analyze larger datasets and achieve results more quickly. Its scientists are focused on disease areas including cancer, immune dysfunction and neurodegeneration.
NVIDIA accelerated the Evo 2 project by giving scientists access to 2,000 NVIDIA H100 GPUs via NVIDIA DGX Cloud on AWS. DGX Cloud provides short-term access to large compute clusters, giving researchers the flexibility to innovate. The fully managed AI platform includes NVIDIA BioNeMo, which features optimized software in the form of NVIDIA NIM microservices and NVIDIA BioNeMo Blueprints.
NVIDIA researchers and engineers also collaborated closely on AI scaling and optimization.
Applications Across Biomolecular Sciences
Evo 2 can provide insights into DNA, RNA and proteins. Trained on a wide array of species across domains of life — including plants, animals and bacteria — the model can be applied to scientific fields such as healthcare, agricultural biotechnology and materials science.
Evo 2 uses a novel model architecture that can process lengthy sequences of genetic information, up to 1 million tokens. This widened view into the genome could unlock scientists’ understanding of the connection between distant parts of an organism’s genetic code and the mechanics of cell function, gene expression and disease.
“A single human gene contains thousands of nucleotides — so for an AI model to analyze how such complex biological systems work, it needs to process the largest possible portion of a genetic sequence at once,” said Hsu.
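To give a rough sense of what a 1-million-token window means in practice, the short Python sketch below does the arithmetic. It assumes one token per nucleotide, which this article does not state, and the function names are hypothetical helpers written for illustration, not part of any Evo 2 or BioNeMo API.

```python
# Context length quoted in the article. One token per nucleotide is an
# assumption made for this sketch, not a detail taken from the article.
CONTEXT_TOKENS = 1_000_000


def windows_needed(sequence_length: int, context: int = CONTEXT_TOKENS) -> int:
    """Return how many non-overlapping windows are needed to cover a sequence."""
    if sequence_length < 0:
        raise ValueError("sequence_length must be non-negative")
    return -(-sequence_length // context)  # ceiling division


def fits_in_one_window(pos_a: int, pos_b: int, context: int = CONTEXT_TOKENS) -> bool:
    """Return True if two 0-based positions can fall inside a single window."""
    return abs(pos_a - pos_b) < context


# A 2.5-million-nucleotide region needs three windows at this context length.
print(windows_needed(2_500_000))                     # 3
# Two sites 50,000 nucleotides apart fit in one 1-million-token window...
print(fits_in_one_window(10_000, 60_000))            # True
# ...but not in a hypothetical 8,192-token window.
print(fits_in_one_window(10_000, 60_000, context=8_192))  # False
```

The last two calls show why window size matters for the "distant parts" of a genome mentioned above: a model can only relate two sites directly if both fit in the same window.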
In healthcare and drug discovery, Evo 2 could help researchers understand which gene variants are tied to a specific disease — and design novel molecules that precisely target those areas to treat the disease. For example, researchers from Stanford and the Arc Institute found that in tests with BRCA1, a gene associated with breast cancer, Evo 2 could predict with 90% accuracy whether previously unrecognized mutations would affect gene function.
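This article does not describe how the BRCA1 predictions were produced. One common way to use a sequence model for this kind of question is to compare the likelihood it assigns to the reference sequence with the likelihood it assigns to the mutated one. The Python sketch below illustrates that idea with a toy first-order Markov model standing in for a real genomic model. The toy model, the function names and the example sequences are all assumptions made for illustration and are not Evo 2 or its API.

```python
import math

ALPHABET = "ACGT"


def train_markov(sequences):
    """Fit first-order transition log-probabilities with add-one smoothing.

    This toy model is only a stand-in for a real genomic language model.
    """
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        a: {b: math.log(counts[a][b] / sum(counts[a].values())) for b in ALPHABET}
        for a in ALPHABET
    }


def log_likelihood(model, seq):
    """Sum of transition log-probabilities along the sequence."""
    return sum(model[prev][nxt] for prev, nxt in zip(seq, seq[1:]))


def apply_snv(seq, pos, alt):
    """Return seq with the base at 0-based position pos replaced by alt."""
    if not 0 <= pos < len(seq):
        raise IndexError("pos is outside the sequence")
    if alt not in ALPHABET:
        raise ValueError("alt must be one of A, C, G, T")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_score(model, ref_seq, pos, alt):
    """Log-likelihood of the mutated sequence minus that of the reference.

    Strongly negative scores mean the model finds the variant less plausible.
    """
    return log_likelihood(model, apply_snv(ref_seq, pos, alt)) - log_likelihood(model, ref_seq)


# Made-up training and reference sequences, for illustration only.
model = train_markov(["ACGTACGTACGT"])
reference = "ACGTACGT"
print(variant_score(model, reference, 2, "T"))  # negative: disrupts the learned pattern
print(variant_score(model, reference, 2, "G"))  # 0.0: same base as the reference
```

In a real study, such scores would have to be compared against experimentally measured variant effects before being read as predictions of gene function.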
In agriculture, the model could help tackle global food shortages by providing insights into plant biology and helping scientists develop varieties of crops that are more climate-resilient or more nutrient-dense. And in other scientific fields, Evo 2 could be applied to design biofuels or engineer proteins that break down oil or plastic.
“Deploying a model like Evo 2 is like sending a powerful new telescope out to the farthest reaches of the universe,” said Dave Burke, Arc’s chief technology officer. “We know there’s immense opportunity for exploration, but we don’t yet know what we’re going to discover.”
Read more about Evo 2 on the NVIDIA Technical Blog and in Arc’s technical report.