Anyone with a hundred bucks and a saliva sample can get some intriguing genetic insights by mail-order. But using DNA for research or clinical purposes requires the whole genome, which means sequencing and processing all 3 billion base pairs that reside in our chromosomes.
The cost to do that has dropped dramatically over the years, from a billion dollars for the very first sequence in 2003 to less than $1,000 today. However, sequencing is only the first part of the process.
Now, the bottleneck for genomic insights is the computational analysis that follows sequencing. That analysis detects key markers and outliers, called variants, in the genetic data.
Parabricks, a startup based in Ann Arbor, Mich., and an NVIDIA Inception member, is shrinking the time this analysis takes from a couple of days to under an hour. “It’s the first application for secondary analysis of genomic data on a GPU, and it fully matches the state-of-the-art analytical pipeline,” said Dave Gregorka, president of Parabricks.
This speedup enables researchers to efficiently analyze trends in genomic data from entire populations, benefiting the fields of personalized medicine, drug discovery and disease treatment. And it’s a game-changer for medical cases where a patient is in critical condition, and genetic analysis can help a doctor quickly diagnose and develop a treatment plan.
“By analyzing much, much faster, you can get to the right problem and the right solution much more quickly,” said Ankit Sethia, Parabricks cofounder and technical lead.
Need for Speed
The demand for whole genome sequencing and analysis is rising rapidly. Sethia says the amount of genetic data generated is doubling almost every year. At around 300 gigabytes per human genome sample, the computational demand adds up quickly.
“When there are tens of thousands of patients, tens of thousands of samples you need to analyze, it can take years using CPU-based processing,” he said.
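To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above (about 300 gigabytes per sample, data doubling roughly every year). The function names, the 10,000-sample starting point and the three-year horizon are assumptions made for illustration, not Parabricks figures.

```python
GB_PER_GENOME = 300  # approximate per-sample size quoted in the article


def total_petabytes(num_samples, gb_per_genome=GB_PER_GENOME):
    """Raw data volume in petabytes, using decimal units (1 PB = 1,000,000 GB)."""
    return num_samples * gb_per_genome / 1_000_000


def samples_after(years, initial_samples):
    """Sample count after `years`, assuming an exact doubling every year.

    The article says "almost every year", so this is an upper-end assumption.
    """
    return initial_samples * 2 ** years


# Hypothetical study size: 10,000 samples today.
print(total_petabytes(10_000))                    # 3.0
print(total_petabytes(samples_after(3, 10_000)))  # 24.0
```

Under these assumptions, a 10,000-sample study already means about 3 petabytes of raw data, and three doublings would push that to roughly 24 petabytes.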
The Parabricks team developed software that runs on GPUs to rapidly analyze the genome. It identifies mutations and variants in the data, which helps medical professionals understand the patient at a genetic level and decide on a path for treatment.
Running on a single NVIDIA DGX-1 server, Parabricks’ software can process more than 12,000 whole genomes per year, a feat that would otherwise require 40 CPU servers. Parabricks can also run its software on GPUs in the cloud using AWS, Azure or Google Cloud.
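The yearly throughput figure can be converted into time per genome with a little arithmetic. The sketch below assumes the servers run around the clock and that the 40 CPU servers share the load evenly. Both are simplifying assumptions, and the function name is made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming nonstop operation


def minutes_per_genome(genomes_per_year):
    """Average wall-clock minutes spent per genome at a given yearly throughput."""
    return MINUTES_PER_YEAR / genomes_per_year


# One DGX-1 server at 12,000 genomes per year.
dgx1_minutes = minutes_per_genome(12_000)

# One of 40 CPU servers, assuming each handles an equal share (300 per year).
cpu_minutes = minutes_per_genome(12_000 / 40)

print(round(dgx1_minutes, 1))      # 43.8 minutes per genome
print(round(cpu_minutes / 60, 1))  # 29.2 hours per genome
```

That works out to about 44 minutes per genome on the DGX-1 versus roughly 29 hours on a single CPU server, which is broadly in line with the drop from a couple of days to under an hour. Since the article says "more than 12,000", the GPU figure is an upper bound on the time per genome.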
To a GPU, processing genomic data isn’t that different from processing an image.
The parallel processing power of GPUs works well for graphics because each pixel can be processed and calculated independently — it’s a bunch of tiny problems next to one another. It’s the same with genomic data, says Sethia. Data from DNA sequencing machines are made up of tiny individual pieces of genetic information that can be crunched separately and then strung back together.
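The idea of crunching pieces separately and then stringing them back together can be sketched in a few lines of Python. This is a toy illustration, not Parabricks' pipeline: the reference sequence, the reads and the function names are all made up, a thread pool stands in for the thousands of GPU threads, and real variant calling also involves alignment, quality scores and statistical models.

```python
from concurrent.futures import ThreadPoolExecutor

REFERENCE = "ACGTACGTACGT"  # made-up reference sequence

# Made-up reads: (start offset in the reference, bases reported by the sequencer)
READS = [
    (0, "ACGT"),
    (4, "ACCT"),  # differs from the reference at position 6
    (8, "ACGA"),  # differs from the reference at position 11
]


def find_mismatches(read):
    """Compare one read with the reference. Needs no data from any other read."""
    start, bases = read
    return [
        (start + i, REFERENCE[start + i], base)
        for i, base in enumerate(bases)
        if REFERENCE[start + i] != base
    ]


def call_mismatches(reads):
    """Process every read independently, then merge the results in position order."""
    with ThreadPoolExecutor() as pool:
        per_read = pool.map(find_mismatches, reads)
        return sorted(m for result in per_read for m in result)


print(call_mismatches(READS))  # [(6, 'G', 'C'), (11, 'T', 'A')]
```

Because `find_mismatches` touches only its own read, the reads can be handled in any order and on any number of workers. That independence is what makes this kind of workload a natural fit for a GPU.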
Other speedy genomic analysis solutions suffer from lower accuracy than the current state of the art. Parabricks is up to date with the latest algorithms and, since it’s software, it can be easily updated or customized for users. For its genomic analysis, the startup uses NVIDIA CUDA, including the cuDNN deep learning library, as well as TensorRT inference software.
Parabricks rolled out the initial version of its GenomeBricks software suite to select customers in March, and it’s in use around the globe, including in Singapore, Japan and Thailand. The company is also working on large population study projects, including national initiatives for precision medicine.