by Will Ramey

In addition to serving as Senior Product Manager for GPU Computing, Will Ramey is a key member of the NVIDIA Foundation’s Compute the Cure team. In this role, he’s volunteered his time to help define this important philanthropic initiative and guide its first funded project to launch.    

Cancer, which kills one person every minute in the United States alone, is a disease of the DNA.

But since everyone’s DNA is different and there are many causes of cancer, no two cancers are exactly the same. A treatment plan that works for one person may not work for others with the “same” type of cancer, and figuring out the best treatment protocol is a significant challenge for doctors. To develop targeted and effective treatments, they need to analyze the patient’s DNA – a massive computational task considering that the human genome consists of roughly 3 billion units known as “base pairs.”

There’s a brand new tool that can help researchers do just that. It’s called the Open Genomics Engine (OpenGE), and its mission is to streamline and accelerate the computational analysis of human genomes and help cancer researchers better understand how different forms of cancer work.

Developed by a team of researchers at Virginia Tech and the Virginia Bioinformatics Institute, OpenGE is an open source software platform that readily integrates into existing DNA analysis pipelines and workflows. Free and simple to install, OpenGE makes it easier for genomics researchers to evaluate large data sets and pinpoint the DNA mutations that cause cancer.

The OpenGE project was supported by a $100,000 grant from the NVIDIA Foundation’s Compute the Cure initiative, which focuses on supporting cancer researchers in their search for cures.

Digitizing human DNA isn’t easy. The first step is to feed DNA from several cells into a sequencer machine that can read up to a few hundred base pairs at a time (out of three billion pairs) and convert each fragment, or “short read,” into data that can be analyzed by a computer.

David Mittelman of Virginia Bioinformatics Institute evaluates DNA data with a research colleague in his lab

The next step is to figure out how to recombine all the short reads – about 300 GB of data covering three billion DNA base pairs – into a full DNA sequence. But, because each person’s DNA is different, this gigantic molecular puzzle requires tremendous computation. Researchers use complex “realignment” algorithms to shift short reads around to increase the probability that all the fragments have been reassembled accurately.
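To make the puzzle concrete, here is a toy sketch of placing short reads. It assumes a known reference sequence to compare against, which this post does not describe, and it simply tries every position and keeps the one with the fewest mismatches. The function names and the tiny sequences are made up for illustration; this is not OpenGE's algorithm, and real tools use indexed, far faster methods.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_offset(reference, read):
    """Return (offset, mismatches) for the best placement of `read`.

    Brute force: slide the read along the reference and keep the first
    offset with the fewest mismatches. Illustrative only.
    """
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = hamming(window, read)
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


# Hypothetical 16-base reference and three 6-base "short reads".
reference = "ACGTTGCATGACCGTA"
reads = ["TTGCAT", "GACCGT", "CATGTC"]

placements = {read: best_offset(reference, read) for read in reads}
for read, (offset, mismatches) in placements.items():
    print(f"{read} -> offset {offset}, {mismatches} mismatch(es)")
```

The third read fits best at one position but with a single mismatch, which could be either a sequencing error or a genuine difference in the sample. Telling those apart across billions of bases is part of what makes the real problem so expensive.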

Finally, the digitized representation of DNA from the original cells is ready to be analyzed. Sophisticated “discovery” algorithms search for mutations or other patterns that can be used to identify a specific type of cancer, which ultimately can be used to design new drug treatments that target specific cancer cells.
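As a rough illustration of what a “discovery” step might look like, the sketch below stacks reads that have already been placed against a reference sequence and reports positions where most of the overlapping reads disagree with it. The reference sequence, the `call_variants` name, the `min_depth` threshold and the sample data are all assumptions made for this example; the algorithms OpenGE accelerates are considerably more sophisticated.

```python
from collections import Counter


def call_variants(reference, aligned_reads, min_depth=2):
    """Report (position, reference_base, observed_base) tuples.

    `aligned_reads` is a list of (offset, read) pairs. A position is
    reported when at least `min_depth` reads cover it and a strict
    majority of them show a base that differs from the reference.
    """
    # One counter of observed bases per reference position.
    pileup = [Counter() for _ in reference]
    for offset, read in aligned_reads:
        for i, base in enumerate(read):
            pileup[offset + i][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        base, support = counts.most_common(1)[0]
        if base != reference[pos] and support * 2 > depth:
            variants.append((pos, reference[pos], base))
    return variants


# Hypothetical data: three of the four reads carry a T where the
# reference has an A at position 7.
reference = "ACGTTGCATGACCGTA"
aligned_reads = [
    (2, "GTTGCTTG"),
    (4, "TGCTTGAC"),
    (5, "GCTTGACC"),
    (8, "TGACCGTA"),
]

variants = call_variants(reference, aligned_reads)
print(variants)
```

Requiring several reads to agree before reporting a difference is a simple guard against single-read sequencing errors, and it is one reason the same stretch of DNA is read many times over.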

OpenGE accelerates all these phases – mapping, realignment and discovery – to help close the gap between the mountains of data generated by gene sequencers and the information researchers need to better understand and battle various cancers.

Cancer researchers, medical biologists, computer scientists and others who want to use, and contribute to, OpenGE can learn more at