Take Two Algorithms and Call Me in the Morning

Data Science Bowl winners use deep learning to speed crucial step in drug discovery.
by Kimberly Powell

Three, it turns out, is better than one. At least that’s how it worked for a trio of former rivals who teamed up to claim the just-announced top prize in this year’s Data Science Bowl.

The fourth annual event focused on one of healthcare’s most pressing problems — the soaring cost and time needed to discover new drugs. A record-setting 18,000 participants battled over 90 days to deliver a deep learning algorithm to accelerate a crucial step in the drug-discovery pipeline: identifying the nucleus of each cell.

This year’s Data Science Bowl was “driven by a very real need to develop new treatments faster and more accurately,” said Anne Carpenter, director of the imaging platform at the Broad Institute of MIT and Harvard, the nonprofit partner for the contest.

Data Science Bowl participants used images like this one supplied by the Broad Institute of MIT and Harvard to train deep learning algorithms to spot nuclei and speed drug discovery.

International Team Takes the Prize

The winners beat out nearly 4,000 teams to win the Data Science Bowl, presented by the consulting firm Booz Allen Hamilton and the Kaggle platform for data science competitions, with additional sponsorship from NVIDIA and the medical diagnostics company PerkinElmer. Creators of the top algorithms will split $170,000 in cash and prizes, including powerful NVIDIA GPU hardware for deep learning.

In addition to the difficulty of spotting cell nuclei in dense medical images, the winning threesome (Selim Seferbekov, Alexander Buslaev and Victor Durnov) faced the challenge of collaborating across six time zones and three countries: Germany, Belarus and Russia. Using our GPUs for both training and inference, the team toiled for some 300 hours to create and implement their algorithm.

Their efforts paid off: Together they’ll collect $50,000 in cash, plus an estimated $70,000 in the latest NVIDIA GPUs built on our new Volta architecture. Volta uses NVIDIA CUDA Tensor Cores to deliver unprecedented levels of deep learning performance in hardware like our DGX Station, one of the most powerful tools for researchers.

Record-Setting Data Science Bowl

Collectively, competition participants worked an estimated 288,000 hours and submitted 68,000 algorithms, nearly three times as many submissions as in last year’s Data Science Bowl.

All three top teams used our GPUs to achieve their winning results. Other teams in the top three were:

  • Second Place ($25,000): Minxi Jiang, chief data scientist at a Beijing-based startup, who finished in the top one percent in last year’s Data Science Bowl.
  • Third Place ($12,000): Angel Lopez-Urrutia, a marine biologist in Spain who uses machine learning to automatically classify images of plankton, a challenge that was central to the inaugural Data Science Bowl.
Researchers used images like this to train their deep learning algorithms to speed drug discovery in the Data Science Bowl. Image is courtesy of the Broad Institute of MIT and Harvard.

Drug Discovery Bottleneck

Finding new drugs is a complex and laborious task that can cost billions and take a decade or more per treatment. Biochemists try thousands of chemical compounds to figure out which, if any, are effective against a particular virus or bacterium, or which cause a desired reaction in the human body. They do that by measuring how diseased and healthy cells respond to various treatments.

Because nearly all human cells contain a nucleus, the most direct route to identifying each cell is to spot the nucleus. Existing methods require time-consuming researcher oversight. Sometimes biologists have no choice but to personally examine thousands of images to complete their experiments.

“By identifying nuclei quickly and accurately, the algorithms developed in this competition can free up biologists to focus on other aspects of their research, shortening the approximately 10 years it takes for each new drug to come to market and, ultimately, improving quality of life,” said Ray Hensberger, a Booz Allen Hamilton principal.

Carpenter, of the Broad Institute, aims to use a winning algorithm to build deep learning software for drug discovery. The institute is now exploring the idea of creating user-friendly, open-source software that biomedical researchers can use in their day-to-day work.

Learn more about NVIDIA technology to advance deep learning in healthcare.

* Main image for this story shows human cell nuclei, which contain most of a cell’s genetic material. RNA-processing proteins are in red and chromosomes are in blue. Image courtesy of the National Cancer Institute.

The fourth Data Science Bowl challenged participants to use deep learning to speed drug discovery.