10,000 Data Scientists Take on Lung Cancer in Data Science Bowl

by Jamie Beckett

Lung cancer is the deadliest of all cancers, and not just because it’s the most common variant of the disease.

Nearly 80 percent of patients will die within five years of diagnosis, according to the American Lung Association, largely because most people don’t realize anything’s wrong until it’s too late.

In one of the most ambitious competitions in AI, two researchers at Beijing’s Tsinghua University used deep learning and GPUs to come up with an algorithm that could lead to a life-saving way to spot lung cancer early enough to treat it. Their challenge was to use machine learning to improve the accuracy of CT scans, which are more effective than X-rays at detecting lung cancer.

The researchers, Liao Fangzhou and Zhe Li, beat nearly 2,000 other teams, with a total of 10,000 members, to win $500,000 in the third annual Data Science Bowl, sponsored by consulting firm Booz Allen Hamilton and the Kaggle data science community, with additional sponsorship from NVIDIA and others.

Winners of the Data Science Bowl lung cancer challenge developed algorithms that could help diagnose lung cancer when there's time to treat it.

Winners to Reveal Strategies

Top teams will present their winning solutions at the GPU Technology Conference, May 8-11, in Silicon Valley. Winners will split a prize purse of $1 million, the largest ever for the competition, funded by the Laura and John Arnold Foundation.

The second-place finisher collects $200,000, the No. 3 team gets $100,000, and the remaining $200,000 of the $1 million purse is divided among the other teams in the top 10.

For Liao, a Ph.D. student in computational neuroscience at Tsinghua University, the competition was personal. Lung cancer is common in his hometown, where air pollution is severe and a smoke-spewing factory stood next to his middle school. Shortly after the competition began, he learned that a friend had been diagnosed with the disease.

The team used NVIDIA TITAN X GPUs to train its convolutional neural network.

Other top finishers were:

  • Second Place: Julian de Wit and Daniel Hammack, both software and machine learning engineers based in the Netherlands. In 2016, de Wit placed third in the Data Science Bowl. He describes his role in the lung cancer screening challenge in a post on his personal blog.
  • Third Place: Team Aidence, named for the Dutch company where two of its members work, used our Tesla K80 GPU accelerators to develop its algorithm. A third team member works for the nonprofit OpenAI in San Francisco.

“The Data Science Bowl shows that the power of collective ingenuity, data science and advanced analytics can be harnessed to tackle society’s toughest challenges like eradicating cancer,” said Josh Sullivan, a senior vice president at Booz Allen.

Needed: An Accurate CT Scan for Lung Cancer

Low-dose CT scans are more likely to detect lung cancer than routine X-rays because, instead of taking one picture like a regular X-ray, they show detailed cross-sections of the body. In a recent National Cancer Institute trial, people who received low-dose CT scans had a 15 to 20 percent lower risk of dying from lung cancer than those who received X-rays.

Unfortunately, as many as a third of CT scans detect lung cancer when it’s not there, according to a study published in the Annals of Internal Medicine. Such false positives cause needless anxiety for patients and their families, and can lead to unnecessary tests and other procedures.

“Reducing the false positive rate of low-dose CT scans is a critical step in improving the accuracy of CT screening of lung cancer and having a positive impact on public health,” said Keyvan Farahani, program director, National Cancer Institute. Farahani, one of the speakers at the upcoming GTC session, provided scientific guidance for the competition’s design and datasets.

Participants in this year’s Data Science Bowl logged an estimated 150,000 hours and submitted nearly 18,000 algorithms.

To learn more about how AI computing is transforming healthcare and other industries, join us at GTC, May 8-11, in Silicon Valley.