Data Science Bowl Launched to Improve Lung Cancer Screening

Date Posted: Friday, January 13, 2017

The third annual Data Science Bowl opened on January 12, 2017, with the goal of improving the detection accuracy of low-dose computed tomography (LDCT) lung cancer screening using data curated by NCI’s Division of Cancer Prevention and Division of Cancer Treatment and Diagnosis. This year’s competition encourages data scientists to develop machine learning algorithms that diagnose lung cancer more accurately and with lower false positive rates than are currently encountered. Throughout the competition, emphasis is being placed on solutions that meet the needs of real-world applications. The Data Science Bowl is being presented by Booz Allen Hamilton and Kaggle and is based on a project designed by NCI. NCI staff collaborated with Booz Allen and Kaggle by building alliances, providing guidance on the scientific design of the competition, and facilitating data and image curation.

The Data Science Bowl follows naturally from the National Lung Screening Trial (NLST), which was sponsored by NCI and launched in 2002. The NLST results demonstrated that LDCT screening reduced lung cancer mortality by 15-20% compared with standard chest X-ray; however, LDCT has historically produced high false positive rates (around 25%) that increase patient anxiety, lead to unnecessary diagnostic follow-up testing and associated costs, and limit its wider use. An NCI workshop in 2012 explored ways to reduce the false positive rates of LDCT lung cancer screening and recommended an algorithm challenge. The goal of the Data Science Bowl is to develop diagnostic algorithms that can reduce false positive rates, which would ultimately make LDCT screening for lung cancer much more effective, to the benefit of those eligible for screening.

Data Science Bowl competitors will have access to training and test data sets derived from various sources. The competition will run from January 12, 2017, to April 12, 2017, and the winners will be announced shortly thereafter. The competition will award $1 million in prizes, funded by the Laura and John Arnold Foundation.

Read the Data Science Bowl launch press materials.