Biden announces public cancer database

Vice President Joe Biden, the public face of the administration's Cancer Moonshot, announced the release of the National Cancer Institute's Genomic Data Commons, a public-facing data platform that allows researchers to access, analyze and upload genomic health data to advance cancer research.

Vice President Joe Biden said better, more accessible data "is critical to speeding up development of lifesaving treatments for patients."

Vice President Joe Biden announced June 6 the public release of the National Cancer Institute's Genomic Data Commons, a publicly accessible database that allows researchers to access, analyze and upload genomic health data to advance cancer research.

The GDC, built and managed by the University of Chicago Center for Data Intensive Science, centralizes and makes public cancer data from 12,000 patient records. The two petabytes of data come from earlier NCI programs: The Cancer Genome Atlas and Therapeutically Applicable Research to Generate Effective Treatments.

The platform's web-based aggregation of records allows anyone to search, access and filter the data easily, and NCI hopes to jump-start the crowdsourcing of medical research by encouraging researchers to upload their own findings for analysis.

"Increasing the pool of researchers who can access data and decreasing the time it takes for them to review and find new patterns in that data is critical to speeding up development of lifesaving treatments for patients," said Biden, who has previously advocated for opening medical research data.

In the past, downloading and navigating such a massive trove of data would have been untenable, Dr. Louis Staudt, director of NCI's Center for Cancer Genomics, told reporters.

"The data has been available, but was very, very cumbersome to get. To download all of the data from the Cancer Genome Atlas would take three weeks of continuous download [and] require $1 million of software, and a team of people to ensure privacy… Only very well-funded and well-positioned researchers were able to access the data," Staudt said.

However, Staudt said that moving to a cloud-based architecture, where the large-scale computations take place, unlocks public access and global participation. NCI anticipates even greater cloud interoperability in the future.

"This is currently a private cloud operating at the University of Chicago that can interoperate with the Amazon cloud, and will later interoperate with" a variety of other public cloud platforms that will include Google and Microsoft, Staudt continued. "This is just the beginning of that."

NCI wants researchers to "take advantage of the software we've built from these genomes… and share their data with the world," said Dr. Warren Kibbe, director of NCI's Center for Biomedical Informatics and Information Technology.

Kibbe told reporters that the GDC is "the basement level of a large effort" from NCI to assemble a comprehensive catalog of cancer patients' medical records "in a computable environment that researchers around the world can have access to."

The data will remain in raw form, meaning researchers will be able to analyze the information as new computational technologies and methods arise.

Kibbe said the goal is to collect data from 100,000 patients to create a substantive sample size. "GDC is one step from turning that into a reality," he continued. "It's very unlike anything we've had before… We can put their data together with all the other publicly available data that's been produced from cancer patients."

The GDC builds on the Obama administration's previous actions to individualize health care, namely the Cancer Moonshot and Precision Medicine Initiative. According to the National Institutes of Health, the GDC is a part of PMI's $70 million fund for NCI to research cancer genomics.
