NIH is building a virtual space called “the Commons,” where researchers can share data, software and other virtual tools.
Biological researchers should be able to share datasets the same way they share published scientific articles — at least, that’s what the National Institutes of Health is trying to ensure with a new project.
NIH is building a virtual space called “the Commons,” where researchers can one day do just that: share data, software and any other virtual tools or research processes in a way that’s “Findable, Accessible, Interoperable and Reusable,” or “FAIR.”
"Eventually, I can imagine a researcher saying, 'I read this paper, I want to find where the data is.' . . . Instead of finding the paper, they can actually go out and find the data," NIH Senior Adviser for Data Science Technologies Vivien Bonazzi told Nextgov.
In preparation for the Commons, the Department of Health and Human Services agency wants researchers to add to the technical requirements for cloud services providers. (It plans to finalize requirements by January or February of this year.)
"What we do know is we're not going to build an NIH cloud," Bonazzi said. "We know that there are commercial vendors out there, [though] we're not forcing researchers to use them."
NIH's request for information comes at a time when medical data initiatives are garnering public attention. Last year, President Barack Obama announced the Precision Medicine Initiative, proposing $70 million in additional funding for research into personalized cancer treatment by the National Cancer Institute.
NIH is running several concurrent pilots for the Commons. One, the Human Microbiome Project, is making about 20 terabytes of data available in an Amazon Web Services cloud, along with virtual tools and application programming interfaces to use on it. Another, the NCI's Genomic Data Commons, aims to store about 2.5 petabytes of cancer genomics data in AWS and Google clouds.
"Essentially, this is science at scale," Bonazzi said. "Right now, the way researchers do it is they tend to run it on local machines...They download [data,] but it's getting bigger and bigger."
But getting researchers in the habit of making their research and data sharable and machine readable might require broader cultural change and new incentives. For instance, researchers applying for grants could be encouraged to point to citations of their datasets or software, just as they point to citations of their papers.
Today, researchers are sometimes reluctant to share their data, Bonazzi said.
"It's got to do with the publication cycle in part," she said. "I don't share my data with you until it's published. . . . It's a big shift, [but] the plus side is, there's a lot of funding [for] it."
Bonazzi added that NIH is running these pilots to gather metrics about what the logistical requirements for the Commons project will be.
"On the NIH end, we want to know how much is it going to cost, where is the data, how is it getting used. . . [we're] figuring out how to do that," she said.