Scientists Have a Sharing Problem
Competition and disorganization within their disciplines prevent many researchers from making their data publicly available, which is stunting scientific progress.
When it comes to sharing information, there seems to be quite a difference of opinion—across areas both trivial and serious—as to how much is enough. Some people broadcast their lives on Facebook; others poke fun at the oversharers in their feeds. The Edward Snowdens of the world fight for greater government transparency, even as some argue for less of it. Last year, some experts called for heightened secrecy in the technology sector, arguing that public expectation stifles creativity.
A study published in October in BioScience revealed that sharing is a tricky topic in science as well. Michigan State University ecology professor Patricia Soranno and colleagues found that while many environmental-science researchers believe data sharing is beneficial—for the replication of analyses, the ability to confirm data integrity, and the overall advancement of science—few actually take the steps to make their own materials publicly available after their research is published.
Save for a few sentences in the “Methods and Analysis” sections, the data used to produce published manuscripts is often kept private, sometimes purposefully. When a scientist requests another researcher’s materials, the inquiry might go unanswered or be denied, forcing the requester to delay or shelve their own projects. And ecology isn’t the only branch of science grappling with too much secrecy: The same thing is happening in genetics, biology, chemistry, and engineering, too.
This may seem counterintuitive, given that science has traditionally been a field that prizes collaboration. After all, laboratories are rarely one-man operations, and some projects transcend geographic borders, as with global collaborations like the Human Genome Project and its sibling, the Human Brain Project.
The question, then, is why so many scientists are so stingy with their information. Because scientific progress relies so heavily on the process of validating and building upon prior material, it might seem counterproductive to withhold information from other researchers. But even science, a discipline grounded in reason, isn’t immune to the influence of ego and emotion.
The culture of innovation breeds fierce competition, and those on the brink of making a groundbreaking discovery want to be the first to publish their results and receive credit for their ideas. There’s more at stake than just the acknowledgement of being first and a metaphorical blue ribbon; being first to publish can mean invitations to national meetings, academic promotions, industry appointments, and research awards, including the Nobel Prize.
Physician David Blumenthal, now president of the Commonwealth Fund and a longtime researcher in the field of health-information technology and data-sharing practices, was working at Massachusetts General Hospital in the late 1970s, where he witnessed the consequences of competition firsthand.
“I remember a peer who was sequencing a gene for a particular protein that had a lot of potential clinical application. He had spent many years working on it in a very prestigious lab, but days before he was about to finish someone beat him to publication,” Blumenthal said. “He lost five years of work at a critical time in his career and ended up leaving research for clinical practice. It wasn’t that he didn’t do good work, but if you’re not first, you don’t get the credit.”
This is not to say that everyone who gets beaten to the finish line drops out, or that all researchers are driven strictly by rewards. But if sharing data could let a rival build upon or dispute one’s results in a revolutionary way, it’s easy to see why some might choose to withhold it.
One could pin the problem entirely on the competitive culture, but it’s only one of many reasons why scientists choose not to share their data, even after their studies are published. Among them, of course, is the lack of funding. Transferring data can be expensive: A 2002 study published in the Journal of the American Medical Association by Blumenthal and colleagues found that among geneticists, 45 percent withheld data because it cost too much to send the materials to the scientists who had requested them.
“This was something that we did not anticipate, but when data is a physical thing such as a reagent, an antibody, a chemical, a mouse, or a reengineered organism, the cost and administrative difficulties are an important obstacle,” he said.
In the same study, 80 percent of respondents also reported that the effort required to produce their data prevented them from sending it to other researchers who asked for it. The underlying cause is most likely something more than sheer laziness: According to a 2012 study in the Journal of Computational Science Education, based on in-depth interviews with researchers in 11 fields including biology, ecology, and physics, some disciplines don’t even have formal digital repositories for data storage, and others don’t yet have standardized methods of interpreting and annotating it.
The consequences here are twofold: First, the lack of a centralized digital storage space means that data might only be kept on a personal computer or exist solely in paper form, so digging it up and sending it to a requesting party can be time-consuming, especially for scientists who have hundreds of studies under their belts. And second, the absence of uniform methods to record or describe data creates its own challenges.
While scientists do publish the results of experiments, other researchers may also need descriptions of the data, called “metadata,” including things like the temperature of samples, the make and calibration of equipment, the time of day samples were taken, or the rate of error of the samples. Though these details might seem obvious to the original researcher, they can provide others with important context or spur new research questions. Unfortunately, many fields don’t have set rules as to how much metadata is required. Having to explain these specifics after publication takes extra time on the part of the original researcher, which could explain why some requests for data go ignored.
Another study, published in Academic Medicine in 2006, found more reasons for scientists’ reluctance to share, including a desire to protect industry relationships and unfamiliarity with the investigators requesting the data. These findings show that data withholding isn’t always motivated by vengeance or the desire to get ahead; in some cases, a lack of resources simply makes sharing difficult.
Irrespective of the motive, data withholding has documented consequences. In their 2002 study, Blumenthal and colleagues found that 28 percent of those surveyed were unable to replicate research as a direct result of another scientist’s refusal to share, 24 percent had a publication significantly delayed, and 21 percent had to abandon a research interest altogether. Despite these individual costs, Blumenthal acknowledged that science (genetics, in the case of this study) was still thriving, but wondered “whether [progress] is as rapid as it could be if data sharing were maximized.” Could any of the world’s most pressing scientific or medical problems be solved, or at least greatly ameliorated, if data were fully accessible?
There are a few steps that could be taken to increase transparency, though the issue would have to be tackled not only by scientists, who are the purveyors of information, but also by journals, publishers, universities, funding agencies, and industry professionals. Scientists would need a centralized place to store their data, meaning more digital repositories would need to be created. The field of astronomy has reaped the benefits of designing communal data banks early on.
“Scientists started sharing data from the Hubble telescope and the Sloan Digital Sky Survey because they were collecting it at such high volumes that they needed a place to put it,” Soranno said. “Millions of users have been exposed to the data, which has resulted in thousands of studies being published, even by scientists not affiliated with those two projects.”
The scientific community would also need to establish protocols on how data should be stored, so that it becomes less time-consuming for other researchers to locate and interpret results. Some scientists have even advocated for data to be peer-reviewed and accepted by other collaborators—in the same type of procedure now used for journal articles—to ensure that the data meets scientific standards, is reliable, and was collected using logical methods.
Many experts support mandatory data sharing after publication, a policy that the American Psychological Association (APA) and the Public Library of Science (PLoS) currently apply to all studies published in their journals. Though such policies aren’t always strictly enforced, adopting them across the board would at least put greater pressure on researchers to release the information underlying their work.
There is also a push to examine the efficiency of the publication process. Soranno and Blumenthal both agree that data should not have to be shared before publication, but they also believe it would be worth developing a process that allows scientists to copyright material even before it is published.
“In the commercial sector, if you file a patent, as soon as you file, you are protected and you can start to share,” Blumenthal explained. “In a similar fashion, you could create opportunities at the level of presentation to register the content so that for the record, your work is recognized publicly once you display it.”
Because many manuscripts are presented at conferences and symposiums before they are published, such a registration step would work much like a patent: It would stake a researcher’s claim to the work and protect their first-to-publish privilege, while allowing other scientists to build upon their ideas and hypotheses much sooner.
The good news is that many disciplines are already embracing scientific openness, in part due to the influence of social media.
“I see a lot of budding scientists who are blogging, tweeting, and creating websites for their data and presentations,” Soranno said. “They see it as a way to get ideas out and feedback on their projects. I personally have found it very inspiring.”
The practice is trickling into the pharmaceutical industry as well: In a progressive move last month, industry giant GlaxoSmithKline launched a system for sharing raw data from 200 clinical trials, with the aim of increasing transparency around new drugs.
Inevitably, the biggest challenge will be changing the culture of secrecy in disciplines that are less prone to collaboration. Blumenthal believes it can happen, but only when certain practices, such as sharing mandates and repositories, are put into place.
“You are not going to limit secrecy just by calling on scientists to be altruists. Some will be, of course, but you need to implement processes and methods to make it easier and less costly to share data,” he said. “You want to make sure their personal interest, that of receiving recognition, and the ethical requirement are aligned.”
In her paper, Soranno calls environmental scientists’ ethics “out of date”: “[They] are increasingly concerned about the ethical importance of promoting inclusivity, including groups that are traditionally underrepresented in science, such as women and racial minorities,” she told me. “But if inclusivity is a central ethical value, then data sharing should also be a central ethical value, because data sharing is essential to promoting inclusivity.”
It could take some time to change existing sharing practices, but once customs of transparency are in place and continue to be nurtured, they can stay put for generations. Blumenthal told me about the successful tradition of transparency in the field of yeast genetics.
“The discipline has a very familial feel to it. It goes back to a group of seminal figures that believed in sharing, and they trained their colleagues and subordinates to embrace openness. So they then passed down this ethic of sharing that’s been thriving ever since,” he said. “I do think it’s worth teaching an ethic of sharing, because a young scientist’s early approach to sharing will likely become their approach for life.”
Another paper, published last year in BioScience, calls for responsibility to be shared between data suppliers and data users, arguing that it’s not enough for people to share their materials; the individuals who use the data also need to provide attribution or co-authorship. Proper recognition where due would help data sharers feel more comfortable and make them more willing to provide information.
The goal isn’t to eliminate withholding entirely. Clearly, not all materials should be shared: Data that has not yet been published, that would violate a patient’s privacy, or that would break an industry agreement should remain confidential. But there is a lot of information in the scientific community that has the potential to improve, cure, and innovate, and it should get into the hands of scientists who need it and can use it for the greater good.
“If scientists’ role in society is to generate knowledge for both knowledge’s sake and the good of society, then scientists should be sharing both their ideas and their data for everyone to access,” Soranno said. “These practices will ensure that everyone has the opportunity to contribute to moving knowledge forward.”