What Happens When You Put 500,000 People's DNA Online


Huge genetic databases are changing how scientists study disease.

Every big, ambitious project has to start somewhere, and for U.K. Biobank, it was at an office building south of Manchester, where the project convinced its very first volunteer to pee into a cup and donate a tube of blood in 2006.

U.K. Biobank would go on to recruit 500,000 volunteers for a massive study on the origins of disease. In addition to collecting blood and urine, the study recorded volunteers’ height, weight, blood pressure; tested their cognitive function, bone density, hand-grip strength; scanned their brains, livers, hearts; analyzed their DNA. In breadth and depth, the study is the first of its kind.

Handling all the samples was a logistical challenge. To process thousands of tubes of blood, for example, U.K. Biobank’s lab needed a new robotics system. (This ultimately came from a company that builds machines for packing sausages, which are not unlike tubes of blood in shape.) Each tube of blood was split into its component parts (red blood cells, white blood cells, plasma) and run through a battery of tests. White blood cells contain DNA, which the project analyzed, too. When all was said and done, U.K. Biobank had assembled one of the largest single genetic data sets ever. It all took a while.

This spring, 11 years after the first volunteer gave up a tube of blood, U.K. Biobank announced it would release its full genetic data set to registered scientists in July. This huge amount of genetic information, combined with the thousands of other characteristics tracked by U.K. Biobank, allows scientists to look for the genetic determinants of virtually any disease. Geneticists marked their calendars. “We heard stories that people who head groups had canceled holidays,” says Jonathan Marchini, a statistical geneticist at the University of Oxford. “Everyone has been waiting for this for so long.”

U.K. Biobank had done data releases before, including an earlier subset of the genetic data set with just over 100,000 people. In the past, research groups using the data wrote up their papers, submitted to journals, waited for peer review, and eventually their papers trickled out to the public. In the last year, however, an increasingly popular website called bioRxiv—pronounced “bio archive”—has changed the game. BioRxiv allows biologists to publish preprints, or preliminary drafts of their papers that have not yet been peer-reviewed.

Preprints based on the latest U.K. Biobank data started to come out almost immediately. Within two weeks, David Howard and Andrew McIntosh, psychiatry researchers at the University of Edinburgh, had posted not one but two preprints, one on genetic variants linked to depression and the other on variants linked to neuroticism. Their team subsisted on pizza and worked “constantly.”

Others soon followed, and the flood of preprints has continued ever since. Never had genetics research moved so fast.

* * *

Ask scientists what’s so revolutionary about U.K. Biobank and they’ll say it’s big. But they’ll also say this: Nobody gets preferential access.

In the past, research groups that had gone through the trouble and expense of building DNA data sets hoarded them for themselves, so that they could be the first to mine them for publishable insights. U.K. Biobank, however, is supported by the United Kingdom’s National Health Service. Its data is open to anyone in the world, as long as they are a legitimate researcher and pay a fee commensurate with the amount of data they want to access: a couple thousand dollars for the full genetic data.

When it came to releasing the 500,000-person data set, making sure everyone got the huge file (12 terabytes uncompressed) at the same time was no trivial matter. U.K. Biobank decided to allow registered researchers to start downloading the data weeks before its official July release. The catch: It was encrypted. The decryption keys went out to all research groups simultaneously on the official release date. Nobody got a head start of a few days, or even a few hours. Even Marchini, who helped U.K. Biobank process some of the data, was not allowed to analyze it for his own research purposes until it was available to all.

“The vision for providing the data to any bona fide researcher without preferential access was really a game changer,” says Manny Rivas, a biostatistician at Stanford University. Rivas, who is an assistant professor, noted it is a real boon for junior faculty, who haven’t had years to amass their own data. The availability of a data set as rich and deep as U.K. Biobank democratizes genetics research.

On top of this shared data set, several research groups have now built freely available tools to help other scientists make use of U.K. Biobank’s data. Marchini’s group made a web browser dedicated to parsing genetic and brain data from U.K. Biobank. Albert Tenesa, from the University of Edinburgh, created GeneATLAS, which accounts for family members in the database, whose presence usually screws up the math used to find links between genetic variants and disease. Rivas made the Global Biobank Engine, which is essentially a search engine for genes potentially associated with any disease. The Global Biobank Engine, in turn, is partly based on calculations done by Ben Neale, a geneticist at the Broad Institute, who looked at nearly 2,500 traits and disorders and how they corresponded with genetic variants in the U.K. Biobank.

(Unlike U.K. Biobank’s full data, these tools are accessible to anyone with an internet connection, but they show only aggregate data, so study participants should not be individually identifiable.)

In the past, looking at how a single trait corresponded with a set of genetic variants could be a paper in itself. It’s called a genome-wide association study, or GWAS. Neale’s group did 2,500 GWASs in a single day—and he didn’t even bother to write a paper. It’s a blog post on his website. Neale says it didn’t quite feel like a discrete journal article. It’s more a starting point for scientists interested in specific genes or traits. He’s since heard from both pharmaceutical companies and academic researchers using his GWAS data.
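For readers curious what a single GWAS amounts to computationally, here is a minimal sketch in Python. It is not the pipeline Neale's group or anyone else named in this article used; it runs on simulated data, and every name in it (`association_test`, `run_gwas`, `simulate`) is hypothetical. The idea is simply to test each variant in turn: regress the trait on the number of copies of the variant each person carries (0, 1 or 2), then rank variants by how unlikely the observed association would be under chance.

```python
import math
import random


def association_test(genotypes, trait):
    """Regress a trait on one variant's allele counts (0, 1 or 2).

    Returns (slope, p_value). The p-value uses a normal approximation
    to the test statistic, which is reasonable only for large samples.
    """
    n = len(trait)
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:  # the variant does not vary in this sample
        return 0.0, 1.0
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    rss = sum((y - intercept - slope * g) ** 2 for g, y in zip(genotypes, trait))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    if std_err == 0:  # perfect fit; treat as maximally significant
        return slope, 0.0
    z = slope / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return slope, p_value


def run_gwas(genotype_matrix, trait):
    """Test every variant; genotype_matrix[j] holds variant j for all people.

    Returns (variant_index, slope, p_value) tuples, smallest p-value first.
    """
    results = [(j, *association_test(column, trait))
               for j, column in enumerate(genotype_matrix)]
    return sorted(results, key=lambda row: row[2])


def simulate(n_people=2000, n_variants=50, causal=7, effect=0.5, seed=0):
    """Made-up data: one variant truly shifts the trait, the rest are noise."""
    rng = random.Random(seed)
    matrix = []
    for _ in range(n_variants):
        freq = rng.uniform(0.1, 0.5)  # assumed allele frequency
        matrix.append([(rng.random() < freq) + (rng.random() < freq)
                       for _ in range(n_people)])
    trait = [effect * matrix[causal][i] + rng.gauss(0, 1)
             for i in range(n_people)]
    return matrix, trait


if __name__ == "__main__":
    matrix, trait = simulate()
    for index, slope, p_value in run_gwas(matrix, trait)[:3]:
        print(index, round(slope, 3), p_value)
```

A real analysis involves far more than this loop: it adjusts for covariates such as age and ancestry, corrects for the enormous number of tests being run, and has to handle relatives in the data set, the problem GeneATLAS was built to address.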

Tenesa, who uploaded a preprint describing GeneATLAS on bioRxiv in August, says he has also heard from a couple dozen researchers using the tool. Some have asked him to run calculations for specific traits. This is happening as he’s still working to publish a paper about GeneATLAS in an official journal. It’s the way things are now. “When I get my email from Nature Genetics these days, and they tell you what papers have just been published, I’ve often seen the papers nine months earlier on bioRxiv,” says Marchini.

But is there such a thing as too fast? Jeffrey Barrett, a geneticist at the Sanger Institute, has cautioned against hastily posting preprints based on a quick GWAS. “I understand why,” he says. “It’s the quickest way to get out the stamp that you’ve done this analysis first.” But it’s easy to miss possible artifacts or mistakes in a data set as big and complex as this one. And now that it’s easy to identify genetic variants linked to a disorder, says Barrett, simply enumerating the variants doesn’t add much value. U.K. Biobank has made genetics research easier, but it is also raising the bar.

To publish, researchers increasingly will have to tease out how a genetic variant found via GWAS may be causing a disease—perhaps by tracking how it’s expressed in different parts of the body or sequencing the gene. U.K. Biobank did not fully sequence the DNA in its blood samples, which would have been far too expensive; it used a technique called genotyping that spot-checked 820,967 sites in the genome. In March, though, it signed a deal with the pharmaceutical companies Regeneron and GSK to sequence the DNA of everyone in the study. This deal gives the pharma companies exclusive access to the sequences for nine months—a change in access policy. Eventually, the data will be available to the wider research community.

U.K. Biobank is still following its 500,000 volunteers, and will continue to do so for many years as they age. The technology available to scientists will advance over time, too. When planning for the study was going on, says U.K. Biobank’s principal investigator Rory Collins, studying 820,967 genetic markers for 500,000 people seemed unlikely: “No one envisaged that being possible so soon.” A decade on, any scientist in the world can do it.
