The 'Genome Hacker' Who Mapped a 13-Million-Person Family Tree

Huge crowdsourced genealogy databases are inspiring new genetics research.

Yaniv Erlich has been a white-hat hacker and a geneticist at Columbia University, and now he works for a genealogy company.

This unusual career trajectory has led, most recently, to a 13-million-person family tree unveiled today in Science.

The massive trove of data comes from public profiles on the crowdsourced genealogy website Geni.com, and it sheds light on human longevity and dispersal over time. (I wrote about a preprint of this paper last year.) But most of all, Erlich is excited about overlaying DNA information on top of family trees to study genes implicated in disease.

MyHeritage, the company behind Geni.com, also sells DNA ancestry tests. And since 2017, Erlich has been on leave from Columbia working as MyHeritage’s chief scientific officer to develop those DNA tests.

If that sounds like a lot of data going into the hands of one company, well, it is. Erlich has long been part of the debate over DNA research and privacy. In 2013, he showed it was possible, using only public information in places like consumer genealogy databases, to identify certain study participants who had contributed their DNA to research projects. For this feat, Nature dubbed him the “genome hacker.”

I talked to Erlich about how he thinks about privacy in the era of big-data genetics research—he’s actually published his genome online—and how genealogists have inspired his research. This interview has been condensed and edited for clarity.

Sarah Zhang: You’ve constructed a family tree connecting 13 million people. That’s quite an accomplishment, and so rather than let you bask in it, let me ask: Would it be possible to construct a family tree that connects every single living person in the world?

Yaniv Erlich: There is a theory you need to go 75 generations or so to connect everyone in the world. By everyone, I mean everyone. I’m talking about people in some tribes in the Amazon to someone in Iceland. So it’s possible, but there are no really good ways to trace so deeply. Maybe with genetics we can start to bridge gaps where the genealogy is not there.

Zhang: How did you get interested in genealogy?

Erlich: It’s a bit of a long story. Every kid in Israel in seventh grade needs to do a genealogy project. I did my genealogy and I was so excited about it. In fact, I won the school award for the best project.

Now at the end of 2008, my third cousin that is really into genealogy told me about this website Geni.com. He emailed me like, “Oh, do you want to put in some people in your family?” I was looking at the data and I was thinking—this was toward the end of my Ph.D.—somebody should download the data and do something cool about it. I didn’t have any application in mind.

Then in 2010, I started my own lab at the Whitehead Institute [at MIT]. I was sitting there thinking what I can do and reading a bit about how to mine social media. There was a book, Mining the Social Web. I sent a cold email to the CTO of Geni, asking if I could download the data. It’s a cold email, who knows, right? He got back to me saying, “Yeah, you can download the data, no problem,” and gave me some pointers on how to do that.

A representation of 6,000 people in a family tree. Green dots are people and red represents marital links. (Columbia University)

Zhang: Your study is published now, but it seems like this is a beginning rather than an end. I’d imagine what you’re really interested in is overlaying genetic data on top of the family tree.

Erlich: Exactly. At MyHeritage, we started to offer DNA tests to users in November 2016. Since then we’ve collected 1.2 million DNA profiles of users.

Zhang: And why make the jump to MyHeritage? Are there things you can do at a company you couldn’t do in academia?

Erlich: I think this is a model for the future. There are certain things that you can only do in academia. There are certain things you can only do in companies. If you want to move in scientific endeavors, collaborating with companies is a very fruitful direction.

I could not do this study if I was just in a company because it’s years and years of process, and this is academic freedom that I could actually take this time. On the other hand, if there are no companies, nobody would collect this data. This amount of data, you cannot get it in academic studies. Companies have the ability to reach out to millions of genealogists, to work with them, to convince them to give the data, to give them the good feeling about it. You need a company that has websites that are perfect, that are responsive, that are fun to use. Not PubMed, which is a nice website but has a very geeky look.

Zhang: So what are you going to study now with the combination of DNA and genealogy data?

Erlich: Even better, we also have phenotypes now. Since October, we started to allow users to fill out surveys about themselves. So we have the genealogy and the surveys and the DNA. Our surveys are modeled after the U.K. Biobank surveys. We’re asking, did you have a heart attack? Are your parents suffering from Alzheimer’s?

About a year ago, Joe Pickrell and I had a paper in Nature Genetics that was a genome-wide association study by proxy. Think about, say, we want to look at genes related to Alzheimer’s in our data set. If I go to our users and ask, do you have Alzheimer’s, they are healthy people; otherwise they wouldn’t be buying the test. So for certain diseases, it’s quite hard to get the information. What we show is that you can ask users about their first-degree relatives [parents, siblings, and children]. Since you share half of their genome, you lose half of the signal, but you get so many people to answer the question that you get back the power needed to implicate genetic variants.
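Erlich's trade-off between signal and sample size can be put in rough numbers. The sketch below is a simplified illustration, not the method from the Nature Genetics paper. It assumes an additive model in which a relative's report carries the genetic effect scaled by relatedness (about one-half for first-degree relatives), and in which the strength of the association test grows with the number of respondents times the squared effect. The function names and the example figures are hypothetical. Under these assumptions, halving the signal means roughly four times as many people must answer.

```python
# Back-of-the-envelope sketch of the "by proxy" trade-off.
# Assumptions (illustrative, not the paper's model): the effect observed
# through a relative's report scales linearly with relatedness, and the
# test's signal (its noncentrality) scales with N * effect**2.


def noncentrality(n_respondents: int, effect: float) -> float:
    """Hypothetical helper: approximate test signal for n respondents."""
    return n_respondents * effect ** 2


def proxy_sample_size_factor(relatedness: float) -> float:
    """Hypothetical helper: how many times more respondents are needed
    to match the signal of asking people about themselves."""
    return 1.0 / relatedness ** 2


# First-degree relatives share about half their genome.
r = 0.5
factor = proxy_sample_size_factor(r)  # 4.0

# Illustrative numbers only: 10,000 direct reports vs. 40,000 proxy reports.
direct = noncentrality(10_000, 0.02)
proxy = noncentrality(int(10_000 * factor), 0.02 * r)
print(factor, round(direct, 6), round(proxy, 6))
```

In this toy setup, 40,000 people reporting on a parent give about the same test strength as 10,000 people reporting on themselves. The figures are illustrative only.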

Zhang: Let’s talk about privacy. Senator Chuck Schumer recently held a press conference calling for more scrutiny of DNA tests. You have a history of thinking about DNA and privacy, so how has that informed MyHeritage’s practices?

Erlich: It’s part of the challenge of this new era. At MyHeritage, we take it very seriously. We allow people to delete profiles. There are settings—you can have your profile private or public. People can delete their DNA data, and we’ll go to the lab and we’ll even wash away the tube. So we take these things very seriously.

But if you ask me, do you want to share with me your genealogy or your cellphone records or search-engine records, I will share my genealogy.

Zhang: In fact, you’ve put your own genome online.

Erlich: Because I feel like I don’t have a lot to risk in general. If you ask me do you want your search-engine data or data your ISP sees or your bank account versus your genome, your genome is actually quite—I don’t think it’s very interesting.

Zhang: In 2013, you actually published a paper finding that it’s possible to identify some DNA donors from publicly available information. I think this study still gets talked about a lot. Do you think it changed anything?

Erlich: I think it changed the way policy makers think about how we communicate risks to participants. I think previously the prevailing thought was we just promised them everything will be okay. Now we promise you 100 percent effort, but we are also learning.

I think the other interesting thing is that in 2013, people didn’t understand why I did this study. I got many questions: “Why even do something like that?” Now that we’ve matured into this data-intensive world, it has become very clear this is the right research to do.

Zhang: That study was actually inspired by a mother-and-son pair who tracked down the son’s anonymous sperm donor using consumer genealogy databases, right?

Erlich: Yeah, the mother worked in Cold Spring Harbor [Laboratory in New York] or she used to work there 20 years before, and she contacted Cold Spring Harbor. I did my Ph.D. at Cold Spring Harbor, so I met her. I was like, “Wow, that’s crazy.” It was really mind-blowing.
