The privacy policies of the 50 most popular websites in America add up to 145,641 words. Here’s why they tell you so little.
We work with data and research digital rights issues, and we were curious whether the most popular websites respect your privacy as much as they claim to. So we gathered and analyzed the 145,641 words that make up the privacy policies of the 50 most popular American websites. (Collectively, they amount to a text about as long as The Grapes of Wrath.) What we found was that these policies tell you very little about the data these websites have on you. And that’s the point.
Today’s privacy policies don’t tell consumers the whole story for two main reasons. First, websites have adopted a kind of precautionary legalese to inoculate themselves against lawsuits and fines: the vaguer and more elastic their language, the less legal risk they carry. Second, over the past ten years, a new industry of “data brokerage” has arisen to help sites learn more about people like you and me on the other side of the screen. These firms cross-reference and synthesize data to create richly detailed profiles that can include purchasing habits, political affiliations, sexual orientation, religious beliefs, and medical history. Gathering and analyzing that data is big business, and it creates a strong financial incentive for the firms that collect it to make it as difficult as possible for you to opt out of their net.
* * *
It’s easy to get lost in the opaque language of privacy policies. LinkedIn says it will only share your information “as reasonably necessary in order to provide our features and functionality to you.” Facebook pronounces, “We may use any of the non-personally identifiable attributes we have collected (including information you may have decided not to show to other users, such as your birth year or other sensitive personal information or preferences) to select the appropriate audience for those advertisements.” Tumblr says, “You may access Third Party Services through the Services, for example by clicking on externally-pointing links.” Privacy policies are clearly written for lawyers, not consumers.
It’s not that privacy is so complicated that this is the only way companies can express themselves when it comes to user data. Casey Oppenheim, co-founder of Disconnect, an app that lets you block a website’s third-party data gatherers, says there’s another reason web companies are so cagey about data: “They know that if they tell people every single way they’re collecting information and using it, then most users will share less information, which would mean less money for them.”
We emailed PayPal about a particular line in its policy: “We do not sell or rent your personal information to third parties for their marketing purposes without your explicit consent.” We had a simple but important question: “What do you mean by explicit consent?” After five email exchanges, we were no closer to a specific answer. We did, however, get many helpful tips about how to set up a PayPal account.
Statements about not selling personal information concerned us because their entire meaning depends on the definition of “consent.” The privacy lawyers and consumer rights advocates we talked to told us that in America, a consumer can “consent” to a website’s data sharing policy by failing to proactively “opt out” of the default settings. You can “consent” simply by using a website. The term “explicit consent” requires an affirmative step on the part of the customer, which generally means ticking a box that says you have read and accepted a website’s terms and conditions. But as John Oliver said on Last Week Tonight, if Apple put the full text of Mein Kampf in its iTunes Terms of Service, we’d all still click “Agree.”
This ambiguity around “consent” becomes disconcerting when you consider what you are “consenting” to: widespread data collection by third-party companies.
* * *
Here’s the thing: It’s not just Facebook and Twitter that are keeping tabs on your activity. Many of the most popular sites enlist “third parties” to gather your data for them. Privacy policies may lack clear descriptions of data collection practices, but almost all mention third parties. When you visit a website like huffingtonpost.com, not only does The Huffington Post collect your data, but 33 other companies do as well, according to the Disconnect app. By law, websites need only tell you that they interact with these other companies, not what these companies do with your information. Sites don’t even need to tell you which companies they hire. According to their privacy policies, 48 of the top 50 websites in America use third parties. Only nine say which ones.
Oppenheim says that the most popular websites partner with ad and analytics companies to grab data and personalize ads so they don’t have to do it themselves. “Those companies are tracking you on their site, and they’re tracking you via your IP or your device information, which they log on their own servers, and make sure they know how you behave on the most popular sites across the web,” he said. “And the only thing ESPN has to say in its privacy statement is that they allow other people to look at their aggregate information.”
It’s not just the quality of these documents; it’s the quantity as well. A 2008 Carnegie Mellon study found that it would take the average Internet user between 181 and 304 hours to read all the privacy policies for the websites she visits each year. Note that you would have to repeat this exercise every year, because most companies update their policies annually.
This is all strategic: If texts are sufficiently long and boring, then customers won’t bother to read or question them. So you might never find out that social widgets—including the ones on the page you’re currently reading—let Facebook, Twitter, and the like watch what you do on the sites that use them.
Or you might never know that Internet companies can use the data they have on you as collateral: if a company is unable to repay a debt, your data is transferred to the lender, who can then do whatever she wants with it.
And you might never know that ostensibly impersonal data can, collectively, be reconstructed to identify you personally. Many of the 50 websites we looked at assured users that any information shared with third parties is “anonymized.” But according to Hans, third parties can decipher anonymized data to figure out who it belongs to. “Because anonymization techniques are computing-based, they can ultimately be reversed through more sophisticated computing,” said Hans. “Having a 100 percent success rate of anonymization is not possible.”
Personal information like location data can be de-anonymized with as few as two data points: where you work and where you live, both of which are often publicly available. Anonymized Netflix user data can be decoded by comparing it with public IMDb.com data. And details about your device can be paired with details about your browser version to identify you and your entire browsing history, according to Oppenheim.
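To see why two data points can be enough, consider a toy linkage attack, sketched in Python below. Every record, name, and ZIP code is invented, and the helper `reidentify` is hypothetical: it joins an “anonymized” dataset to a public one on the (home, work) pair and reports any pair that belongs to exactly one known person.

```python
from collections import Counter

# Invented "anonymized" location records: names stripped, but each row still
# carries a home area and a work area (made-up ZIP codes).
anonymized_trips = [
    {"record_id": "r1", "home": "10001", "work": "10017", "visited": "clinic"},
    {"record_id": "r2", "home": "11201", "work": "10005", "visited": "pharmacy"},
]

# Invented public data, e.g. an address listing combined with a job profile.
public_records = [
    {"name": "Alice", "home": "10001", "work": "10017"},
    {"name": "Bob", "home": "10001", "work": "10005"},
    {"name": "Carol", "home": "11201", "work": "10005"},
    {"name": "Dan", "home": "11201", "work": "10005"},
    {"name": "Erin", "home": "11201", "work": "10017"},
]


def reidentify(anonymized, public):
    """Hypothetical helper: link anonymized rows to names via (home, work)."""
    # How many known people share each (home, work) combination?
    pair_counts = Counter((p["home"], p["work"]) for p in public)
    names_by_pair = {(p["home"], p["work"]): p["name"] for p in public}
    matches = {}
    for row in anonymized:
        pair = (row["home"], row["work"])
        # A combination held by exactly one person is an unambiguous match.
        if pair_counts[pair] == 1:
            matches[row["record_id"]] = names_by_pair[pair]
    return matches


print(reidentify(anonymized_trips, public_records))  # {'r1': 'Alice'}
```

In this made-up data, neither home ZIP 10001 nor work ZIP 10017 alone picks out one person, but the pair does, so the “anonymous” clinic visit in r1 is tied to Alice. Record r2 stays ambiguous only because two people share its pair. Matching Netflix ratings against public IMDb reviews follows the same basic idea: join on whatever the two datasets have in common.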
* * *
There is a camp of people who don’t care about data collection. If you are a responsible Internet user, they say, you have nothing to fear.
But that camp is shrinking. According to Pew, half of Americans are “worried about the amount of personal information about them that is online.” In 2009, only a third of the population felt that way. As more people learn how much of their personal information is in the hands of strangers with algorithms, they become concerned. “The ‘I don’t have anything to hide’ argument is on its last legs,” Oppenheim says.
And it’s not just about a stranger knowing intimate details about your life. The prevalence of invisible third parties also makes webpages load noticeably slower. According to a study by Disconnect, these invisible trackers slow down the average page by roughly 27 percent on desktop.
Moreover, the mere fact that someone else holds your personal data puts your identity at greater risk of being stolen. According to the Bureau of Justice Statistics, 34 million Americans have already experienced identity theft.
And the future could hold new possibilities of discrimination by data. In the past, banks would “redline” neighborhoods with minority populations and refuse to give loans to these residents. Now some, including the White House, fear that financial institutions or employers might “digitally redline” people based on the profiles they assemble through data collection.
* * *
Moving forward, it’s all about transparency. New measures will have to be honest and place the consumer first. Last fall, data broker Acxiom launched a feature that lets you look yourself up on their servers and, after personal verification, adjust the data they have on you (but not remove it). It was a self-serving gesture of openness that in fact gets the company more accurate, and more valuable, data for free. Measures like this do nothing to show genuine respect for user privacy.
Progress will come in the form of plain-language privacy policies that consumers actually want to read. Until then, apps like Disconnect offer quick ways to learn about websites’ data-gathering techniques. Other plugins let you block third-party cookies, and you can send a “Do Not Track” signal by adjusting your browser’s preferences. The most popular websites field thousands of items of feedback every day, but some of them do listen. Write to them asking for simplified privacy information. And, where possible, choose the service that respects your privacy over the one that doesn’t.
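Under the hood, the “Do Not Track” preference is modest: the browser adds an HTTP header, DNT: 1, to its requests, and each site decides for itself whether to honor it. The Python sketch below illustrates that split. The function names and the site-side policy are hypothetical, example.com is a placeholder, and nothing is actually sent over the network.

```python
import urllib.request


def make_request(url: str, do_not_track: bool) -> urllib.request.Request:
    """Build (but do not send) a request the way a browser with DNT on would."""
    req = urllib.request.Request(url)
    if do_not_track:
        # With the preference switched on, the browser adds this header.
        req.add_header("DNT", "1")
    return req


def site_allows_tracking(req: urllib.request.Request) -> bool:
    """Hypothetical policy for a site that *chooses* to honor DNT.

    Nothing obliges a site to run a check like this; the header is a request.
    """
    # urllib stores header names via str.capitalize(), so "DNT" becomes "Dnt".
    return req.get_header("Dnt") != "1"


if __name__ == "__main__":
    tracked = make_request("https://example.com/", do_not_track=False)
    untracked = make_request("https://example.com/", do_not_track=True)
    print(site_allows_tracking(tracked))    # True
    print(site_allows_tracking(untracked))  # False
```

The point of the sketch is where the decision lives: the browser can only ask, and the second function, which runs on the site’s side, is the one that determines whether tracking actually stops.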