Congress Will Let Internet Providers Sell Your Data—So Rebels Devised a Way to Fool Corporations

Last week, Congress voted to overturn rules that would have prevented internet service providers from selling customers’ data without permission. Though the rules had not yet gone into effect, the vote drew considerable attention to the question of how people can better protect their online privacy and data.

One increasingly popular option is to make use of tools designed to help obscure your online activity—the better to throw off surveillance from corporations and governments alike. Dan Schultz, a programmer, reacted to the vote by creating a tool called Internet Noise to help people seed their online activity with “noise,” or random web searches and sites that obscure their true browsing habits.

Noisify, a Chrome extension, performs a similar function by generating random searches on your Facebook page, so Facebook knows a little less about what you’re actually looking at or interested in. AdNauseam, another browser extension, will click on lots of ads for you, so any insights about your behavior or buying habits gleaned from these clicks will be largely worthless. Another browser extension, TrackMeNot, generates random web searches, so “actual web searches, lost in a cloud of false leads, are essentially hidden in plain view.”

The logic behind such tools is that if you are always under surveillance online, and if companies can freely buy and sell all your data, then you may as well give up on trying to keep your online habits secret. Instead, the goal of these tools is to bombard any would-be watchers with so much garbage data that they are unable to draw any accurate conclusions about who you are and what you’re doing. All of your genuine online activity will still be available to buyers and sellers, but it will be bundled up with so much automatically generated nonsense data that no one will be able to sort out what was really you and what was just noise.

The problem with these tools and strategies—which are undoubtedly well-intentioned—is that it’s actually pretty hard to generate convincingly realistic-looking noise. After all, most of our online searching doesn’t happen on a rigorously timed schedule of one search every 10 seconds.

These kinds of regular, repeated patterns make it pretty easy for a search engine to recognize that the queries are not coming from an actual person, and to flag and block them. Furthermore, most random combinations of common nouns or phrases strewn amid a person’s genuine browsing history are likely to stand out as artificial, deliberate attempts at obfuscation.
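The timing problem is easy to see in a few lines of code. The Python sketch below is a hypothetical heuristic, not how any particular search engine actually detects bots: it measures how uniform the gaps between queries are (their coefficient of variation, the standard deviation divided by the mean) and flags a stream as automated when that value falls below an arbitrary threshold of 0.1. The `interarrival_cv` and `looks_automated` names, the threshold, and the sample “human” session are all illustrative assumptions.

```python
import statistics


def interarrival_cv(timestamps):
    """Coefficient of variation (stdev / mean) of the gaps between queries.

    A value near 0 means the queries arrive on a near-fixed schedule.
    """
    if len(timestamps) < 3:
        raise ValueError("need at least three timestamps")
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        return 0.0  # all queries at the same instant: maximally regular
    return statistics.pstdev(gaps) / mean_gap


def looks_automated(timestamps, threshold=0.1):
    """Hypothetical check: flag a query stream whose timing is too regular.

    The 0.1 threshold is an arbitrary assumption for illustration.
    """
    return interarrival_cv(timestamps) < threshold


# A noise tool issuing one search every 10 seconds.
bot = [i * 10.0 for i in range(20)]
# A made-up, irregular human session (seconds since start).
human = [0, 4, 9, 75, 80, 300, 310, 312, 900, 1500]

print(looks_automated(bot))    # True
print(looks_automated(human))  # False
```

A noise tool that adds random jitter to its schedule would slip past this particular check, so a real detector would presumably combine timing with other signals, such as what is being searched for and how results are clicked.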

That doesn’t mean there’s zero value to these tactics. If your goal is simply to make it a little harder for advertisers to figure out what sorts of things you might be interested in buying, then inserting even some fairly rudimentary random noise into your browsing habits may do the trick. But a slightly more sophisticated analysis algorithm—to say nothing of a person actually looking at your data—would likely be able to strip away the noise fairly easily.

Still, the challenge of generating realistic-looking noise is not an insurmountable one. People may yet improve on the existing tools and find ways of making the noisy data more persuasive and indistinguishable from your actual online activity. One strategy might be to have large groups of people all agree to merge their online browsing histories: in other words, to use real people’s online activity as your “noise,” so that it displays the characteristics of genuine user behavior.
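To make the group-merging idea concrete, here is a minimal Python sketch under stated assumptions: each participant’s history is a list of URLs held in a shared `pool` dictionary, and `build_noise` and `mixed_stream` are hypothetical helpers, not part of any existing tool. Each person’s “noise” is sampled from the other participants’ genuine visits and shuffled together with their own.

```python
import random


def build_noise(pool, own_id, k, rng):
    """Draw up to k entries from other participants' real histories.

    Hypothetical scheme: noise is borrowed, not generated.
    """
    others = [url for pid, urls in pool.items() if pid != own_id for url in urls]
    if not others:
        return []
    return rng.sample(others, min(k, len(others)))


def mixed_stream(pool, own_id, k, seed=None):
    """Interleave a participant's genuine visits with borrowed ones, in random order."""
    rng = random.Random(seed)
    stream = list(pool[own_id]) + build_noise(pool, own_id, k, rng)
    rng.shuffle(stream)
    return stream


# Made-up example histories (the domains are placeholders).
pool = {
    "alice": ["news.example/politics", "recipes.example/bread"],
    "bob": ["shop.example/shoes", "forum.example/cycling"],
    "carol": ["maps.example/route", "video.example/jazz"],
}
stream = mixed_stream(pool, "alice", k=3, seed=1)
print(len(stream))  # 5
```

The sketch ignores timing entirely, and any real version would also have to protect the pooled histories themselves, since sharing them exposes each participant’s browsing to the rest of the group.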

Should the strategy of making internet noise catch on, it is likely to face some fairly formidable adversaries. Search engines, though they are not impacted by the rules overturned by Congress, also have a vested interest in being able to collect accurate data on their users. They could pretty easily block many of these automated obfuscation efforts simply by using their existing tools to detect bots.

After all, search engines often make money by selling ads based on user data, too. And they tend to create and fine-tune their search algorithms based on what people search for and what they click on. So if everyone’s computers are performing lots of randomly generated searches and clicking on the results, that has the potential to skew search engine data and degrade the quality of search results for all their users.

It’s always encouraging to see people fighting for their online privacy, and to see smart people designing tools to address the mistakes of policy-makers. But these tools have limitations, which should serve as a reminder that even the cleverest workarounds are no substitute for sensible policy.
