Sometimes the Best Big Data Questions Raise the Biggest Privacy Concerns


One useful way to define the unstructured data that underlies most existing and theoretical big data projects is that it was often collected for some purpose other than the one researchers are now using it for.

That definition was provided by Chris Barrett, executive director of the Virginia Bioinformatics Institute, during a series of presentations before the President’s Council of Advisors on Science and Technology on Thursday that focused on the value of data mining for public policy.

Data that was initially collected to measure educational achievement, for instance, could be used to analyze how educational achievement relates to obesity or incarceration rates in a particular community.   

This definition points to the potential of big data analysis as more and more information is gathered online and elsewhere, but it also points to some challenges, as outlined by Duncan Watts, a principal researcher at Microsoft’s research division.

First off, a large portion of the data that might be valuable to social scientists, policymakers, urban planners and others is held by private companies that release only portions of it to researchers. Facebook, Amazon, Google, email providers and ratings companies all know certain things about you and about society, in other words, but there’s no way to aggregate that data to draw global insights.

“Many of the questions that are of interest to social science really require us being able to join these different modes of data and to see who are your friends, what are they thinking, and what does that mean about what you end up doing,” Watts said. “You cannot answer these questions in any but the most limited way with the data that’s currently assembled.”

Second, even if social scientists were able to draw on that aggregated data, it would raise significant privacy concerns among the public.

“This is a very sensitive point because, to some extent, this is what the NSA has been reputedly doing, joining together different sorts of data,” Watts said. “And you can understand how sensitive people are about that. Precisely the reason why this is scientifically interesting is also the reason why it’s so sensitive from a privacy perspective.”

Finally, because much of the data that’s useful to social scientists was gathered for other purposes, there’s often some bias in the data itself, Watts said.

“When you go to Facebook, you’re not seeing some kind of unfiltered representation of what your friends are interested in,” he said. “What you’re seeing is what Facebook’s news ranking algorithm thinks that you’ll find interesting. So when you click on something and the social scientist sees you do that and makes some inference about what you’re sharing and why, it’s hopelessly confounded.”
