recommended reading

Sometimes the Best Big Data Questions Raise the Biggest Privacy Concerns


One useful definition for the unstructured data that underlies most existing and theoretical big data projects is that it was often collected for some purpose other than what the researchers are using it for.

That definition was provided by Chris Barrett, executive director of the Virginia Bioinformatics Institute during a series of presentations before the President’s Council of Advisors on Science and Technology on Thursday focused on the value of data mining for public policy.

Data that was initially collected to measure educational achievement, for instance, could be used to analyze how educational achievement relates to obesity or incarceration rates in a particular community.   

This definition points to the potential of big data analysis as more and more information is gathered online and elsewhere, but it also points to some challenges as outlined by Duncan Watts, a principal researcher at Microsoft’s research division.  

First off, a large portion of the data that might be valuable to social scientists, policymakers, urban planners and others is held by private companies that release only portions of it to researchers. Facebook, Amazon, Google, email providers and ratings companies all know certain things about you and about society, in other words, but there’s no way to aggregate that data to draw global insights.

“Many of the questions that are of interest to social science really require us being able to join these different modes of data and to see who are your friends what are they thinking and what does that mean about what you end up doing,” Watts said. “You cannot answer these questions in any but the most limited way with the data that’s currently assembled.”

Second, even if social scientists were able to draw on that aggregated data, it would raise significant privacy concerns among the public.

“This is a very sensitive point because, to some extent, this is what the NSA has been reputedly doing, joining together different sorts of data,” Watts said. “And you can understand how sensitive people are about that. Precisely the reason why this is scientifically interesting is also the reason why it’s so sensitive from a privacy perspective.”

Finally, because much of the data that’s useful to social scientists was gathered for other purposes, there’s often some bias in the data itself, Watts said.

“When you go to Facebook, you’re not seeing some kind of unfiltered representation of what your friends are interested in,” he said. “What you’re seeing is what Facebook’s news ranking algorithm thinks that you'll find interesting. So when you click on something and the social scientist sees you do that and makes some inference about what you’re sharing and why, it’s hopelessly confounded.”

(Image via Laborant/

Threatwatch Alert

Stolen credentials

Hackers Steal $31M from Russian Central Bank

See threatwatch report


Close [ x ] More from Nextgov

Thank you for subscribing to newsletters from
We think these reports might interest you:

  • Data-Centric Security vs. Database-Level Security

    Database-level encryption had its origins in the 1990s and early 2000s in response to very basic risks which largely revolved around the theft of servers, backup tapes and other physical-layer assets. As noted in Verizon’s 2014, Data Breach Investigations Report (DBIR)1, threats today are far more advanced and dangerous.

  • Featured Content from RSA Conference: Dissed by NIST

    Learn more about the latest draft of the U.S. National Institute of Standards and Technology guidance document on authentication and lifecycle management.

  • PIV- I And Multifactor Authentication: The Best Defense for Federal Government Contractors

    This white paper explores NIST SP 800-171 and why compliance is critical to federal government contractors, especially those that work with the Department of Defense, as well as how leveraging PIV-I credentialing with multifactor authentication can be used as a defense against cyberattacks

  • Toward A More Innovative Government

    This research study aims to understand how state and local leaders regard their agency’s innovation efforts and what they are doing to overcome the challenges they face in successfully implementing these efforts.

  • From Volume to Value: UK’s NHS Digital Provides U.S. Healthcare Agencies A Roadmap For Value-Based Payment Models

    The U.S. healthcare industry is rapidly moving away from traditional fee-for-service models and towards value-based purchasing that reimburses physicians for quality of care in place of frequency of care.

  • GBC Flash Poll: Is Your Agency Safe?

    Federal leaders weigh in on the state of information security


When you download a report, your information may be shared with the underwriters of that document.