recommended reading

Sometimes the Best Big Data Questions Raise the Biggest Privacy Concerns


One useful definition for the unstructured data that underlies most existing and theoretical big data projects is that it was often collected for some purpose other than what the researchers are using it for.

That definition was provided by Chris Barrett, executive director of the Virginia Bioinformatics Institute during a series of presentations before the President’s Council of Advisors on Science and Technology on Thursday focused on the value of data mining for public policy.

Data that was initially collected to measure educational achievement, for instance, could be used to analyze how educational achievement relates to obesity or incarceration rates in a particular community.   

This definition points to the potential of big data analysis as more and more information is gathered online and elsewhere, but it also points to some challenges as outlined by Duncan Watts, a principal researcher at Microsoft’s research division.  

First off, a large portion of the data that might be valuable to social scientists, policymakers, urban planners and others is held by private companies that release only portions of it to researchers. Facebook, Amazon, Google, email providers and ratings companies all know certain things about you and about society, in other words, but there’s no way to aggregate that data to draw global insights.

“Many of the questions that are of interest to social science really require us being able to join these different modes of data and to see who are your friends what are they thinking and what does that mean about what you end up doing,” Watts said. “You cannot answer these questions in any but the most limited way with the data that’s currently assembled.”

Second, even if social scientists were able to draw on that aggregated data, it would raise significant privacy concerns among the public.

“This is a very sensitive point because, to some extent, this is what the NSA has been reputedly doing, joining together different sorts of data,” Watts said. “And you can understand how sensitive people are about that. Precisely the reason why this is scientifically interesting is also the reason why it’s so sensitive from a privacy perspective.”

Finally, because much of the data that’s useful to social scientists was gathered for other purposes, there’s often some bias in the data itself, Watts said.

“When you go to Facebook, you’re not seeing some kind of unfiltered representation of what your friends are interested in,” he said. “What you’re seeing is what Facebook’s news ranking algorithm thinks that you'll find interesting. So when you click on something and the social scientist sees you do that and makes some inference about what you’re sharing and why, it’s hopelessly confounded.”

(Image via Laborant/

Threatwatch Alert

Thousands of cyber attacks occur each day

See the latest threats


Close [ x ] More from Nextgov

Thank you for subscribing to newsletters from
We think these reports might interest you:

  • Modernizing IT for Mission Success

    Surveying Federal and Defense Leaders on Priorities and Challenges at the Tactical Edge

  • Communicating Innovation in Federal Government

    Federal Government spending on ‘obsolete technology’ continues to increase. Supporting the twin pillars of improved digital service delivery for citizens on the one hand, and the increasingly optimized and flexible working practices for federal employees on the other, are neither easy nor inexpensive tasks. This whitepaper explores how federal agencies can leverage the value of existing agency technology assets while offering IT leaders the ability to implement the kind of employee productivity, citizen service improvements and security demanded by federal oversight.

  • Effective Ransomware Response

    This whitepaper provides an overview and understanding of ransomware and how to successfully combat it.

  • Forecasting Cloud's Future

    Conversations with Federal, State, and Local Technology Leaders on Cloud-Driven Digital Transformation

  • IT Transformation Trends: Flash Storage as a Strategic IT Asset

    MIT Technology Review: Flash Storage As a Strategic IT Asset For the first time in decades, IT leaders now consider all-flash storage as a strategic IT asset. IT has become a new operating model that enables self-service with high performance, density and resiliency. It also offers the self-service agility of the public cloud combined with the security, performance, and cost-effectiveness of a private cloud. Download this MIT Technology Review paper to learn more about how all-flash storage is transforming the data center.


When you download a report, your information may be shared with the underwriters of that document.