Big data meets interested government

Intelligence agencies are increasingly looking beyond the satellite photos and secret reports they have traditionally relied on for insight into U.S. adversaries' actions. They are turning to data-crunching algorithms that can sift through massive piles of disparate information, such as GPS reports, social media posts and online images, said Amr Awadallah, co-founder and chief technology officer of Cloudera, a vendor that maintains and manages big data systems.

While intelligence agencies are the first government entities to mine big data -- data sets too large to be analyzed by desktop analytical tools -- they're unlikely to be the last, Awadallah said.

Agencies managing Social Security, Medicare and Medicaid, for instance, could analyze big data to spot trends in fraud and abuse, and the Transportation Department could crunch through satellite images to get a better sense of traffic patterns on interstate highways.

Cloudera's federal customers include the CIA and the National Security Agency. "I can't talk about what those projects are, but you can imagine how much data they have and what type of things they could be doing with it," he said.

The CIA also indirectly invested in Cloudera through In-Q-Tel, an independent, nonprofit venture capital firm that was started at the spy agency's request and describes its mission as delivering useful technology to the agency.

Awadallah spoke with Nextgov on the sidelines of the Government Big Data Forum that vendor Carahsoft Technology sponsored on March 6.

At the root of most big data crunching systems is the open source software Apache Hadoop. Its first major innovation is the ability to link together multiple computers and servers, either in a proprietary data center or in a computer cloud, and make them work like one huge computer that can scale up for a major task.

The software's second major innovation is the ability to sort through unstructured data, such as all posts under a particular Twitter hashtag or emails containing a particular word or phrase, as well as through more structured data such as spreadsheets.
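To make the idea concrete, here is a minimal sketch of the split-then-combine pattern that this kind of system relies on. It is plain Python rather than Hadoop's own API, and it runs on one machine. The function names (`map_partition`, `merge_counts`, `count_hashtags`) and the sample posts are hypothetical, chosen only for illustration. Each "partition" stands in for the slice of data one server would hold. Every slice is scanned independently, and the partial tallies are then merged into one result.

```python
import re
from collections import Counter
from functools import reduce

# Hypothetical pattern for a hashtag: '#' followed by word characters.
HASHTAG = re.compile(r"#\w+")


def map_partition(posts):
    """Map step: tally hashtags in one partition of unstructured posts."""
    counts = Counter()
    for post in posts:
        for tag in HASHTAG.findall(post):
            counts[tag.lower()] += 1
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial tallies into one."""
    return left + right


def count_hashtags(partitions):
    """Scan each partition independently, then merge the partial results."""
    # On a real cluster each partition would be processed on a different
    # machine; here the map step simply runs in a loop.
    partial = [map_partition(posts) for posts in partitions]
    return reduce(merge_counts, partial, Counter())


if __name__ == "__main__":
    # Made-up sample data, split into two partitions.
    partitions = [
        ["Traffic on I-95 #commute", "#commute delays again"],
        ["Nice day out #weather", "Stuck again #Commute"],
    ]
    for tag, n in count_hashtags(partitions).most_common():
        print(tag, n)
```

Because each partition is tallied without reference to the others, adding more machines lets the same job cover more data, which is the scaling property described above.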

"The old way of collecting data was to only collect it . . . when a human generates it," Awadallah said, such as by making a purchase or filling out a survey.

"We called that an explicit transaction," he said. "Now we're collecting implicit information. We have all these sensors around humans in mobile devices and satellites taking images and there are Web services collecting information about you all the time nonstop."

The classic example of big data in the private sector is when Google, Facebook or another site mines through a user's search history, network of contacts and profile information to micro-target the advertisements she's most likely to click on.

Big data can be used in other commercial ways, though, that have nothing to do with Web activity.

The company Skybox Imaging, for example, has made a business out of sorting through satellite data to deliver commercial intelligence on demand, according to Awadallah.

"So [for example] you can buy a little stream from them that gives you a measure of how many cars are parked at Home Depot in different locations across the country," he said. "If you're a competitor of Home Depot's or if you're a financial analyst who's trying to predict the quarterly earnings of Home Depot that's very valuable information."

