Putting open data to use

Posting spreadsheets to Data.gov is no longer enough. Agencies are increasingly using open data — their own and from other sources — in pursuit of core missions.


On Jan. 21, 2009, President Barack Obama's very first executive memorandum declared a commitment to "transparency and open government." Those principles led to the Digital Government Strategy, an open-data initiative and Data.gov — a repository that launched in May 2009 with 47 datasets and now offers more than 400,000.

As the sheer volume of data suggests, agencies have focused mainly on being producers of open data. In December 2009, the Office of Management and Budget gave each agency 45 days to publish "at least three high-value [datasets] not previously available," and many have since shared far more. As of May 21, the Transportation Department had posted 1,953 datasets. The Census Bureau has uploaded some 239,000 geodata resources, and the Executive Office of the President has added 150 batches of data to the mix.

What makes data open?

The phrase "open data" carries clear connotations, but what does it officially mean?

The emphasis on sharing was not accidental: The president, U.S. CIO Steven VanRoekel and U.S. Chief Technology Officer Todd Park have all touted the commercial and civic value of opening agencies' data vaults. Park and VanRoekel wrote in May that "hundreds of companies and nonprofits have used these data to develop new products and services that are helping millions of Americans and creating jobs of the future in the process."

Dataphiles, however, have noted that more could be done. "We're trying to move beyond the days when...agencies were just asked to make their data accessible, put it on Data.gov and call it a day," Xavier Hughes, the Labor Department's chief innovation officer, told attendees at ACT-IAC's Management of Change conference in May. "Data doesn't have its own two feet and run around saying, 'Use me, use me!'"

Why it matters

Agencies have long created data for their own use, of course. The National Oceanic and Atmospheric Administration, for example, generates 1.7 billion observations a day from its various satellites. Yet just as Web-based businesses from Amazon to Zillow unlock new value by "mashing up" disparate datasets, agencies can gain valuable new insights by breaking down the silos and putting others' data to use.

Commerce Department CIO Simon Szykman pointed to a data fusion project at his department as a case in point. Commerce's bureaus include Census and NOAA. By combining those two agencies' data on topography, tides, floodplains, weather and population, he said, it is possible to better predict the dangers posed by severe weather events.

"That kind of information can be used by first responders," Szykman said. "It can be used by [the Federal Emergency Management Agency] to pre-position supplies.... It can be used for a lot of things."
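To make the idea of data fusion concrete, here is a minimal Python sketch of the kind of join Szykman describes. It is not Commerce's actual method. It assumes both sources are keyed by county FIPS code, and the county codes, population counts and floodplain shares are invented for illustration. The "people at risk" figure is a crude estimate that assumes residents are spread evenly across each county.

```python
from dataclasses import dataclass

# Hypothetical, illustrative records: these are NOT real Census or NOAA figures.
# Census-style population counts keyed by county FIPS code.
population = {
    "01001": 55_000,
    "01003": 200_000,
    "01005": 26_000,  # no matching floodplain record
}

# NOAA-style share of each county's land area inside a floodplain (0.0 to 1.0).
flood_share = {
    "01001": 0.10,
    "01003": 0.35,
    "01009": 0.20,  # no matching population record
}


@dataclass
class Exposure:
    fips: str
    population: int
    flood_share: float

    @property
    def people_at_risk(self) -> int:
        # Crude estimate: assumes population is spread evenly across the county.
        return round(self.population * self.flood_share)


def fuse(pop: dict[str, int], flood: dict[str, float]) -> list[Exposure]:
    """Inner-join the two sources on FIPS code and rank by estimated exposure."""
    shared = pop.keys() & flood.keys()
    rows = [Exposure(f, pop[f], flood[f]) for f in shared]
    return sorted(rows, key=lambda r: r.people_at_risk, reverse=True)


for row in fuse(population, flood_share):
    print(row.fips, row.people_at_risk)
```

Counties that appear in only one source (here, the hypothetical 01005 and 01009) drop out of the inner join, which is one reason consistent identifiers matter when blending other agencies' data.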

Stephen Buckner, director of the Census Bureau's Center for New Media and Promotions, said FEMA is already using Census' socioeconomic housing data to identify high-risk areas during disaster planning.

And the National Broadband Map, produced by the National Telecommunications and Information Administration and the Federal Communications Commission, goes even further. Launched in 2011 to show broadband speeds and availability down to the county level, the map draws on open data from both agencies, from the OpenStreetMap Foundation and, by way of a commercial data provider, from Census Bureau demographic estimates.

Such examples are only the beginning, said Gwynne Kostin, director of the General Services Administration's Digital Services Innovation Center. As agencies gain richer blends of data and better tools for sharing them, "it's going to be huge," she said. "I'm not going to speculate on what they are, other than I think they're going to be amazing."

The fundamentals

On May 9, Obama issued an executive order declaring "open and machine-readable data" the new default for agencies and citing government efficiency as a goal. The OMB memo accompanying the order goes further, stating that "whether or not particular information can be made public, agencies can apply this framework to all information resources to promote efficiency and produce value."

The OMB memo also clarifies just what counts as open data (see "What makes data open?"), which can help agencies assess data sources in addition to improving their own.

At a fundamental level, however, open data is not terribly different from the highly siloed datasets of old. To put information to good use, it must be properly and consistently formatted, accurate and reliable, and relevant to the mission in question. The real change is one of mind-set, meaning decision-makers need to learn to look to others for data that could push a project forward.
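What "properly and consistently formatted" means in practice depends on the dataset. As one hedged illustration, a team receiving another agency's file might run basic checks before relying on it. The sketch below assumes a CSV with three hypothetical columns (county_fips, report_date, households). The column names and rules are illustrative, not drawn from any agency's schema.

```python
import csv
import io
from datetime import date

# Hypothetical schema for an incoming dataset; the column names are illustrative only.
REQUIRED_COLUMNS = {"county_fips", "report_date", "households"}


def check_rows(text: str) -> list[str]:
    """Return human-readable problems found in CSV text (empty list if none)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]

    problems = []
    for row_no, row in enumerate(reader, start=1):
        # Short rows leave missing fields as None, so fall back to "".
        fips = row["county_fips"] or ""
        when = row["report_date"] or ""
        households = row["households"] or ""

        if len(fips) != 5 or not fips.isdigit():
            problems.append(f"row {row_no}: bad county FIPS code {fips!r}")
        try:
            date.fromisoformat(when)
        except ValueError:
            problems.append(f"row {row_no}: date {when!r} is not an ISO 8601 date")
        if not households.isdigit():
            problems.append(f"row {row_no}: households {households!r} is not a whole number")
    return problems


sample = """county_fips,report_date,households
01001,2013-05-21,21000
1003,05/21/2013,n/a
"""
for problem in check_rows(sample):
    print(problem)
```

Checks like these catch formatting problems only; judging accuracy and relevance to the mission still depends on knowing the source.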

"It's actually very hard to find examples" of agencies incorporating others' open data, "although it isn't really hard to find the opportunities," said Seabourne CEO Mike Reich, who has worked with the FCC and Commerce on data-management projects.

The hurdles

The challenges go beyond old habits and parochial attitudes. "So much of what the agencies have is really interesting information," Reich said, but much of it contains personally identifiable information (PII). "Obviously, you can't release PII to the public."

So even if open doesn't mean public, as VanRoekel stressed in a May 15 interview with FCW, agencies must still navigate important privacy concerns as they seek to use other departments' data — and must coordinate with one another directly rather than simply plugging into public application programming interfaces (APIs) or downloading from Data.gov.

The new open-data policy, in fact, explicitly warns of the "mosaic effect" in data aggregation and stresses the need for agencies to ensure that datasets that seem scrubbed of PII will not compromise privacy when combined with other data.

"You now have a privacy protection responsibility as the recipient" of the data, said Michael Howell, deputy program manager for the Information Sharing Environment. "The mosaic effect is actually a sort of inherited risk."
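The mosaic effect is easier to see with a toy example. The sketch below, using invented records, shows how a dataset with names removed can still be linked back to individuals when it shares a few fields (here ZIP code, birth year and sex) with a second list that does carry names. It also computes the size of the smallest group sharing those fields, the k in k-anonymity, as one simple warning sign. That measure is a common technique in the privacy literature, not something the OMB policy prescribes.

```python
from collections import Counter

# Hypothetical records, invented for illustration.
# Dataset A: "scrubbed" of names, but keeps ZIP code, birth year and sex.
benefits = [
    {"zip": "20001", "birth_year": 1970, "sex": "F", "benefit": "disability"},
    {"zip": "20001", "birth_year": 1985, "sex": "M", "benefit": "housing"},
    {"zip": "20002", "birth_year": 1970, "sex": "F", "benefit": "housing"},
    {"zip": "20002", "birth_year": 1970, "sex": "F", "benefit": "disability"},
]

# Dataset B: a separate list that carries names alongside the same three fields.
roster = [
    {"name": "A. Smith", "zip": "20001", "birth_year": 1970, "sex": "F"},
    {"name": "B. Jones", "zip": "20001", "birth_year": 1985, "sex": "M"},
]

# Fields that are harmless alone but identifying in combination (an assumption here).
QUASI_IDS = ("zip", "birth_year", "sex")


def key(record: dict) -> tuple:
    return tuple(record[field] for field in QUASI_IDS)


def reidentify(scrubbed: list[dict], named: list[dict]) -> dict[str, str]:
    """Link names to benefits wherever the combination is unique in the scrubbed data."""
    counts = Counter(key(r) for r in scrubbed)
    by_key = {key(r): r for r in scrubbed if counts[key(r)] == 1}
    return {p["name"]: by_key[key(p)]["benefit"] for p in named if key(p) in by_key}


def smallest_group(records: list[dict]) -> int:
    """k in the k-anonymity sense: size of the rarest quasi-identifier combination."""
    return min(Counter(key(r) for r in records).values())


print(reidentify(benefits, roster))
print(smallest_group(benefits))
```

A smallest-group size of 1 means at least one record is unique on those fields and is therefore a candidate for re-identification once another dataset is brought alongside it.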

Another challenge involves the underlying data-management tools. David McClure, associate administrator of GSA's Office of Citizen Services and Innovative Technologies, said that although the push for more APIs is helping, "so much valuable data within agencies is still locked up in business analytics systems."

"We're not there yet," McClure said. "We're still enabling the databases. But I think it'll enrich the internal information in the agencies, at least as much as the [outside] developer community."

Jeanne Holm, GSA's Data.gov evangelist, agreed. It will move "a little bit slowly over the next few months," she said, but as agencies begin to "implement things like data catalogs and e-guides, we'll start to see an explosion."