Agencies should publish lists of all their data, group says

Sunlight Foundation is developing a best practices guide for government transparency.

This story has been updated.

Federal agencies should be required to maintain lists of their full data catalogs so agency officials and the public know what exists, what’s public and what’s not, Sunlight Foundation Policy Director John Wonderlich said Tuesday.

That’s one line item in a list of best practices the nonprofit transparency group is collecting that government policymakers at the state, local and national levels should consider including in their open data policies.

The document also recommends that governments publish data in open, machine-readable formats, such as XML files, and allow other organizations to stream that data through application programming interfaces, or APIs.
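To illustrate what a machine-readable data catalog could look like in practice, here is a minimal Python sketch that reads a small XML catalog and separates public data sets from restricted ones. The XML layout, the `public` attribute, the agency name and the data set titles are all invented for illustration; they do not come from Sunlight's list or from any agency's actual catalog.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: element names, attributes and titles are
# illustrative assumptions, not a real government schema.
CATALOG_XML = """
<catalog agency="Example Agency">
  <dataset id="ds-001" public="true"><title>Facility inspections</title></dataset>
  <dataset id="ds-002" public="false"><title>Personnel records</title></dataset>
  <dataset id="ds-003" public="true"><title>Grant awards</title></dataset>
</catalog>
"""


def split_catalog(xml_text):
    """Return (public_titles, restricted_titles) from a catalog document."""
    root = ET.fromstring(xml_text.strip())
    public, restricted = [], []
    for dataset in root.findall("dataset"):
        title = dataset.findtext("title", default="(untitled)")
        # Treat anything not explicitly marked public="true" as restricted.
        if dataset.get("public") == "true":
            public.append(title)
        else:
            restricted.append(title)
    return public, restricted


public, restricted = split_catalog(CATALOG_XML)
print("Public:", public)
print("Not public:", restricted)
```

Because the catalog is structured rather than locked in a formatted report, the same few lines would work on a list of three data sets or three thousand, which is the practical argument for machine-readable formats.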

In a blog post Friday, Sunlight’s National Policy Manager Laurenellen McCann described the list “more as a ‘living document’ than as model legislation -- a menu of options for what can be contained within an open data policy.”

The list includes “sample provisional language” and example cases and was based on conversations with open data practitioners in the private and public sectors, McCann said. Wonderlich stressed that the list isn’t a checklist and that many good open data policies could include only a few of the line items.

The 33-item list also suggests removing “arbitrary technical restrictions” on access to government data, “such as registration requirements, access fees and usage limitations,” as well as limits on how government information can be used.

Federal Chief Information Officer Steven VanRoekel’s digital strategy, released in May, serves as the government’s open data policy document. It hits many of the Sunlight list’s main priorities.

Beyond not requiring comprehensive data lists, the strategy’s greatest failing is that it’s not specific enough about which newly collected data should be made public, Wonderlich said.

The plan’s core open data initiative involves making APIs the “new default” to present raw government data to the public.

VanRoekel’s office will issue governmentwide policies for streaming data from APIs by August and will require all new federal IT systems to follow those policies by May 2013, according to the strategy. The General Services Administration also must update its data set trove Data.gov to include a catalog of government APIs by that date, the strategy said.

The strategy also directs agencies to identify two high-value data sets to make more accessible online by September, but is vague about how agencies should decide which data sets qualify.

“They say basically ‘transparency is the new default’ and that’s something we’ve heard repeatedly with new transparency policies,” Wonderlich said. “The question for me is how are they going to make transparency the new default and that depends on how they craft guidance and policies. With Steven VanRoekel and [Chief Technology Officer] Todd Park on board they have a good chance of making something very strong. But, on its own, this doesn’t achieve that goal.”

A major goal of the open data portion of the strategy is to help private sector developers use government data to build money-making and time-saving products, much as large markets have grown up around weather and Global Positioning System data.