Agencies may be unable to meet request for downloadable data

E-government specialists say it is unreasonable for the White House to call on agencies to release existing government data in machine-readable formats as part of an open government directive.

Federal Chief Technology Officer Aneesh Chopra on Wednesday said the imminent directive will include a schedule for the distribution of data in formats that citizens, companies and nonprofits can download, search and manipulate to gain greater insight into government operations.

Providing information in machine-accessible formats would be one step toward increasing transparency in government, which is the overarching purpose of the long-awaited directive first announced in January.

But in today's federal information technology environment, most agency data exists in PDF format, from which it cannot be easily extracted for analysis.

"There's no data associated with [a PDF]. It's not machine-readable; the only intelligible way to retrieve data from the system, is you need that system," said Kevin Novak, co-chairman of the World Wide Web Consortium (W3C) eGovernment Interest Group and a former director of Web services at the Library of Congress. W3C, a Web standards development organization, was founded by World Wide Web inventor Tim Berners-Lee.

Comments posted on an internal governmentwide discussion Web page set up by the White House asked how older content would fit into Obama's open government principles. The comments, which administration officials subsequently published without names, were part of a March online discussion to solicit ideas from federal employees on how to make government more transparent.

"One question I would throw out there is to what extent should our transparency efforts support legacy data?" an employee wrote in March. "Is it enough to just have 'search everything from 2007 onward,' or do we need to build systems with backwards compatibility (or even reprocess the data to fit it into the structures)? Obviously, in a perfect world it would be everything, but given limited resources how should we be prioritizing?"

President Obama, on his first day in office, told agency heads to compile by May 21 recommendations for a directive that would incorporate new technologies to create a more transparent, collaborative and participatory government. Wednesday's announcement marked the White House's first reference to the timing and content of the final directive.

There are no executive branch standards for machine-readable data yet, Novak said. While scientific agencies, such as the U.S. Geological Survey and NASA, warehouse their information in machine-readable formats, they are the exceptions, he added.

White House officials said they are familiar with concerns regarding legacy data and are confident they can work collaboratively with agencies to achieve the president's transparency goals.

It would be best to direct agencies to develop standards for releasing data in machine-readable formats, said Novak, now vice president of integrated Web strategy and technology at the American Institute of Architects.

W3C officials, including Novak, recently shared with the White House the organization's notes on putting government data online.

The group also released on Sept. 8 formal steps and standards for publishing government data, which include posting data in raw form in structures that allow computers to manipulate the information and creating an online catalog of the data so people can discover what is available. Agencies also should make sure data contains attributions that humans and machines can understand.
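
As a rough illustration of those three steps (raw data in a structure computers can manipulate, a catalog that makes it discoverable, and attribution readable by both people and machines), the Python sketch below may help. It is not drawn from the W3C document: the function names, catalog fields, agency name and sample records are all hypothetical, chosen only to show the general shape.

```python
import csv
import io
import json


def records_to_csv(records, fieldnames):
    """Serialize a list of dict records to CSV text, a raw machine-readable form."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def catalog_entry(title, publisher, attribution, filename, fieldnames):
    """Describe one dataset so people and programs can find it and see its source.

    The keys used here are illustrative assumptions, not a published standard.
    """
    return {
        "title": title,
        "publisher": publisher,
        "attribution": attribution,
        "format": "text/csv",
        "file": filename,
        "fields": list(fieldnames),
    }


# Hypothetical sample data standing in for an agency's records.
fields = ["station_id", "year", "reading"]
records = [
    {"station_id": "A1", "year": 2008, "reading": 3.2},
    {"station_id": "B2", "year": 2008, "reading": 4.7},
]

# Step 1: the raw data in a structure software can parse.
csv_text = records_to_csv(records, fields)

# Steps 2 and 3: a catalog listing the dataset, with its attribution.
catalog = [
    catalog_entry(
        title="Example station readings",
        publisher="Example Agency",
        attribution="Collected by Example Agency field offices",
        filename="station_readings.csv",
        fieldnames=fields,
    )
]
catalog_json = json.dumps(catalog, indent=2)
```

Unlike a PDF, both outputs can be read back by any program with a CSV or JSON parser, which is the property the recommendations are after.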

W3C's recommendations could offer a glimpse into part of the directive, but the White House emphasized it is considering a range of ideas.

"Many groups have weighed in with the open government initiative on data standards issues, W3C among them but not exclusively," said Rick Weiss, senior science and technology policy analyst at the Office of Science and Technology Policy. "The Office of the [Chief Information Officer] and the CIO Council will work closely with the data standards community to continue to develop best practices around the release of open data."

Novak noted there is a new PDF version -- PDF/A -- that is more suitable for long-term preservation than the traditional portable document format. A PDF/A file contains the coding necessary to replicate, over time, the visual appearance of the document, including its text, images, fonts and color. The standard prohibits links to outside content and fonts that are not embedded in the file. This renders the document independent of other tools and systems.

"It is all about access, not just for today, but for the future," he said. "Once government begins to place data into the public space, the expectation will be that it becomes a resource center and research center for historical, current and future data. The challenge is what and how government deals with the legacy data, particularly image-based PDFs . . . and what level of effort is put to ensure those items are discoverable and accessible via the Web."
