Agencies may be unable to meet request for downloadable data

E-government specialists say it is unreasonable for the White House to call on agencies to release existing government data in machine-readable formats as part of an open government directive.

Federal Chief Technology Officer Aneesh Chopra on Wednesday said the imminent directive will include a schedule for the distribution of data in formats that citizens, companies and nonprofits can download, search and manipulate to gain greater insight into government operations.

Providing information in machine-accessible formats would be one step toward increasing transparency in government, which is the overarching purpose of the long-awaited directive first announced in January.

But in today's federal information technology environment, most agency data exists in PDF format, which cannot be easily extracted for analysis.

"There's no data associated with [a PDF]. It's not machine-readable; the only intelligible way to retrieve data from the system is you need that system," said Kevin Novak, co-chairman of the World Wide Web Consortium (W3C) eGovernment Interest Group and a former director of Web services at the Library of Congress. W3C, a Web standards development organization, was founded by World Wide Web inventor Tim Berners-Lee.

Comments posted on an internal governmentwide discussion Web page set up by the White House asked how older content would fit into Obama's open government principles. The comments, which administration officials subsequently published without names, were part of a March online discussion to solicit ideas from federal employees on how to make government more transparent.

"One question I would throw out there is to what extent should our transparency efforts support legacy data?" an employee wrote in March. "Is it enough to just have 'search everything from 2007 onward,' or do we need to build systems with backwards compatibility (or even reprocess the data to fit it into the structures)? Obviously, in a perfect world it would be everything, but given limited resources how should we be prioritizing?"

President Obama, on his first day in office, told agency heads to compile by May 21 recommendations for a directive that would incorporate new technologies to create a more transparent, collaborative and participatory government. Wednesday's announcement was the first statement from the White House on the timing and content of the final directive.

There are no executive branch standards for machine-readable data yet, Novak said. While scientific agencies, such as the U.S. Geological Survey and NASA, warehouse their information in machine-readable formats, they are the exceptions, he added.

White House officials said they are familiar with concerns regarding legacy data and are confident they can work collaboratively with agencies to achieve the president's transparency goals.

It would be best to direct agencies to develop standards for releasing data in machine-readable formats, said Novak, now vice president of integrated Web strategy and technology at the American Institute of Architects.

W3C officials, including Novak, recently shared with the White House the organization's notes on putting government data online.

On Sept. 8, the group also released formal steps and standards for publishing government data. These include posting data in raw form, in structures that allow computers to manipulate the information, and creating an online catalog of the data so people can discover what is available. Agencies also should make sure data contains attributions that both humans and machines can understand.

W3C's recommendations could offer a glimpse into part of the directive, but the White House emphasized it is considering a range of ideas.

"Many groups have weighed in with the open government initiative on data standards issues, W3C among them but not exclusively," said Rick Weiss, senior science and technology policy analyst at the Office of Science and Technology Policy. "The Office of the [Chief Information Officer] and the CIO Council will work closely with the data standards community to continue to develop best practices around the release of open data."

Novak noted there is a newer PDF variant, PDF/A, that is more suitable for long-term preservation than the traditional portable document format. A PDF/A file contains the coding necessary to reproduce the visual appearance of the document over time, including its text, images, fonts and color. The standard prohibits links to outside content and requires that all fonts be embedded in the file. This makes the document independent of other tools and systems.

"It is all about access, not just for today, but for the future," he said. "Once government begins to place data into the public space, the expectation will be that it becomes a resource center and research center for historical, current and future data. The challenge is what and how government deals with the legacy data, particularly image-based PDFs . . . and what level of effort is put to ensure those items are discoverable and accessible via the Web."
