Agencies may be unable to meet requests for downloadable data

E-government specialists say it is unreasonable for the White House to call on agencies to release existing government data in machine-readable formats as part of an open government directive.

Federal Chief Technology Officer Aneesh Chopra on Wednesday said the imminent directive will include a schedule for the distribution of data in formats that citizens, companies and nonprofits can download, search and manipulate to gain greater insight into government operations.

Providing information in machine-accessible formats would be one step toward increasing transparency in government, which is the overarching purpose of the long-awaited directive first announced in January.

But in today's federal information technology environment, most agency data exists in PDF format, which cannot be easily extracted for analysis.

"There's no data associated with [a PDF]. It's not machine-readable; the only intelligible way to retrieve data from the system, is you need that system," said Kevin Novak, co-chairman of the World Wide Web Consortium (W3C) eGovernment Interest Group and a former director of Web services at the Library of Congress. W3C, a Web standards development organization, was founded by World Wide Web inventor Tim Berners-Lee.

Comments posted on an internal governmentwide discussion Web page set up by the White House asked how older content would fit into Obama's open government principles. The comments, which administration officials subsequently published without names, were part of a March online discussion to solicit ideas from federal employees on how to make government more transparent.

"One question I would throw out there is to what extent should our transparency efforts support legacy data?" an employee wrote in March. "Is it enough to just have 'search everything from 2007 onward,' or do we need to build systems with backwards compatibility (or even reprocess the data to fit it into the structures)? Obviously, in a perfect world it would be everything, but given limited resources how should we be prioritizing?"

President Obama, on his first day in office, told agency heads to compile by May 21 recommendations for a directive that would incorporate new technologies to create a more transparent, collaborative and participatory government. Wednesday's announcement marked the first reference from the White House on the timing and content of the final directive.

There are no executive branch standards for machine-readable data yet, Novak said. While scientific agencies, such as the U.S. Geological Survey and NASA, warehouse their information in machine-readable formats, they are the exceptions, he added.

White House officials said they are familiar with concerns regarding legacy data and are confident they can work collaboratively with agencies to achieve the president's transparency goals.

It would be best to direct agencies to develop standards for releasing data in machine-readable formats, said Novak, now vice president of integrated Web strategy and technology at the American Institute of Architects.

W3C officials, including Novak, recently shared with the White House the organization's notes on putting government data online.

The group also released on Sept. 8 formal steps and standards for publishing government data, which include posting data in raw form in structures that allow computers to manipulate the information and creating an online catalog of the data so people can discover what is available. Agencies also should make sure data contains attributions that humans and machines can understand.
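A catalog entry following those steps might look something like the sketch below: the raw data is pointed to in a structured format, and the attribution fields are readable by both humans and machines. The field names here are illustrative, not a formal schema from the W3C document.

```python
import json

# Hypothetical catalog entry sketching the W3C suggestions: raw data in
# a structured format, listed in a discoverable catalog, with attribution
# that both people and software can interpret. Field names and the URL
# are invented for illustration.
catalog_entry = {
    "title": "Quarterly spending by bureau",
    "publisher": "Example Agency",  # human- and machine-readable attribution
    "license": "public-domain",
    "modified": "2009-09-08",
    "distribution": {
        "format": "text/csv",  # raw, structured form computers can manipulate
        "downloadURL": "https://example.gov/data/spending.csv",
    },
}

# Serializing the catalog itself as JSON keeps the listing machine-readable too.
catalog_json = json.dumps(catalog_entry, indent=2)
print(catalog_json)
```

Publishing such entries in one place gives citizens a single index for discovering what data an agency has released.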

W3C's recommendations could offer a glimpse into part of the directive, but the White House emphasized it is considering a range of ideas.

"Many groups have weighed in with the open government initiative on data standards issues, W3C among them but not exclusively," said Rick Weiss, senior science and technology policy analyst at the Office of Science and Technology Policy. "The Office of the [Chief Information Officer] and the CIO Council will work closely with the data standards community to continue to develop best practices around the release of open data."

Novak noted there is a new PDF version -- PDF/A -- that is more suitable for long-term preservation than the traditional portable document format. A PDF/A file contains the coding necessary to replicate, over time, the visual appearance of the document, including its text, images, fonts and color. The standard prohibits links to outside content and fonts that are not embedded in the file. This renders the document independent of other tools and systems.

"It is all about access, not just for today, but for the future," he said. "Once government begins to place data into the public space, the expectation will be that it becomes a resource center and research center for historical, current and future data. The challenge is what and how government deals with the legacy data, particularly image-based PDFs . . . and what level of effort is put to ensure those items are discoverable and accessible via the Web."
