Observers applaud the redesign to make the site more interactive, but say data needs better context and documentation.
The White House on Friday unveiled a more interactive version of its online catalog of federal statistics in celebration of the first-ever effort to open government operations to analysis. But observers noted the Obama administration still has its work cut out for it.
One year after launching Data.gov, White House officials enhanced the website with user ratings and citizen-created mashups, which are applications that mix government datasets with outside information to provide the public with insights into trends and relationships. Through one mashup, citizens could learn, for instance, that federal Chief Information Officer Vivek Kundra is the 15th most frequent visitor to the White House.
Some academics and management consultants said the information portal now needs to focus on data context and integrity to achieve true transparency.
Cynthia Farina, a law professor at Cornell University who researches Web-based public participation in the regulatory process, cautioned against underestimating the challenge of directing hundreds of different federal agencies to follow a standard set of protocols for reporting data. But "this has to be the goal if the administration is serious about public value being created by crowdsourcing data," or allowing the public to slice, dice and add to the information to better understand the workings of the federal government, she said.
"It's extremely difficult to get cross-government initiatives up and running," she added. "The legal, institutional and political differences among departments and agencies are substantial, and can be extremely challenging to negotiate."
That said, Data.gov must do a better job of disclosing the methodology agencies and the White House use to collect and process the underlying information, according to Farina. "It's not like they have to reinvent the wheel here," she added. "Academic research has well-established protocols and expectations for how data should be revealed in order to permit others to replicate reported results."
The Office of Management and Budget requires agencies to submit, along with each data set, an online template with a short explanation of the contents, details on the collection method, technical documentation, links to agency programs that contributed, and other data describing the data, or metadata.
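As a rough illustration of what such a template captures, the items listed above can be sketched as a simple record with a completeness check. The field names and the `missing_fields` helper are hypothetical, chosen only to mirror the description here; they are not the actual OMB template or any Data.gov interface.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DatasetMetadata:
    # Hypothetical field names mirroring the items the article lists;
    # this is not the real OMB template.
    description: str = ""               # short explanation of the contents
    collection_method: str = ""         # details on how the data was collected
    technical_documentation: str = ""   # pointer to technical documentation
    program_links: List[str] = field(default_factory=list)  # contributing agency programs


def missing_fields(record: DatasetMetadata) -> List[str]:
    """Return the names of template fields that were left empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Example: a submission that only explains its contents.
record = DatasetMetadata(description="White House visitor records")
incomplete = missing_fields(record)
```

In this sketch, `incomplete` names the three fields the example submission left blank, which is the kind of gap a reviewer or a rating public might flag.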
"We've tried to be ... open and transparent about the source of the data itself," Kundra said during an interview. The published data often comes directly from the agency information technology systems that generate the original information. If officials were to standardize the raw contents, then the data might no longer represent the government's actual work, he noted.
Anthony M. Cresswell, deputy director of the Center for Technology in Government at the University at Albany-SUNY, called Data.gov "a good thing -- as far as it goes," but said other statistical data sets are needed to fulfill the administration's open government goals.
"Open government should include pulling back the curtain on how important decisions are made," Cresswell said. "Which senators employ holds to stop legislation? How often does each member of Congress meet with lobbyists and which ones? Who participates in drafting and marking up legislation? Writing rules and regs? Where is that on Data.gov? How much have we spent so far on the war in Iraq? How much today?"
A former federal manager said government officials should brush up on their numerical literacy to ensure the information they are reporting is meaningful.
"Context is needed to explain the data," said Phil Landesberg, who spent 32 years as a civil servant in the Navy and now conducts research in the areas of performance management, psychology and statistical analysis. "Context means knowing what the data represents, who measured the data, how it was measured, why it is important and how it will be used."
For example, if the Transportation Department announced it would start counting all cars crossing the replacement for the Minneapolis I-35W bridge (the original collapsed into the Mississippi River in 2007) to assess the safety of the infrastructure, it would first need to define "all cars." The definition would have to specify whether vehicles making round trips counted as one or two cars. If the point of the numerical exercise were to assess the wear and tear on the road surface, then counting the cars twice would be more useful, Landesberg said.
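Landesberg's point about definitions can be shown with a few lines of arithmetic. The sketch below assumes, purely for illustration, that each crossing is logged with a vehicle identifier; the identifiers and function names are made up and do not describe any real Transportation Department system.

```python
from typing import List


def count_crossings(crossings: List[str]) -> int:
    """Count every trip across the bridge (a round trip counts twice)."""
    return len(crossings)


def count_distinct_vehicles(crossings: List[str]) -> int:
    """Count each vehicle once, no matter how many times it crosses."""
    return len(set(crossings))


# Hypothetical log: vehicle "A" makes a round trip, "B" crosses once.
log = ["A", "B", "A"]

total_crossings = count_crossings(log)            # 3
distinct_vehicles = count_distinct_vehicles(log)  # 2
```

The same log yields 3 or 2 depending on the definition. The first figure is the one relevant to road wear, since each crossing loads the surface regardless of which vehicle makes it.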
In the federal government, there is "too much concern on counting things and too little thought on what to measure, how it will be useful, who needs to know the results, and what to do with the results," said Landesberg, now president of Miles2Go Seminars and Consulting, which provides personnel and training services to state and local agencies.
Kundra said each agency has a point person responsible for overseeing its data sets and metadata templates. The public also plays a part in supervising data integrity by voting on the usefulness of the information and thoroughness of the metadata on the site itself, he noted.
The White House's future plans for Data.gov include adding crowdsource tagging, or the ability for the online masses, who likely possess additional knowledge about the published data, to add metadata, Kundra said.
For the time being, Data.gov represents something close to nirvana for many Web developers, who enjoy having easy access to data sets they can mash up and make meaningful for the public.
To honor the one-year anniversary of Data.gov, the Sunlight Foundation, a government transparency group that has a programming division, unveiled a national data catalog that gathers information from all three branches of government at the state, local and federal levels. Clay Johnson, director of Sunlight Labs, the developer arm, likened the catalog to "a sort of Dewey decimal system for government data online."