The White House has been promoting a slew of data-related efforts in the last few weeks, but the government is still in the early stages of open data, U.S. Chief Data Scientist DJ Patil said Thursday.
Data.gov, the repository where agencies are encouraged to publish their own data sets, should be considered a "beta," Patil said at a Georgetown University event. Patil, who spent many years at the University of Maryland analyzing government weather data, said agencies are starting to make data sets more navigable.
"You don't need to do a ridiculous number of searches...[but] the big challenge is, how do we really iterate on that cycle, where the data is usable?" he said.
All of the data projects Patil has overseen have "at least one partner where you wouldn't expect," he said. The Precision Medicine Initiative, which aims to invest in technology that could tailor medical treatment to an individual patient's genetic makeup and lifestyle, depends on data from the National Institutes of Health and the Office of the National Coordinator for Health IT, as well as on the computational capabilities of the National Science Foundation and the Energy Department, Patil said.
But working with other agencies can slow down the analysis process, William Eggers, executive director of Deloitte's Center for Government Insights, said during the panel. He described a state-level project in Indiana searching for the causes of a high infant mortality rate; that analysis relied on data from about 17 agencies, he said.
"It took like, 15 months to do this project. Why? Because of the memorandums of understanding between agency and agency after agency ... that doesn't become a scalable solution."
Patil described a project in which his team analyzed data on about 11 million people cycling through about 3,000 jails. Many of the people who return to jail repeatedly have mental health problems, and identifying them and connecting them with care could reduce recidivism.
"If you move the data from the criminal justice system and you hand it over to the health care system, you find those intervention points," he said.
Moving that data "is not some crazy amazing sophisticated data pipeline. It's called a spreadsheet...You're like, 'Is Sally on your list? Go fish.' ... That's what we're talking about."
It's scaling up that process that requires more advanced technology, Patil said.
"That's why these Data.gov efforts are important, because most of the time it's like, 'Oh, I don't need a [memorandum of understanding],' it's there" online, in a machine-readable format, he said. "We have to have those backstops that prevent some other organization from finding an excuse ... of 'No, that's too hard.'"