The big question for agencies and open data

GSA's Myehsha Boone gets at the critical question for those trying to decide what data, practically speaking, is worth sharing.


Agencies have made tremendous progress in the past six years on opening an ever-larger collection of government data. What started in some corners as a grudging post of whatever spreadsheet (or, worse, PDF) was readily available, mainly to appease the Obama administration, has grown into a truly rich ecosystem of valuable and usable data.

"Usable" does not automatically mean "used," however. Many agencies still struggle to determine which datasets are truly in demand and which ones effectively go into cold storage on Data.gov. And although "open by default" is an admirable policy, the reality of resource constraints means that most agencies must triage their open-data efforts -- hopefully in a way that moves the most valuable data to the front of the line.

That's why a recent post by Myehsha Boone on Digital.gov was so interesting. Boone, who is a data management coordinator in General Services Administration CIO Sonny Hashmi's office, decided to go beyond page views and download stats to talk with actual, verifiable data users. Her story, titled "Who's Using Your Agency's Data?" is short, so I've included it here in its entirety (though you should be reading Digital.gov regularly for its great array of how-to content!):

For months, I've been trying to figure out how to get leads for the OMB External Use Open Data Survey responses. I've attended Google Analytics seminars, asked for survey responses from some of our public-facing sites, added a data request form to our data page and begged for leads from program owners. The result was very few leads and no indication of whether or not they were people who actually were looking for our data, used our data or just had a website resource access issue. Then one day, it just dawned on me -- Freedom of Information Act (FOIA) requests!

OK, what really happened was that I got frustrated and began to think about the advantages and benefits to the Open Data Policy and why we should proactively publish data in the first place. Since one of the major benefits is to reduce the number of FOIA requests, then why not start there? The people who request data regularly usually know exactly what they are looking for and probably have some ideas on how we can improve our delivery and processes. Also, since the FOIA office predates "open data," they are the best resource for determining who's using the data and the "top" users of our data.

In preparation, I created an electronic method to collect the survey data provided by OMB. I contacted our FOIA office and asked for leads from regular requestors or repetitive requestors. After gaining internal approvals from legal to send the survey to the public, I sent an invitation via email to the list of FOIA requestors. Tada! (Sing like R-E-S-P-E-C-T) R-E-S-P-O-N-S-E!

Now, to level set, I didn't get truckloads of responses from this method. But the more you send, the more you should receive. This method did help us get very close to meeting the requirement. Also, this is ONE very legitimate and efficient method to find out who's using your agency's data and cutting through the "data browsers" to the "data users." Think about it: Most FOIA requestors have a definite purpose and reason for requesting data and are more than happy to let you know if you hit the mark -- or not.

Boone's experiment reminds me of a similar effort in New York, where a group called Reinvent Albany took a detailed look at state Freedom of Information Law requests to show how agencies can use public demand to decide where to focus their open-data efforts.

These are important steps, and ones that other agencies should be repeating. When time and money are in short supply -- which is to say, always -- figuring out which datasets demand attention is just good government. And improving the signal-to-noise ratio can only lead to more and better use of government data -- which is, after all, the point of opening all this data in the first place.