Cities hold the largest share of government data in the U.S., covering everything from liquor licenses to teacher performance reviews, but only a handful of cities have released that data to outside researchers and app developers.
The federal government is trying to facilitate open data releases at the city level with numerous initiatives, including a model open data site, the Open Government Platform, and city, state and county data communities on the government website Data.gov.
These open data sites can help city governments by putting the wisdom of crowds to work on tough questions about how to allocate city resources. They can also aid citizens by providing developers with the raw material for mobile apps that tell them when the next bus will arrive or which blocks might offer open parking spots.
Many cities are hesitant to release data, though, often because they fear being held legally liable if, say, they accidentally reveal which local families are receiving food stamps or if bad mapping data causes a car accident.
In September, Nextgov sat down with Data.gov chief Jeanne Holm, who works out of NASA’s Jet Propulsion Laboratory in Los Angeles, to talk about what challenges cities face with open data, the data that app developers are looking for and how the federal government can help.
The transcript below is edited for length and clarity.
Where does Data.gov stand now on integrating data from state and local governments?
Between cities, counties and states we have about an additional 16,000 or 17,000 data sets on Data.gov. So that’s a significant increase in what we can provide to developers. It’s also really giving citizens something they care about. Half the time when I talk to people they say ‘but what data do you have about my neighborhood?’
How are you reaching out to state and local governments to get them involved?
We’re working with groups including NASCIO [the National Association of State Chief Information Officers], the National Governors Association, the National League of Cities and smaller groups like MuniGov 2.0. The federal government doesn’t want to tell cities ‘you should have an open data site; this is how you should spend your money,’ so the idea is to reach out to these folks who’ve already been having conversations about open data.
We also make standards at the federal level for metadata [information about the data sets’ origin and quality] and other things. We don’t push that to cities in any kind of mandated way, but it’s a national standard that they can follow if they want.
The other area we have not yet tackled, but which is coming, is taxonomy. It’s sometimes tough for people looking for, say, transportation data to know ‘do I call this a freeway, a highway, a road, a township byway?’ One suggestion I made to the National League of Cities is that we work together with city CIOs and tech teams and hackathon teams to put that taxonomy together so we can just map it over what cities already have.
So if the city wants to use a name that has local significance they can, but when we federate that data on Data.gov we have a standard lingo. So we can say ‘New Orleans calls these things levees, but we call them dams or water retention resources or whatever.’
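[The mapping Holm describes can be sketched as a simple crosswalk from local terms to a shared vocabulary. The Python below is a minimal illustration only: the `CROSSWALK` table, the `federate` function and every term and category in it are hypothetical and are not Data.gov’s actual taxonomy.]

```python
# Hypothetical crosswalk from local terms to a shared vocabulary.
# Every term and category here is an illustrative assumption,
# not an actual Data.gov taxonomy.
CROSSWALK = {
    "levee": "water retention resource",
    "dam": "water retention resource",
    "freeway": "roadway",
    "highway": "roadway",
    "township byway": "roadway",
}


def federate(record):
    """Return a copy of the record with a standard term added.

    The city's own name for the thing is kept untouched, so local
    usage survives alongside the shared label.
    """
    local = record["local_term"]
    standard = CROSSWALK.get(local.strip().lower(), "unmapped")
    return {**record, "standard_term": standard}


records = [
    {"city": "New Orleans", "local_term": "Levee"},
    {"city": "Los Angeles", "local_term": "freeway"},
    {"city": "Albuquerque", "local_term": "arroyo"},
]
federated = [federate(r) for r in records]
```

[Terms with no entry in the table are marked "unmapped" rather than guessed at, which would give a taxonomy team a list of local names still waiting for a standard equivalent.]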
What kind of data are cities putting out?
Permitting and construction is really big. Parking enforcement and towing services are important for bigger cities. San Francisco and Chicago are going gangbusters on apps around parking. Towtext is new in Chicago. It’s an app where you can register your license plate and if your car gets towed they’ll text you.
We’re also seeing an interesting release of data around green issues, such as the efficiency of the city fleet, police cars and other vehicles, and of city buildings. We’re seeing health and education data to a certain extent. Some schools release school standings and teacher ratings. Another thing that is very active is data about city services like pothole filling, street sweeping, snow plowing and trash collection, just day to day stuff.
What cities are doing particularly well with open data?
It’s a lot of the places you’d expect: New York, Seattle, San Francisco, Chicago and Baltimore are doing huge amounts of pushing data out. Seattle’s very focused on green stuff; New York is very focused on city services.
Occasionally a city will get involved with open data because there’s some opportunity they see. Albuquerque [N.M.], for instance, is looking at how to get developers together to look at issues around wildfires because they have a lot of wildfire data and it’s an important issue all over the Southwest.
How are security concerns different at the federal and local level?
At the federal level, we’ve worked very hard to put processes in place over the years and we have a pretty good track record. Agencies are pretty apprised of the mosaic effect [the concept that data sets that don’t reveal identifying information about citizens on their own may do so when combined with other data sets]. So we anonymize things and we’re pretty learned in that at the federal level.
At the city level, and particularly if you combine city and county data, I think it becomes more problematic. For many cities this is a brand new effort and they’re concerned about what the liability is if they release this data and how to look for problems.
The number one question I get when I reach out to cities that haven’t done their open data policies yet is either something about the mosaic effect or it’s about the liability they might face. For instance, if they release information that someone later uses in a traffic app and someone gets into a car accident because the app gave them bad information, they want to know if they’ll be liable for that.
In Europe, civil servants are held liable for that kind of data release. Here, unless you’re doing something fraudulent, if the release was done in good faith, people aren’t held liable.
What I generally tell cities is to anonymize the heck out of their data on initial release. You can always reveal more information later and, once you release it, you can start to see what people are doing with it.
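[One common way to “anonymize the heck out of” a data set before a first release is to coarsen identifying fields and withhold any rows that still stand out. The Python below is a minimal sketch of that idea, in the style of k-anonymity. The field names, the sample rows and the threshold `k` are all assumptions made for illustration, and this is not a procedure Holm or Data.gov prescribes. It also does not by itself rule out the mosaic effect.]

```python
from collections import Counter


def generalize(row):
    """Coarsen quasi-identifiers: ZIP code to 3 digits, age to a 10-year band."""
    decade = (row["age"] // 10) * 10
    return {
        "zip3": row["zip"][:3],
        "age_band": f"{decade}-{decade + 9}",
        "service": row["service"],
    }


def release(rows, k=3):
    """Generalize rows, then drop any whose (zip3, age_band) group
    has fewer than k members, since small groups are easier to
    re-identify when combined with other data sets."""
    coarse = [generalize(r) for r in rows]
    sizes = Counter((r["zip3"], r["age_band"]) for r in coarse)
    return [r for r in coarse if sizes[(r["zip3"], r["age_band"])] >= k]


# Made-up sample rows for illustration only.
rows = [
    {"zip": "90033", "age": 34, "service": "food assistance"},
    {"zip": "90031", "age": 37, "service": "housing"},
    {"zip": "90032", "age": 31, "service": "food assistance"},
    {"zip": "91105", "age": 62, "service": "housing"},
]
public = release(rows, k=3)
```

[In this sample the fourth row is withheld because nobody else shares its coarsened ZIP code and age band. Lowering `k` or keeping more digits of the ZIP code in a later release would correspond to the “reveal more information later” step.]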
Have the questions cities ask changed since the NSA revelations? Are they more concerned about the appearance of collecting and distributing data?
I haven’t noticed any change. What I’ve seen over time, and I don’t know if it’s related to NSA at all, is that initially cities were asking technical questions: How do I do this? Give me the cookbook. Even cities that are just starting now aren’t really asking that question. They’ve become more sophisticated. Now they’re asking questions about policy.
It’s interesting that in other parts of the world policies usually come before open data sites but here in America we’re so ‘can do’ that open data sites often come in advance of policies.
Has that question of liability led to differences between the U.S. and Europe?
Here, even though people are focusing on policy now, I still see an acceleration of cities releasing data. Maybe each city is only releasing a few new data sets, but the number of cities releasing data is increasing.
In Europe there was this big push forward and there’s been an effort by the European Union to put some common standards in place, but now I see a slowdown of cities releasing data. It’s usually because department personnel have become hesitant. I can’t get a specific triggering story -- I don’t know if there was somebody who got put in jail or fined -- but there’s been a change over the last year of more hesitation.
That said, they all understand the economic story of how open data can save them money, so I think there’s a bit of schizophrenia. They want to release data but individual civil servants are worried about what that means for them and that’s legitimate.
How is open data progressing here in LA?
I’m so glad my hometown is finally on the open data map. Just as Mayor Garcetti was coming on board they brought out the open data site. We just started a Code for America brigade. We had our first official city hackathon in August and there’s a whole group of developer efforts growing up around the city.
The other thing that’s unique about LA is we have the entertainment sector here. There are a huge number of LA-based folks who either have a voice because they’re celebrities or income because they’re celebrities and who love this city.
[The hip hop artist] Will.i.am was a big backer of the initial hackathon we had in August. He grew up in Boyle Heights, a very underserved, underprivileged neighborhood here in downtown LA, so he put up $5,000 cash as well as a bunch of his staff to come and help kids and developers do things specifically for that community.
One award winner was a group of Boyle Heights middle school and high school kids who said ‘part of our cultural heritage is we’re art lovers but our art is graffiti.’ Some graffiti is just graffiti and some graffiti is beautiful. So -- this is an app you’d never see in Washington -- they created this crowdsourced way to take a picture from your phone and say ‘I think this is really cool and should be preserved.’ The city of LA is very culturally sensitive to graffiti preservation, so if a submission gets enough votes it gets into the city preservation standard.
Are other entertainers involved in open data?
There are a lot of opportunities, I think. There’s an unwritten rule in Hollywood that when you get to a certain celebrity status you need to be doing some public service too. So I think there’s a real opportunity for celebrities to connect in this space for their cause. There are great data sets around issues celebrities often come to, like the environment, animal rights and gender equality.