Cleaning, cataloguing, counseling – it's all part of the job for agency data gurus as they seek to surface useful data for their colleagues and customers.
"Data by itself is useless," said Ian Kalin, the Commerce Department's chief data officer. "We have to build stuff on top of it."
And getting the data ready for building is where the CDO needs to shine.
Kalin and Consumer Financial Protection Bureau CDO Linda Powell were among the speakers at FCW's Dec. 2 big data event, and both agreed that, in some respects, CDOs need to play custodian, cleaning and cataloguing what agencies already have.
"Why is this stuff so dirty?" Kalin asked, echoing a common complaint about government data. "Did they have to release everything in a PDF file?"
Two months ago, Powell said, her team released an internal catalogue of CFPB data. It has proven invaluable, she said, to the agency's diverse internal stakeholders, who often didn't even know the agency already had the data they were looking for.
But at the end of the day, Powell and Kalin said, data teams must work with customers to figure out what they actually want.
Powell has gone so far as to develop a "CDO toolkit" to facilitate that fact-finding effort. "You can't optimize it unless you know how people are going to use it," she said, adding that "business lines don't think in terms of data," so data scientists must work closely with customers -- both inside the agency and out -- to figure out what data they truly need and value.
Kalin noted that the Commerce Data Service, a "start-up within government" announced last month, is geared toward just that sort of work. He also recounted ongoing Commerce export support work leveraging existing data sets. Targeting small- and medium-sized businesses that haven't exported before, Commerce teams have been providing U.S. firms with overseas demand information that appears to dovetail with those companies' offerings.
Some 90 percent of the firms Commerce has contacted have pursued the leads, Kalin said, and it all stems from using – after cleaning – existing data sets.
"From an internal perspective, it shifts the culture from having a government that is reactive by nature to a service that is proactive by nature," Kalin said of the work. "That's an example of the kind of stuff that should be built that is leveraging existing information -- but again, because the information was uselessly locked up in a PDF file, no one was ever going to find it."
With terabytes upon terabytes of data, the entire U.S. government owes it to the country to seek out ways for people to actually use it, Kalin said. "That's a responsibility of government," he asserted. "Can we take this information and find innovative ways to help Americans competitively?"