Big data success starts with science

Tips: Don't sweat the answers until you've figured out the questions. Don't buy the technology until you've hired the scientists.


As everybody knows by now, the key benefit of big data is finding the proverbial needle in the haystack: the insight lurking in a vast sea of disparate data that answers a million-dollar question or leads to a huge efficiency boost.

Before the data has value, then, its owners have to know what question they want to answer, said Booz Allen Hamilton Vice President and data scientist Joshua Sullivan. It's a point he has driven home to 15 major commercial and federal clients for which he's overseen the installation of data science teams.

These small teams drive the big data process by staying curious and identifying questions the data could answer, even before a big data platform and infrastructure are set up.

After several years of buzz, most federal agencies are in the midst of creating big data plans, but Sullivan said too few are drafting those plans with the questions they need answered in mind.

Simply developing the IT infrastructure without first asking why will inevitably lead to a major waste of taxpayer dollars, Sullivan said.

"Before an agency buys a data science team or a big data anything, they need to know what value they need to get out of their data. What are the questions you are going to ask?" Sullivan said. "Almost everyone has a big data plan looking at technology, but very few of them say they're actually proficient with analytics. You have to think, what are the analytics you want out of it?"

That feds are still struggling with big data is well documented.

A recent survey by the Government Business Council of 313 executives in 27 federal agencies found that 37 percent of managers reported their agency is taking "appropriate steps" to leverage big data to enhance agency operations. Only 31 percent of those surveyed felt their agency was "fully leveraging all of the data it collects" now. But 70 percent of respondents said big data had the potential to fundamentally transform agency operations.

Perhaps just as troubling, only 4 percent of those surveyed reported their agency had hired data scientists, suggesting initial big data investments are going to the technology, not to the people who can make use of what the technology allows.

"Everyone wants to solve this technology problem when really this is an analytics problem," Sullivan said. "You absolutely do need a big data platform; otherwise you don't have work space for a data science team to do processing. Analytics just isn't going to show up."

Hurdles to big data include infrastructure investment and a fragmented federal IT landscape: dozens of big data pilots are under way across the federal government right now.

Sullivan said his data science teams come armed with their own big data platform and reference architecture, removing at least one barrier.

The biggest barrier, though, remains agency personnel who hold on to an old way of IT thinking that worked through decades of upgrades, from individual computers to servers to the data centers we see today. In those days, it was OK to upgrade the technology and worry about the human element later. With big data, that might not be the case.