Op-Ed: How Do You Improve Federal Operations? Mix Cloud and Big Data Analytics

A new approach to data analysis holds huge potential for solving some of government’s thorniest problems.

Medicare is estimated to lose as much as $60 billion a year to fraud. The criminals are frustratingly hard to find, though, given the sophisticated tactics they use to manipulate forms and reports. Although clues to their identity are buried deep in oceans of information, a recent analysis of healthcare industry data initially detected only patterns of fraud, not the perpetrators. It took analysis of multiple big data sets -- mixing and matching data from vastly different sources and formats -- to find some answers and trends to watch.

What gave away clues to Medicare theft? Education Department student loan records. Cross-checked against other data, this information gave analysts insight into which debt-ridden doctors might have a motive for fraud.
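
As a rough illustration -- using invented provider IDs, scores and thresholds rather than real Medicare or Education Department data -- the cross-check can be pictured as a simple join of two extracts on a shared provider identifier:

    import pandas as pd

    # Toy stand-ins for two very different sources; all values are invented.
    claims = pd.DataFrame({
        "provider_id": [42, 7, 13],
        "anomaly_score": [0.97, 0.12, 0.88],   # output of a separate billing-pattern model
    })
    loans = pd.DataFrame({
        "provider_id": [42, 13, 99],
        "outstanding_debt": [310_000, 45_000, 150_000],
    })

    # Join the sources on the provider, then keep those who both bill anomalously
    # and carry heavy education debt -- candidates for closer human review.
    merged = claims.merge(loans, on="provider_id", how="inner")
    leads = merged[(merged["anomaly_score"] > 0.9) & (merged["outstanding_debt"] > 200_000)]
    print(leads.sort_values("anomaly_score", ascending=False))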

That’s the promise of a powerful and alluring combination: cloud computing and big data analytics. Yet without a fundamental change in the way we access and analyze data, the story will be one of promises unfulfilled.

What’s needed is an approach that truly changes the siloed way data is accessed. Of equal importance, there must be a shift in the role people play, to enable deeper analysis and greater creativity. In this new approach, machines do analytics, while people concentrate on analysis. “Big data” already exists everywhere, but its value lies in big analytics that can yield profound, previously unknown insights.

The ability to gain full value from data is often encumbered by a rigid framework created years ago: it grants individuals limited slices of information based on a narrow reading of each query, and it does not easily connect a data analyst with multiple data sources. A program might locate information on specific elements of patient health care records, as in the Medicare example, because it knows precisely which database holds that type of information. But it cannot consult neighboring databases or data sets whose information is not overtly, or even obviously, relevant -- such as a variety of data related to doctors.

Creating a ‘Data Lake’

Most big data analysis today is insufficient to unlock powerful, yet highly elusive, insights. Continuing the example: in the cloud, separate data sets on patients, doctors or other aspects of the medical environment grow larger and more accessible, but they ultimately remain highly structured and fragile, and analysts continue to comb through the same separate segments of information.

Attacking this problem requires fostering deep, overlapping connections between different data sets. This goes far beyond just bridging the silos -- an approach that offers only limited improvements. Instead, data -- whether structured, unstructured, batch or streaming -- is commingled in a carefully architected but fluid “data lake.” For instance, all medically related data, no matter the content or format, can be pooled together in the lake, with each piece of data tagged with its own security requirements.
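
To make the idea less abstract, here is a minimal sketch of such pooling, assuming a simple in-memory lake; the record types, tag names and security markings are invented for illustration, not drawn from any particular product.

    from dataclasses import dataclass, field
    from typing import Any

    @dataclass
    class LakeRecord:
        """One item in the lake: the raw payload plus descriptive and security tags."""
        payload: Any        # a structured row, free text, an image reference, ...
        source: str         # e.g. "claims_db", "clinical_notes", "provider_registry"
        fmt: str            # e.g. "csv_row", "text", "hl7_message"
        security: str       # per-record marking, e.g. "phi", "internal", "public"
        tags: dict = field(default_factory=dict)

    # Medically related data of any shape sits side by side in one pool.
    lake = [
        LakeRecord({"patient_id": 17, "procedure": "99213"}, "claims_db", "csv_row", "phi"),
        LakeRecord("Patient reports chest pain after ...", "clinical_notes", "text", "phi"),
        LakeRecord({"provider_id": 42, "specialty": "cardiology"}, "provider_registry", "csv_row", "internal"),
    ]

    # An ad hoc question becomes a filter over the pool, gated by each record's marking.
    def query(pool, predicate, allowed_markings):
        return [r for r in pool if r.security in allowed_markings and predicate(r)]

    cardiology_records = query(
        lake,
        lambda r: isinstance(r.payload, dict) and r.payload.get("specialty") == "cardiology",
        allowed_markings={"internal", "public"},
    )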

As analysts are freed of the constraints of data structures (and their limited, “canned” queries), they can now ask more intuitive, big picture questions of the data or use the data to help ask different, new questions. This, in turn, helps organizations unlock the potential of their data through new insights that stimulate new thinking. Interesting and potentially powerful combinations of data from areas not traditionally linked can yield breakthrough insights and ideas, even leading to dramatic cost savings.

Beyond offering greater analytic power, the data lake approach creates the ability to evolve quickly as mission requirements change. As new analytic needs are defined, adjustments are made in the review of the data itself, not to the infrastructure -- or silos -- that contain it. The result: an ability to handle the greater volume, variety and velocity of an organization’s data and to adapt analytics to future requirements.

But it’s not just the silo approach to data access that needs to change to realize this potential. The current human vs. machine division of labor cannot scale to the levels required for truly big data analysis.

Most organizations today still rely on employees to play a lead role in data uploading, processing and analysis. It is a workable approach, until the amount of data grows so large as to make it unsustainable. As the data sets grow, more time must be spent on the rote tasks associated with data collection, storage and processing, leaving little time for the high-value work of analysis.

A new approach that can unlock the true value of big data, one specifically designed for this new age of data, is a “cloud analytics reference architecture.” This reference architecture approach, which is currently being adapted for the larger business and government communities, is a new way of using technology, machine-based analytics and human-powered analysis to create competitive and mission advantage.

The concept combines the collective benefits and features of an organization’s infrastructure, the data lake, analytics, and visualization tools. Such an architectural approach enables machines to do a far greater share of the rote analytics work -- data access, collection and processing. People are freed up to do the work they do best, work that requires analysis and creativity. It creates a more effective, scalable and balanced approach to the work.
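
One way to picture that division of labor -- with placeholder feeds and a made-up scoring field, not a description of any specific reference architecture -- is a pipeline in which ingestion and routine scoring run unattended and only ranked leads reach a human analyst:

    # Placeholder pipeline: feeds, fields and the cutoff are illustrative only.
    def ingest(feeds):
        """Machine work: pull raw records from every feed into one pool."""
        return [record for feed in feeds for record in feed]

    def score(records):
        """Machine work: routine, repeatable analytics applied to every record."""
        return sorted(records, key=lambda r: r.get("anomaly_score", 0.0), reverse=True)

    def review_queue(records, top_n=50):
        """Human work starts here: analysts see only the ranked, high-value leads."""
        return records[:top_n]

    # Minimal end-to-end run with toy data.
    feeds = [
        [{"provider_id": 42, "anomaly_score": 0.97}],
        [{"provider_id": 7, "anomaly_score": 0.12}],
    ]
    for lead in review_queue(score(ingest(feeds))):
        print(lead)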

Whether the focus is fighting Medicare fraud or finding new solutions to improve health care, the possibilities of big data and the cloud are not pipe dreams. But they will not be fulfilled on their own. A conscious effort and deliberate planning are needed. The goal: a road map for that decision-making, one that shows the importance of a holistic approach rather than a piecemeal, unfocused one.

Mark Herman is an executive vice president at Booz Allen Hamilton.