Why the Modern-Day Government Should Focus More on Big Data Curation

Hari Donthi is the vice president of data development systems at NCI, Inc. and leads numerous big data and agile efforts in civilian agencies.

Data management is not a new concept in government IT, nor is the discussion about how to improve IT and business engagement through better use of data. Government agencies have always recognized the importance of leveraging their data. However, today’s government data users (usually referred to in IT circles as “the business”) believe their internal IT shop cannot give them the data they know is available within their agency or on open data platforms elsewhere.

How Did We Get Here?

To understand this mismatch of expectations, it helps to look at how we got here. In the '90s, the primary goal of the data warehousing movement was to meet the organization’s needs by solving the single-version-of-the-truth problem.

This required careful reconciliation of data interpretations between different users and departments so everyone could be on the same page. Additionally, stringent data quality checks existed so decision-makers would have confidence in the data.

Because massively parallel processing solutions like Hadoop, column-oriented data stores and the cloud were not commonplace in the '90s, data models had to be designed, tuned and maintained by experts for good performance.

These factors created a barrier to getting new types of data into the data warehouse, and often led to expensive, multiyear programs that — in the end — had very limited utility.

Today, a single version of enterprise-level data is no longer the primary objective of storing historical data. Users want full access to all data and the ability to interact with it so they can extract insights and rapidly unlock the power of the data.

To achieve this, the focus of government’s data management efforts needs to shift from warehousing to data curation.

Moving Beyond the Warehouse

In our current age of big data, a single enterprise interpretation of the data is passé. The old data warehouse days focused on an enterprise data model created with fixed meanings for data attributes. The users of the data warehouse simply filtered the data based on their department’s needs.

Today, with predictive analytics proven in the private sector and gaining ground in government, we must revisit the tradition of an enterprise data model.

Specifically, we should accept that the usage patterns, predictive power and meaning of data attributes can evolve as an organization matures in mining its data (deploying predictive models into the field and feeding performance results back to refine them) and as events outside the organization affect its priorities. It is important to separate the data from how it is used.

The Data Curation Difference

Data curation differs from traditional data warehousing. A curated data store is a platform for data users — it does not tell the users how to consume or interpret the data. The data users make the data actionable and meaningful using statistical learning techniques, for example, to predict emerging trends like fraud, noncompliance and virus outbreaks.

The significance and meaning of data attributes are determined by the predictive power of the multiple models that use this data, and these “meanings” can be fed back into the curated data store so it can be a shared enterprise asset.

This process relieves a central authority (aka data steward) from having to be the sole arbiter or the bottleneck of curated data, which is very different from the traditional data warehousing lifecycle of the '90s.
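The feedback loop described above can be illustrated with a small Python sketch. It is only one possible reading of the idea, not a description of any particular product: the `CuratedStore` class, the model names, the attribute names and the scores are all hypothetical, and averaging the scores is an assumed way of combining feedback from several models.

```python
from collections import defaultdict
from statistics import mean


class CuratedStore:
    """Hypothetical curated store that keeps model feedback per attribute."""

    def __init__(self):
        # attribute name -> {model name -> predictive-power score it reported}
        self._feedback = defaultdict(dict)

    def report(self, model, scores):
        """Record the scores one model assigns to the attributes it uses."""
        for attribute, score in scores.items():
            self._feedback[attribute][model] = score

    def significance(self, attribute):
        """Average score across all models that reported on the attribute.

        Averaging is an assumption made for this sketch; returns None when
        no model has reported on the attribute yet.
        """
        reports = self._feedback.get(attribute)
        return mean(reports.values()) if reports else None

    def ranked(self):
        """Attribute names ordered from most to least significant."""
        return sorted(self._feedback, key=self.significance, reverse=True)


store = CuratedStore()
# Two independent (made-up) models feed their findings back to the store.
store.report("fraud_model", {"claim_amount": 0.75, "provider_id": 0.25})
store.report("noncompliance_model", {"claim_amount": 0.25, "filing_date": 0.375})
ranking = store.ranked()
```

In this sketch no single steward decides which attributes matter. The ranking emerges from whatever the models report, and it shifts as new models contribute their results.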

Government can learn from the data warehousing experiences and issues of the '90s, including the role technology played. Back then, it was difficult to introduce new data into the data warehouse and to get large databases to perform well for ad hoc analytics.

While technologies of today reduce the need for finely tuned data models, we cannot simply throw away data modeling and create a data lake. As Michael Stonebraker put it eloquently, a data lake can quickly turn into a data swamp. And this is why data curation is necessary and important.

Transitioning from data warehousing to curation also involves a change in user behavior. When curated data is presented to the users, a lot more is expected of them than simply filtering canned reports.

Data curation boils down to serving up the data on a platter. That is, the users know what the data elements mean, where they come from, how to explore and mine them, and how to make the insights actionable. Giving users this power and freedom of ad hoc exploration requires a different engagement model between the users and the maintainers of the curated data platform.
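As a rough illustration of what "knowing what the data elements mean and where they come from" might look like in practice, here is a minimal Python sketch of a data catalog. Everything in it is hypothetical: the `DataElement` and `Catalog` names, the fields, and the sample entries are assumptions made for the example, not a reference to any real agency system.

```python
from dataclasses import dataclass


@dataclass
class DataElement:
    """Hypothetical catalog record for one curated data element."""
    name: str
    meaning: str  # what the element represents, in plain language
    source: str   # the system or feed the element comes from


class Catalog:
    """Hypothetical lookup that lets users explore curated data elements."""

    def __init__(self):
        self._elements = {}

    def register(self, element):
        """Add or overwrite the record for an element."""
        self._elements[element.name] = element

    def describe(self, name):
        """One-line summary of an element's meaning and origin."""
        e = self._elements[name]
        return f"{e.name}: {e.meaning} (source: {e.source})"

    def search(self, keyword):
        """Names of elements whose meaning mentions the keyword."""
        keyword = keyword.lower()
        return sorted(
            name for name, e in self._elements.items()
            if keyword in e.meaning.lower()
        )


catalog = Catalog()
catalog.register(DataElement("claim_amount", "Dollar value of a claim", "claims system"))
catalog.register(DataElement("filing_date", "Date the claim was filed", "claims system"))
summary = catalog.describe("claim_amount")
matches = catalog.search("claim")
```

The point of the sketch is that meaning and provenance travel with the data, so users can explore on their own rather than asking IT what each field represents.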

Both parties will need new skills. IT needs to build expertise making data available in a user-friendly way — expertise that is significantly different from delivering user-friendly applications and websites. Users need to acquire skills in interacting with data in a more modern way. Users need a lot more than standard “tool training.” IT and the users need to experience the power of the modern data mining and data exploration tools together, in the setting of their agency’s data.

Doing this will give IT the confidence to step back from building fully specified, siloed applications and to build data platforms instead. Users, in turn, will lose some of their appetite for expensive, use-case-specific applications.

This change in how the conversation between business and IT is framed is the only way predictive analytics will become democratized and help empower government to meet its challenges more rapidly and more efficiently.
