Why the Modern-Day Government Should Focus More on Big Data Curation

Lucia Fox/Shutterstock.com

The public sector's data management efforts need to shift away from warehousing.

Hari Donthi is the vice president of data development systems at NCI, Inc. and leads numerous big data and agile efforts in civilian agencies.

Data management is not a new concept in government IT, nor is the discussion about how to improve IT and business engagement through better use of data. Government agencies always have recognized the importance of leveraging their data. However, today’s government data users (usually referred to in IT circles as “the business”) believe their internal IT shop cannot give them the data they know is available within their agency or that exists in other open data platforms.

How Did We Get Here?

To understand this mismatch of expectations, it helps to look at how we got here. In the '90s, the primary goal of the data warehousing movement was to meet the organization’s needs by solving the single-version-of-the-truth problem.

This required careful reconciliation of data interpretations between different users and departments so everyone could be on the same page. Additionally, stringent data quality checks existed so decision-makers would have confidence in the data.

Because massively parallel processing solutions like Hadoop, column-oriented data stores and the cloud were not commonplace in the '90s, data models had to be designed, tuned and maintained by experts for good performance.

These factors created a barrier to getting new types of data into the data warehouse, and often led to expensive, multiyear programs that — in the end — had very limited utility.

Today, a single version of enterprise-level data is no longer the primary objective of storing historical data. Users want full access to all data and the ability to interact with it so they can extract insights and rapidly unlock the power of the data.

To achieve this, the focus of government’s data management efforts needs to shift from warehousing to data curation.

Moving Beyond the Warehouse

In our current age of big data, a single enterprise interpretation of the data is passé. The old data warehouse days focused on an enterprise data model created with fixed meanings for data attributes. The users of the data warehouse simply filtered the data based on their department’s needs.

Today, with predictive analytics proven useful in the private sector and gaining ground in government, we must revisit the tradition of an enterprise data model.

Specifically, we should accept that the usage patterns, predictive power and meaning of the data attributes can evolve as an organization gets more mature in mining its data — deploying predictive models into the field and feeding back performance results to refine the models — and as events outside the organization affect its priorities. It is important to separate the data from how it is used.

The Data Curation Difference

Data curation differs from traditional data warehousing. A curated data store is a platform for data users — it does not tell the users how to consume or interpret the data. The data users make the data actionable and meaningful using statistical learning techniques, for example, to predict emerging trends like fraud, noncompliance and virus outbreaks.

The significance and meaning of data attributes are determined by the predictive power of the multiple models that use this data, and these “meanings” can be fed back into the curated data store so it can be a shared enterprise asset.
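One way to picture this feedback loop is a small catalog in which each model reports how much predictive weight it found in each attribute. The sketch below is only an illustration under stated assumptions: the class names (AttributeEntry, CuratedCatalog), the attribute and model names, and the scores are all hypothetical, and averaging importance scores is just one simple choice for combining what several models report.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class AttributeEntry:
    """Catalog record for one data attribute (hypothetical structure)."""
    name: str
    source: str  # where the attribute comes from
    # Importance scores reported back by models, keyed by model name.
    importances: Dict[str, float] = field(default_factory=dict)

    def mean_importance(self) -> float:
        # Average of the reported scores; 0.0 until a model reports.
        return mean(self.importances.values()) if self.importances else 0.0


class CuratedCatalog:
    """Shared store that collects model feedback about attributes."""

    def __init__(self) -> None:
        self._entries: Dict[str, AttributeEntry] = {}

    def register(self, name: str, source: str) -> None:
        self._entries[name] = AttributeEntry(name, source)

    def report(self, model: str, scores: Dict[str, float]) -> None:
        # A model feeds back how predictive each attribute was for it.
        for attr, score in scores.items():
            self._entries[attr].importances[model] = score

    def ranked(self) -> List[AttributeEntry]:
        # Attributes ordered by average importance, highest first.
        return sorted(self._entries.values(),
                      key=lambda e: e.mean_importance(),
                      reverse=True)


# Illustrative data only; these names and numbers are made up.
catalog = CuratedCatalog()
catalog.register("claim_amount", "claims_db")
catalog.register("provider_zip", "provider_registry")
catalog.report("fraud_model", {"claim_amount": 0.6, "provider_zip": 0.2})
catalog.report("noncompliance_model", {"claim_amount": 0.2, "provider_zip": 0.4})

ranked = [entry.name for entry in catalog.ranked()]
print(ranked)
```

No single party decides what an attribute "means" here; the ranking simply reflects what the models that use the data have reported so far, and it shifts as new models report.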

This process relieves a central authority (aka data steward) from having to be the sole arbiter or the bottleneck of curated data, which is very different from the traditional data warehousing lifecycle of the '90s.

Government can learn from the data warehousing experiences and issues of the '90s, including the role technology played. Back then, it was difficult to introduce new data into the data warehouse and to get large databases to perform well for ad hoc analytics.

While technologies of today reduce the need for finely tuned data models, we cannot simply throw away data modeling and create a data lake. As Michael Stonebraker put it eloquently, a data lake can quickly turn into a data swamp. And this is why data curation is necessary and important.

Transitioning from data warehousing to curation also involves a change in user behavior. When curated data is presented to the users, a lot more is expected of them than simply filtering canned reports.

Data curation boils down to serving up the data on a platter. That is, the users know what the data elements mean, where they come from, how to explore and mine them, and how to make the insights actionable. Giving users this power and freedom of ad hoc exploration requires a different engagement model between the users and the maintainers of the curated data platform.

Both parties will need new skills. IT needs to build expertise making data available in a user-friendly way — expertise that is significantly different from delivering user-friendly applications and websites. Users need to acquire skills in interacting with data in a more modern way. Users need a lot more than standard “tool training.” IT and the users need to experience the power of the modern data mining and data exploration tools together, in the setting of their agency’s data.

Doing this will give IT the confidence to step back from creating fully spec’d silo applications to creating data platforms, and users, in turn, will reduce their appetite for expensive use-case specific applications.

This change in how the conversation between business and IT is framed is the only way predictive analytics will become democratized and help empower government to meet its challenges more rapidly and more efficiently.
