What Happens When You Can't Share or Access Your Own Data?

Agencies can avoid vendor lock-in with hybrid open source software.

The easy interchange of data is crucial for government agencies. As a critical element of mission execution, agencies must regularly share information among their own components and with other agencies, partners, foreign governments and international organizations. This sharing enables unity of effort, faster and better-informed decision-making, increased adaptability, improved situational awareness, and greater precision in mission planning and execution.

This is the idea behind data democratization: making digital information accessible to the average end user. The goal is to let non-specialists gather and analyze data without outside help or intervention.

Agencies have historically enabled data sharing by relying on locked third-party solutions or by building their own and adding helper applications that serve to eliminate the differences in data formats that exist within or between agencies. These solutions have the appearance of fast value because they are easily deployed and managed.

But, as some agencies have recently realized, challenges arise when they want to move their data from a third-party system to their own or to another third party. Some have found that once they retire the system, they cannot retrieve all of their own data from the third party, or at least the analysis of some of it, leaving them with incomplete data sets and limited means to use the data to make decisions.

This is a frustrating problem, but it shouldn’t be a problem in the first place. Simple solutions are available, but agency IT departments are sometimes unaware of these alternatives and how they work. A good point of comparison is search: most search users don’t fully understand the complexity of the algorithms involved in a simple search task, nor should they be expected to. What they get is a simple user experience backed by complex inferencing algorithms.

For example, schema on read, in which a schema (a plan for how the data is structured) is applied to data as it is pulled out of storage, can help agencies address their frequent data sharing needs across branches for everything from human resources and payroll information to more mission-focused data. Schema on read behaves completely differently from traditional data hubs based on schema on write, in which a schema is created for the data before it is written into the database. Instead of having to decide what data is important before it is ingested, agencies can make that critical decision when they are querying. This means they don’t lose any data, not even “dirty data.” No longer will agencies have to get rid of data just because it doesn’t match the tool or query they’re using. Rather, they can simply change the tool or the query.
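To make the contrast concrete, here is a minimal sketch of schema on read in Python. It is an illustration only, not any particular product's implementation: the records, field names and the `read_with_schema` helper are all made up for this example. Raw records are stored exactly as received, and each query supplies its own schema when it reads them.

```python
import json

# Raw records are stored exactly as received; nothing is rejected at write time.
# (All names and values here are invented sample data.)
RAW_STORE = [
    '{"name": "A. Rivera", "dept": "HR", "salary": "72000"}',
    '{"name": "B. Chen", "department": "Payroll"}',
    '{"full_name": "C. Okafor", "dept": "HR", "salary": 68000, "clearance": "secret"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at query time.

    `schema` maps each output field to (candidate source keys, converter).
    Fields a record does not have come back as None instead of causing
    the record to be dropped.
    """
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (keys, convert) in schema.items():
            # Use the first candidate key that is present in this record.
            value = next((record[k] for k in keys if k in record), None)
            row[field] = convert(value) if value is not None else None
        yield row


# Two different questions asked of the same stored data, each with its own schema.
hr_schema = {
    "name": (("name", "full_name"), str),
    "dept": (("dept", "department"), str),
}
payroll_schema = {
    "name": (("name", "full_name"), str),
    "salary": (("salary",), int),
}

hr_rows = list(read_with_schema(RAW_STORE, hr_schema))
payroll_rows = list(read_with_schema(RAW_STORE, payroll_schema))
```

The inconsistent "dirty" records stay in the store untouched. When a new question comes up, only the schema passed to the reader changes, not the stored data.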

In addition, graph data stores link data together directly, so related records can be retrieved in one operation by following the relationships between them. Data stored this way is often referred to as “linked data.” Data points are called nodes, and the relationship between one data point and another is called an edge. A web of nodes and edges can be put together into interesting visualizations, a defining characteristic of graph databases. While most people aren’t familiar with graph data stores, they are actually commonly deployed in government agencies, just deep within the stacks.
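The node-and-edge idea can be sketched in a few lines of Python. This is a toy in-memory structure built for illustration, not a real graph database, and the `TinyGraph` class, node labels and relationship names are all hypothetical.

```python
from collections import defaultdict


class TinyGraph:
    """A toy graph: nodes are strings, edges are labeled relationships."""

    def __init__(self):
        # node -> list of (relationship, target node)
        self.edges = defaultdict(list)

    def add_edge(self, source, relationship, target):
        self.edges[source].append((relationship, target))

    def neighbors(self, node, relationship=None):
        """Return every node linked from `node`, optionally filtered by relationship."""
        return [
            target
            for rel, target in self.edges.get(node, [])
            if relationship is None or rel == relationship
        ]


graph = TinyGraph()
graph.add_edge("Employee:Rivera", "WORKS_IN", "Office:HR")
graph.add_edge("Employee:Chen", "WORKS_IN", "Office:Payroll")
graph.add_edge("Employee:Rivera", "ASSIGNED_TO", "Project:Audit")
graph.add_edge("Employee:Chen", "ASSIGNED_TO", "Project:Audit")

# One lookup follows the edges directly; no join across tables is needed.
rivera_links = graph.neighbors("Employee:Rivera")
rivera_projects = graph.neighbors("Employee:Rivera", "ASSIGNED_TO")
```

Because each relationship is stored alongside the node it starts from, asking what a record is connected to is a single lookup rather than a search across separate tables.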

But there’s more. Agencies can do this themselves with a minimal investment of time and money. In fact, what agencies think of as highly proprietary offerings can be achieved with hybrid open source software (HOSS). HOSS offers the best of both worlds: the rapid innovation, open ecosystem and standards, and large global talent pool of open source software, together with the integration, security, compliance and governance of proprietary software. HOSS empowers data sharing among government agencies and is a simple solution that agency IT leaders should embrace.

Agencies have more than two choices when it comes to data sharing. In fact, decision-makers can avoid vendor lock-in altogether by building their own systems on HOSS. IT teams need not be concerned about the seemingly complex algorithms involved. The partnership provided by HOSS will make the deployment and management seem just as simple as handing it off completely to a third party. But, this time, agencies won’t have to worry about losing critical data should the needs of their mission change over time.

Christine Kerns is a regional director for Cloudera’s National Security Programs.