Big Data Pitfall -- Too Much Organizing, Not Enough Analyzing


Through data virtualization, federal agencies can handle – and benefit from – all their data in a way that is efficient and secure.

Anne Buff is a business solutions manager at SAS.

Big data is a business reality and a hot topic in government. According to IDC, the U.S. will have created, replicated and consumed 6.6 zettabytes of data by 2020.

As the largest producer of information in the world, the federal government draws on a rapidly growing volume and variety of data sources. In 2009 alone, it produced a whopping 848 petabytes of data.

Big data presents unprecedented opportunities in the federal space. With the opportunities come daunting implications, especially data management and integration challenges. The amount of time agencies spend locating and integrating relevant data to perform specific process tasks and produce organizational insights grows daily. The reason is that data is distributed across many different application data stores, each built to support a specific functional business area, which makes it extremely difficult to access integrated, relevant data for cross-functional use.

Data Virtualization – What, Why and How

The inability to easily access a comprehensive view of your organization’s data not only reduces the data’s value but also limits what that data could help you achieve. By the time a report reaches the hands of decision-makers, the data is old, if not obsolete.

Every day, agencies make critical decisions regarding our nation’s economy, health care, education, national security, military operations and more. Only by sharing integrated data across an organization can we provide the right people with the right data at the right time.

Data virtualization – the process of providing access to a comprehensive, consistent view of enterprise data by means of abstraction and integration of data from disparate sources – is the answer.

Combining data from multiple, disparate sources across an organization in a unified, logically virtualized manner gives agencies appropriate control without physically moving data. More specifically, integrating disparate data enables agencies to:

  • Provide data for query and reporting
  • Simplify access to data in multiple operational systems
  • Combine on-premises and cloud data
  • Integrate structured, unstructured and semi-structured data
  • Create consistent data services across an organization
  • Introduce agility into a data warehouse/BI environment
  • Provide on-demand integration of distributed master data
  • Enable heterogeneous replication
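
The idea of a unified view that leaves data where it lives can be illustrated with a minimal sketch. It assumes two separate SQLite databases standing in for two agency systems; the database names (hr, finance), the tables and the view name employee_pay are all hypothetical and chosen only for illustration. A production data virtualization layer would span far more varied sources, but the principle is the same: the view stores no data of its own and is resolved against the sources at query time.

```python
import sqlite3


def build_virtual_view() -> sqlite3.Connection:
    """Return a connection exposing one view over two separate databases."""
    conn = sqlite3.connect(":memory:")

    # Two independent in-memory databases stand in for two source systems
    # (hypothetical names; real sources would be separate servers or files).
    conn.execute("ATTACH DATABASE ':memory:' AS hr")
    conn.execute("ATTACH DATABASE ':memory:' AS finance")

    conn.execute("CREATE TABLE hr.employees (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE finance.payroll (employee_id INTEGER, salary REAL)")
    conn.executemany(
        "INSERT INTO hr.employees VALUES (?, ?)",
        [(1, "Avery"), (2, "Blake")],
    )
    conn.executemany(
        "INSERT INTO finance.payroll VALUES (?, ?)",
        [(1, 82000.0), (2, 91000.0)],
    )

    # The view holds no rows itself; it joins the two sources on demand.
    conn.execute(
        """
        CREATE TEMP VIEW employee_pay AS
        SELECT e.id, e.name, p.salary
        FROM hr.employees AS e
        JOIN finance.payroll AS p ON p.employee_id = e.id
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_virtual_view()
    # A change in a source system is visible through the view immediately.
    conn.execute("UPDATE finance.payroll SET salary = 95000.0 WHERE employee_id = 2")
    for row in conn.execute("SELECT name, salary FROM employee_pay ORDER BY id"):
        print(row)
```

Because nothing is copied into the view, a report built on it reflects the source systems as they are at the moment of the query rather than as they were at the last extract.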

Bringing together meaningful data from across heterogeneous sources isn’t easy. It is a daily struggle just to figure out where all the data really lives. The truth is, data teams spend too much time preparing data to be analyzed and not enough time actually analyzing it.

The good news: Data virtualization can have a significant impact in the following key areas:

  1. Shared, up-to-date information. Integrating information for reporting is practically impossible when manually created reports from across the organization are formatted in many different ways. Often, they’re outdated and inaccurate. Efforts are frequently duplicated across the organization, wasting inordinate amounts of time. Data virtualization can deliver integrated data on demand to any application, portal or tool, which improves both return on investment and agility. Built-in optimization capabilities, including query caching and push-down query execution, speed up data access and query processing. As a result, existing business solutions run in a better-optimized environment, and agencies can pursue new opportunities faster.
  2. Collaborative culture. Compounding the problem of integration itself, the frustration with the process can undermine trust in the reports. Concerns about data being lost, misrepresented or corrupted cause users throughout the organization to create multiple versions of similar reports, often with varying formulas and conflicting results. When everyone works from the same virtualized view of the data, there is far less reason to build competing versions.
  3. Security and compliance. A virtualized environment provides security and compliance controls through user- and role-based authentication and authorization with low-level security. Data masking of highly sensitive data such as financial, medical or personally identifiable information prevents unauthorized users from accessing or reading data even if it is passed from the data source.
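
As a rough illustration of role-based masking, the sketch below hides sensitive fields from any role that is not explicitly cleared to see them. The field names, role names and clearance table are assumptions made up for this example; they do not describe any particular product, and a real deployment would tie into the agency's own authentication and authorization systems.

```python
# Hypothetical field names and roles, chosen only for illustration.
SENSITIVE_FIELDS = {"ssn", "diagnosis"}
ROLE_CLEARANCE = {
    "analyst": set(),             # sees no sensitive fields
    "auditor": {"ssn"},           # cleared for identifiers only
    "clinician": {"diagnosis"},   # cleared for medical data only
}
MASK = "****"


def masked_view(record: dict, role: str) -> dict:
    """Return a copy of the record with uncleared sensitive fields masked.

    Unknown roles receive no clearance, so every sensitive field is masked.
    """
    cleared = ROLE_CLEARANCE.get(role, set())
    return {
        field: value if field not in SENSITIVE_FIELDS or field in cleared else MASK
        for field, value in record.items()
    }


if __name__ == "__main__":
    record = {"name": "Avery", "ssn": "000-00-0000", "diagnosis": "asthma"}
    for role in ("analyst", "auditor", "clinician", "contractor"):
        print(role, masked_view(record, role))
```

The masking is applied to the record after it leaves the source, which mirrors the point above: the underlying value may be passed along, but a user without clearance never sees it.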

Data-driven Approach

The burgeoning need to tap distributed data among agencies is driving up demand for data virtualization. Traditional data integration cannot keep pace with the increasingly complex information generated by the federal government.

A federated approach enables agencies to manage data projects of any size or mission. Virtualizing a federal organization’s data ensures all users and applications are working with the same data consistently, effectively and efficiently. And the improved data quality naturally translates into far greater confidence in the reporting and analyses.

Data virtualization is a proven method of dealing with the complexity and volume of data while providing agencies agility and comprehensive, accurate and secure information. Through data virtualization, federal agencies can handle – and benefit from – all of their data in a way that is efficient and secure.