Unlocking operational data is becoming more critical -- and challenging -- as agencies struggle to balance a growing demand for quick information with tight budgets, aging hardware and legacy systems. Agencies are realizing that building a big data capability in-house would consume significant amounts of time and money. While 64 percent of IT professionals surveyed said their agency’s data management system could be easily expanded or upgraded, they estimated it would take an average of 10 months to double their capacity. That doesn’t even include the costs involved in new or upgraded IT infrastructure and training staff to manage and analyze the data.
An alternative approach is to leverage years of private sector investment in research and development. Technology companies can help federal managers analyze their organizations’ data to find opportunities for cost savings, operational efficiencies, informed decision-making and data transparency.
As IT providers address early-stage concerns about storing big data in the cloud -- including access controls, privacy and security -- the concept is increasingly attractive to agencies.
The General Services Administration, for example, recently moved the USASpending.gov website to a big data cloud provided by technology services provider GCE. USASpending.gov is a one-stop destination for federal procurement information, such as grants, contracts, loans and payments. As one of the most widely used sites for tracking agencies’ buying habits, USASpending.gov is a vital mechanism for ensuring government transparency. It was a prime candidate for a big data platform since it incorporates analytics, search tools and large data feeds, and serves an extensive user base of citizens, media professionals, lawmakers and resellers.
USASpending.gov’s transition to a big data cloud yields several valuable lessons for agencies looking to analyze and manage data more effectively.
The Right Questions
In the current budget climate, it is unrealistic to address big data needs by investing in massive in-house systems. The cloud has made it easier for agencies to shed weighty IT infrastructure in favor of vendor-provided services.
But there is a learning curve in collaborating with companies on big data projects. In some cases, agency managers and contracting officers can look to industry experts for guidance on constructing effective requests for proposals, drawing on what has worked and what hasn't. At the same time, they should coordinate with subject matter experts inside the agency to determine what information they want from big data tools. Asking the right questions will ensure the product can deliver the right answers.
The Optimal Starting Point
A big data journey begins with identifying the organization's needs and opportunities. Leaders must determine where a new way to analyze data will deliver the most value. Once the starting point is identified, the following steps are essential to weaving big data into existing infrastructure:
- Start small. Build a pilot program in the cloud, where a subset of users can test out an analytical tool. Enabling users to see how their data behaves on a certain platform drives better decisions about what data is most valuable.
- Start clean. Determine whether the data is in usable condition. Is cleansing required before transitioning to a big data framework? Data quality is hard to measure in absolutes and should be evaluated in relation to the community using it. Develop a thorough understanding of what people want out of the data, rather than simply building something and force-feeding it to them. Recognize the needs of internal and external audiences. If the big data tool will be used exclusively by an internal audience, an agency can probably get away with a few warts as long as the functionality is there and users see value. If the data will be exposed to external audiences, the bar is far higher.
All Clouds Are Not Created Equal
After deciding to move forward in the big data cloud, IT managers must understand their options and determine which provider aligns with their needs. Agencies might be tempted to exert more control through a proprietary product, but there are significant benefits to open source software like Hadoop.
With open source technologies, agencies can avoid being locked into the flavor of the month, which requires painful extraction if leaders later choose a different direction. Products built on the Hadoop framework, for example, can be easily right-sized, and computing capacity can be scaled up or down as needs change. In such an environment, computing power can be five to 10 times faster than mainstream disk and switch technologies.
Agencies should seek out experienced vendors with knowledge of government data and the applicable requirements, attributes and policies. This will ensure agency and contractor officials have a common understanding and speak the same language.
For all the rhetoric surrounding big data, there are real projects delivering significant benefits to agencies. As the federal government grows comfortable leveraging investments, expertise and resources from the private sector, more agencies will be able to weave a big data cloud service into their existing IT infrastructure.
Ray Muslimani is chief executive officer at GCE, which provides cloud and big data services.