Data Deluge

Agencies have turned to virtualization to process a growing stockpile of information. Now the challenge is where to put it all.

The federal government is awash in data. And it's expanding faster than chief information officers can track. No one knows exactly how much information agencies have stored in their far-flung databases, but experts say it's a lot. Consider this: By 2015, the world will generate the equivalent of almost 93 million Libraries of Congress--in just one year, according to Cisco's Internet Business Solutions Group.

The government is a big player in that information explosion, although how big is not certain. The cost to store and manage the growing mound of data is rising and eating up scarce information technology resources. It's no surprise, then, that the next big IT investment agencies will make in the coming years, if they haven't started already, is in something called virtualized storage, which uses software to link multiple devices into what behaves like a single pool of storage capacity controlled from a central console. The console makes it easier to back up, archive and retrieve data.
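In rough terms, the virtualization layer sits between applications and the physical arrays, handing out space from whichever device has room while the caller sees only one pool. A minimal Python sketch of that pooling idea (device names, sizes and the allocation policy are hypothetical, not any vendor's product):

    # Illustrative only: a toy pool that hides which device holds the data.
    class Device:
        """One physical disk array with a fixed capacity in gigabytes."""
        def __init__(self, name, capacity_gb):
            self.name = name
            self.capacity_gb = capacity_gb
            self.used_gb = 0

    class StoragePool:
        """The 'central console': presents many devices as one pool."""
        def __init__(self, devices):
            self.devices = devices

        def total_free_gb(self):
            return sum(d.capacity_gb - d.used_gb for d in self.devices)

        def allocate(self, size_gb):
            # Place the request on whichever device has room; the caller
            # never needs to know which one was chosen.
            for d in self.devices:
                if d.capacity_gb - d.used_gb >= size_gb:
                    d.used_gb += size_gb
                    return d.name
            raise RuntimeError("pool exhausted")

    pool = StoragePool([Device("array-a", 500), Device("array-b", 1000)])
    print(pool.allocate(600))      # lands on array-b; caller sees one pool
    print(pool.total_free_gb())    # 900 GB still free across both devices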

With agencies creating more data, storage virtualization is an inevitable part of their IT future. Many operations--from the Congressional Budget Office to the State Department to the U.S. military--are looking for ways to squeeze more efficiency out of their storage systems and drive down costs.

The Census Bureau is looking to virtualize storage to help it manage the 2.5 petabytes of data that ebbs and flows as it conducts the decennial census and vast economic surveys. The data, which amounts to more than the entire collection in all U.S. academic research libraries, is contained in a variety of storage platforms that multiple vendors supply. But maintaining so many disparate systems is driving up the cost of operating the data centers that house the information.

"We have a very diverse storage architecture, and that diversity doesn't lend itself nicely to be highly efficient from a cost perspective," says Brian McGrath, CIO at the Census Bureau. He says virtualization would create storage platforms that could be shared throughout the bureau to minimize unused capacity and lower operating costs.

Five years ago, the first wave of data center efficiency began with server virtualization. Agencies were able to consolidate 10 or more servers into one, increasing use of available computing power from about 30 percent to as much as 80 percent. But that placed demands on storage and backup systems, which require a lot of server capacity.

"Backing up a virtual server infrastructure becomes a big burden on data centers and their resources," says

Fadi Albatal, vice president of marketing with FalconStor Software. "When server utilization rates were 20 percent, servers still had 80 percent available for heavy load processing such as backups. Now that servers have utilization rates of

80 percent it means there's only 20 percent left for all my backup processes."
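The arithmetic behind that squeeze is simple. A back-of-the-envelope sketch using the consolidation figures cited above (the unit of work is arbitrary, and the post-consolidation host count is an illustrative assumption):

    # Rough version of the consolidation math described above.
    servers_before, util_before = 10, 0.30
    work = servers_before * util_before        # 3.0 units of real work
    servers_after, util_after = 4, 0.80        # same work on 4 busy hosts
    headroom_before = servers_before * (1 - util_before)
    headroom_after = servers_after * (1 - util_after)
    print(f"headroom for backups: {headroom_before:.1f} units before, "
          f"{headroom_after:.1f} after")
    # headroom for backups: 7.0 units before, 0.8 after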

The Next Big Wave

Now agencies are turning to virtualization not only to process their data but also to store it, driving down the cost of purchasing and maintaining storage devices. The savings potential is substantial. Storage accounts for about half of what an agency spends on hardware. And less equipment means less power usage, which can drop by as much as 50 percent with storage virtualization, Albatal says. Other savings come from freeing up IT employees to work on other data center projects. The shift to storage virtualization is picking up momentum because it coincides with several Obama administration IT initiatives.

The Federal Data Center Consolidation Initiative that federal CIO Vivek Kundra outlined in February requires agencies to draft plans for consolidating the government's 1,100 data centers. The goal is to reduce energy consumption and operating costs by making better use of hardware.

The approach is key to cloud computing, another Kundra initiative. By virtualizing servers and storage systems, agencies create shared data center platforms that can host applications and provide Web-based services similar to cloud computing models.

"The opportunity is there to take a holistic view of the entire enterprise, with servers, storage and backups, and to create an agile and responsive data center that will enable private and public cloud computing in the near future," says Michael Voss, lead associate with Booz Allen Hamilton on federal data center consulting projects.

'Teeny' Agency, Tons of Data

The Congressional Budget Office was driven to storage virtualization for cost and power savings, which are critical for the 250-person agency located in the aging Ford House Office Building on Capitol Hill. The agency provides lawmakers with myriad economic reports, analyses and statistics that inform federal budget decisions.

"We're a teeny agency, and we're dealing with huge amounts of data. This is not data that we're generating. This is data that we're taking in to analyze. It's a process that's beyond our control," says CIO Jim Johnson. "I want to spend as little on storing that data as I can and still have it readily available."

CBO has 15 virtualized servers containing anywhere from 3 to 5 terabytes of data, depending on their workload. One terabyte is the equivalent of all the X-ray films in a large high-tech hospital. The agency has been able to buy the storage it needs incrementally, and it has been able to reduce downtime because it can get backup servers and storage devices up and running faster.

"For our primary storage--that's the corporate storage that we live and breathe on every day--we're going to buy high-end storage solutions. . . . But for the replica, for the mirror copies that we keep . . . that we hope we'll never have to use, we use lower-end storage," Johnson says. "This allowed us to spend our funds more appropriately based on our requirements."

Johnson estimates CBO spent about $300,000 on its virtualized storage platform during the past four years. But, he says, the agency saved money by not having to maintain identical storage platforms for primary and backup copies of data.

"Storage virtualization makes it easier for me, particularly as a smaller agency with a smaller budget, to be able to manage my storage requirements . . . and not be held hostage by a single vendor," Johnson says.

Capacity on Demand

The State Department is a leader in data center consolidation, deciding back in 2002 to start reining in the computer rooms and servers that were popping up throughout its facilities. The department is shifting to server and storage virtualization at its three data centers--one on the East Coast, one on the West Coast and a third operated by a commercial vendor.

"We have 3,000 systems, and about 21 percent of them are virtualized," says Ray Brow, division chief for enterprise data center consolidation at State. "Our goal is to get to 90 percent. We think that's doable, while 100 percent may not be."

Next up for State is modernizing the systems it uses to store more than 10 petabytes of data, including e-mail, files and electronic forms that used to be filled out and stored on paper. "At the Department of State, we have online all the visas that have been granted since 1992. It's all stored on disk," Brow says.

Because tapes are easily lost and take longer to replicate, the department has migrated from tape to disk for backups, with only one mainframe application left to transition. "We were buying tapes and more tapes," Brow says. "We were able to justify moving entirely to Data Domain disk backup systems just on the cost of new tapes, new tape drives and maintenance costs alone."

According to Brow, the biggest benefit of virtualization is centralizing storage management. The department's storage area network software and disk arrays can provide mirror images of systems and snapshots of stored information for faster data recovery. "The snapshots are very efficient in that they are only keeping track of changed data. . . . When someone needs a file restored, most of the users can figure out how to get back to the snapshot," he says.
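Snapshots of this kind typically work copy-on-write style: nothing is stored at snapshot time, and a block's old contents are saved only when that block is first overwritten. A toy sketch of the idea (the block contents are hypothetical, not the department's actual SAN software):

    # Toy copy-on-write snapshot: only changed blocks are retained.
    class Volume:
        def __init__(self, blocks):
            self.blocks = blocks          # block number -> contents
            self.snapshot = None          # saved pre-change blocks

        def take_snapshot(self):
            self.snapshot = {}            # empty: nothing has changed yet

        def write(self, n, data):
            if self.snapshot is not None and n not in self.snapshot:
                self.snapshot[n] = self.blocks[n]   # keep old copy once
            self.blocks[n] = data

        def restore(self, n):
            """Read a block as it was at snapshot time."""
            return self.snapshot.get(n, self.blocks[n])

    vol = Volume({0: "visa-record", 1: "passport-form"})
    vol.take_snapshot()
    vol.write(1, "passport-form-v2")
    print(vol.snapshot)       # {1: 'passport-form'} -- only the changed block
    print(vol.restore(1))     # 'passport-form'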

The biggest challenge for State is managing the growth of its storage requirements and their associated costs. Brow estimates storage and backup represent 35 percent of the department's IT purchases. State already limits the size of employees' mailboxes and home directories, and maintains only six months of backups.

State expects to reap additional savings through a process known as thin provisioning, which allocates storage capacity as data is actually written rather than committing blocks of disk space upfront.

"Our [storage area network] growth is pretty badly out of control," Brow says, adding he hopes thin provisioning will improve the situation. "We're trying to project our storage needs so that thin provisioning and acquisition go hand in hand."

Weeding Out Duplication

When the Defense Department merges two of its Washington area hospitals--Walter Reed Army Medical Center and the National Naval Medical Center--the new facility in Bethesda, Md., will consolidate IT as well. Due to open in September 2011, the National Military Medical Center will feature a new 5,000-square-foot data center with the latest in storage virtualization technology.

"We need 24-by-7 availability and reliability," says Lt. Cmdr. Cayetano "Tony" Thornton, CIO at the National Naval Medical Center. "We are the president's hospital. In addition to that, our primary customers are the men and women who support the nation. Once these guys leave the battlefield, they land at Andrews Air Force Base and they come over to Bethesda. We need a robust backbone and a robust data store to allow our providers to give them seamless health care."

Data center efficiency is key to the merger, says Thornton, who notes the services had duplication across the board.

"Walter Reed had over 400 applications and systems, and now we've brought that down to around 150," Thornton says. "Within each clinical area, we are looking at what the Army, Air Force and Navy are using, and we're choosing the best systems and applications so that we can deliver the best health care possible.

"We are becoming more and more dependent on storage because of all the different scans that we do--CAT scans, PET scans, MRIs--that we're looking to store in one central repository," he says.

The Defense Department's electronic medical records system is demanding an ever-increasing amount of storage capacity, according to Thornton. Walter Reed has more than 100 terabytes of storage, while the Naval Medical Center has about 40 terabytes.

"Our health care providers are looking to scan and to digitize all of that information and to make it available across the military health care enterprise in the National Capital Region," Thornton says. "That means we need a very robust data store and servers."

The new data center also will feature deduplication software that will automatically remove multiple copies of data, freeing up storage space and speeding backup processes.

According to Thornton, the center aims to reduce duplication by about 50 percent. "You really can't get rid of medical data from a historic perspective. It's very complex, and it's not an IT decision. It has to be a clinical health care decision," he says. "What we try to do is manage the data we have as efficiently as possible."
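Deduplication systems typically work by hashing incoming blocks of data and storing each unique block only once; subsequent copies become pointers to the first. A minimal sketch of that idea (the scan data is hypothetical, not the center's actual software):

    # Illustrative block-level deduplication via content hashing.
    import hashlib

    store = {}            # hash -> block contents; each unique block kept once
    references = []       # what each incoming block maps to

    def ingest(block: bytes):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block        # first copy: actually stored
        references.append(digest)        # later copies: just a pointer

    # The same MRI slice saved to two departments' folders...
    for block in [b"mri-slice-0042", b"mri-slice-0042", b"ct-slice-0007"]:
        ingest(block)

    print(len(references), "blocks ingested,", len(store), "stored")
    # 3 blocks ingested, 2 stored -- the duplicate cost only a pointer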

Defense has spent about $10 million on hardware and software for the new data center, with storage components accounting for 30 percent to 40 percent of the cost. All the servers and storage devices will be virtualized for better data management, accessibility and reliability.

Just Getting Started

From census and spending tallies to passport and health care records, storage virtualization is freeing up computing power, energy and staff for agency missions.

Virtualized storage also promises to improve the speed at which agencies are able to respond to disasters, from hurricanes and floods to terrorist attacks. "Once we virtualize the storage infrastructure, we can do continuous backups or replicate to another location and have a standby," says FalconStor's Albatal.

Agencies have been adopting storage virtualization for several years, but have a long way to go before they have optimized their data center storage environments. "We probably haven't virtualized 10 percent or 20 percent of federal storage platforms," says Mark Weber, vice president and general manager of the U.S. public sector business for NetApp. "The market is still in front of us."

Carolyn Duffy Marsan is a high-tech business reporter based in Indianapolis who has covered the federal IT market since 1987.