Grid and bear it

Although proven in academia and research, grid computing struggles to find a place in the enterprise.

At first glance, the SURAgrid Coastal Ocean Observing and Prediction program is a perfect example of how grid computing can apply massive amounts of computing power to large problems. But the amount of work that goes into SCOOP, run by the National Oceanic and Atmospheric Administration and the Office of Naval Research, also shows how far grid computing needs to go to make it into the enterprise.

“Grid computing is still pretty much an art and not a science,” said Gary Crane, director of IT initiatives for the Southeastern Universities Research Association, a group of 27 academic institutions that have pooled their computing resources into a grid network, which they call SURAgrid.

The idea behind SCOOP is to help the government better understand storm surge, the movement of ocean water whipped up by storm winds, which can cause flooding as well as damage to vessels. Researchers have designed a set of models to simulate winds and wave movements, as well as how hurricanes and other storms move across the globe.

The problem of realistically simulating such storms is far beyond the abilities of the average researcher’s computer. So NOAA and ONR turned to SURAgrid.

In many ways, SURAgrid is a model for the kind of power that grid computing can deliver. IBM high-performance computing systems at Louisiana State University, Georgia State University and Texas A&M University supply the computational heft. The grid uses Globus Toolkit to bridge the systems, which are connected through the high-speed National LambdaRail initiative.

“We’re creating a regional aggregation of computing resources. Everybody who contributes cycles to this pool has access to this total pool,” Crane said.

The good news is that, thanks to the Globus Toolkit and other grid software, SCOOP programs can work on a wide range of systems. The bad news is that getting them to do so is just plain hard.

“There is a lot of manual setup right now,” Crane said. To run a SCOOP application, researchers have to draw up a list of software and hardware requirements and send it to the participating universities. When a match is found, the researchers then have to configure the program to run on that particular set of machines.

“It is still in its infancy. Automated job submission and resource allocation across a set of machines is still a difficult thing to do,” Crane said.
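The matchmaking Crane describes can be pictured with a short sketch. The toy Python below is purely illustrative; the site names, job fields and model name are invented and are not drawn from SURAgrid or the Globus Toolkit. It simply checks a job’s stated requirements against each participating site; in practice the hard parts are authentication, data staging and per-site configuration, which the sketch ignores.

```python
# Illustrative sketch only: a toy requirements-to-resources matcher of the kind
# that automated job submission would need. All names and fields are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    cpus_free: int
    memory_gb: int
    software: set = field(default_factory=set)


@dataclass
class JobRequest:
    cpus: int
    memory_gb: int
    software: set


def find_matches(job: JobRequest, sites: list) -> list:
    """Return the sites that satisfy the job's hardware and software requirements."""
    return [
        s for s in sites
        if s.cpus_free >= job.cpus
        and s.memory_gb >= job.memory_gb
        and job.software <= s.software  # all required packages must be present
    ]


if __name__ == "__main__":
    sites = [
        Site("site-a", cpus_free=128, memory_gb=256, software={"fortran", "mpi", "surge-model"}),
        Site("site-b", cpus_free=32, memory_gb=64, software={"fortran"}),
    ]
    storm_surge_run = JobRequest(cpus=64, memory_gb=128, software={"mpi", "surge-model"})
    for match in find_matches(storm_surge_run, sites):
        print("could run on:", match.name)
```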

Five years ago, grid was one of the most hyped technologies around, one that promised to let organizations make better use of their computers. And while the research community has put grid to work, its use outside the labs remains comparatively rare.

What is grid?

In 1998, researchers Ian Foster and Carl Kesselman first formulated the grid concept in the book The Grid: Blueprint for a New Computing Infrastructure (Morgan Kaufmann). They saw grid as a way to share computational resources across a wide range of hardware and software. Normally, software is written for one operating system, which in turn runs on one piece of hardware.

Foster and Kesselman proposed an abstraction layer spanning all of those systems, so that an application could use not just any one piece of hardware but a number of different systems at the same time. In effect, a wide variety of systems could be unified into a single virtual system, using software based on a set of open standards. “In a sense we want the grid infrastructure to act like the Internet infrastructure,” Foster said at this year’s LinuxWorld Conference in Boston.

Yoking many different types of machines together is a demanding task and, not surprisingly, the job involves a lot of middleware. To tackle it, Foster and others created the Globus Toolkit, which handles chores such as authentication, job submission and data movement across disparate systems. It is maintained by the Globus Alliance, a partnership of Argonne National Laboratory and a number of academic institutions.

Government researchers took an early shine to the technology, as the simulations, calculations and experiments they wished to run were rapidly outstripping the computers at their own facilities. The National Science Foundation invested $53 million in 2001 into building a grid network called the Distributed Terascale Facility, or TeraGrid.

“We view the DTF as the beginning of the 21st-century infrastructure for scientific computing,” said Dan Reed, then head of the National Center for Supercomputing Applications in Urbana-Champaign, Ill., which helped oversee TeraGrid.

Today, TeraGrid ties together eight supercomputing facilities, aggregating 102 trillion floating-point operations per second of computing capability and more than 15 petabytes of storage.

But while grid computing has indeed found success in the scientific community, it has thus far made little headway outside academic circles, at least in government agencies.

Agencies certainly could use the extra computational power that grid could provide. But agencies feeling that crunch, such as the Defense Information Systems Agency, often look to other technologies.

“Since 2002, DISA’s revenue for server processing has grown between 30 and 50 percent annually,” said Alfred Rivera, chief of DISA’s Center for Computing Services. Lately, the agency has looked to on-demand computing, rather than grid computing, to keep costs under control. Last month, it awarded contracts, potentially worth $700 million, to a number of companies for on-demand processing capabilities. Although companies such as Hewlett-Packard Co. will put plenty of servers in DISA’s data centers, the Server Processor Environment contract stipulates that vendors are paid only for the servers DISA actually uses.

Observers give a number of reasons for grid computing’s relative failure to catch on, including a lack of grid-enabled applications and the difficulty of getting the supporting software to run.

Last June, the 451 Group released a report showing that only a very small number of applications have been written for grid computing. Software providers told the firm they saw little demand for grid-enabled applications.

Even where demand exists, getting applications to work on a grid can be hard. Interactive Supercomputing of Waltham, Mass., makes the Star-P software, which can take work done in commercial packages such as Matlab and spread it across dozens or even hundreds of servers. The key to making this work is to run the software in relatively homogeneous environments, such as a rack of servers all running identical operating systems.
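To see why homogeneity simplifies things, consider a minimal sketch in ordinary Python, not Star-P or Matlab, with an invented workload: when every worker is identical, spreading a computation amounts to little more than scattering work units and gathering results.

```python
# A minimal, generic sketch of the homogeneous approach described above: fan
# identical work units out to a pool of interchangeable workers. This is plain
# Python (concurrent.futures), standing in for the idea of spreading one
# computation across a rack of identical servers; it is not Star-P's API.
from concurrent.futures import ProcessPoolExecutor


def simulate_chunk(chunk_id: int) -> float:
    """Stand-in for one slice of a larger numerical computation."""
    return sum(i * i for i in range(chunk_id * 100_000, (chunk_id + 1) * 100_000))


if __name__ == "__main__":
    # Because every worker is identical, no per-machine configuration is needed;
    # that uniformity is what makes this far simpler than a heterogeneous grid.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(simulate_chunk, range(32)))
    print("combined result:", sum(results))
```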

Although company researchers are investigating the possible use of grid computing middleware, any products are likely a few years off. “The complexities of programming grid is really hard. There is a very small audience that knows how to deal with it very well,” Gibson said.

Even those applications that can be grid-enabled may not be deployed, because of licensing restrictions, according to Steve Wallage, director of research for the 451 Group. Many applications are licensed on a per-processor basis, so running them across 1,000 processors would be prohibitively expensive, Wallage noted.

Another sticking point is the expertise needed to get grid toolkits up and running. Globus still has limited support in many areas. Wallage spoke to members of one organization that had to write 70,000 lines of code to grid-enable an in-house program. Researchers tend to be well-versed in system administration and programming, so they can work at getting the grid’s supporting software running. But most IT shops don’t have this level of expertise.

“There is no turnkey solution, and there won’t be for a while, given the heterogeneous requirements,” Foster admitted at a presentation at LinuxWorld. He estimated that the field as a whole is about 70 percent on the way toward commercialization, though he wouldn’t hazard a time frame for completion. (Foster himself is co-founder of Univa Corp. of Lisle, Ill., which sells a version of Globus, called Univa Globus Enterprise.)

Beyond Globus

Despite these limitations, however, grid is finally making some inroads into the enterprise. Wallage noted that, outside government, other industries have started using grids. The 451 Group has found that the pharmaceutical industry uses grids for drug discovery. Oil companies and the financial sector have made use of grid as well.

In most of these cases, they use proprietary middleware tools rather than the Globus Toolkit, Wallage noted. They also keep the grids within their own enterprises, which eliminates many of the headaches, such as security issues, that multiparty grids incur.

IBM has been using the grid name quite heavily for a set of packages that incorporate grid-like capabilities, though each bundle is limited to a single application. The bundles, which are devoted to tasks such as data cleansing and actuarial planning, are a combination of hardware, software and services. The software allows the workload to be balanced across a wide range of servers, said Ken King, vice president of IBM Grid Computing. For these packages, however, the company does not use the Globus Toolkit. Instead, it sticks to commercial middleware, such as the WebSphere application server and scheduling software from Platform Computing Inc. of Markham, Ontario.

Another way that grid will move further into the enterprise is through its growing partnership with Web services. Version 4 of the Globus Toolkit makes use of many of the standardized protocols from the Organization for the Advancement of Structured Information Standards (OASIS). This should allow grid developers to leverage the work done in the Web services community, and vice versa.

“I see the emergence of rich Web services as the key to enterprise engagement,” said Reed, who is now head of the Renaissance Computing Institute.

Despite the slow start of grid computing in the enterprise, Carl Kesselman is optimistic that his brainchild will take hold. After all, most organizations have many of the needs felt by researchers, from wide-scale federated searching to better utilization of distributed resources.

“I think a lot of the early perception was that grid was for doing science and supercomputing. But if you go back and look at our papers, you’ll see we made the point that grid was not just about science and supercomputing,” he said. “We’ve just begun to scratch the surface of real business transformation.”