White House launches governmentwide investment in big data

Research funding aims to advance science and technology in medicine, astronomy and other areas.


Officials from science agencies across the government on Thursday announced $200 million in new research and development investments related to the mining, processing, storage and use of big data.

The National Science Foundation, for instance, announced a $10 million grant to researchers at the University of California, Berkeley, to build new algorithms and tools for sorting through terabytes, petabytes, exabytes and zettabytes of data.

The National Institutes of Health announced a plan to make a data set from the 1000 Genomes Project, which catalogs human genetic variation, available in Amazon's EC2 computing cloud, along with tools that make the information easily accessible to researchers. The 200 terabytes of data the project currently stores would fill about 16 million file cabinets or 30,000 DVDs, which makes it difficult to share, NIH Director Francis Collins said at an event announcing the big data projects.

Assistant Defense Secretary for Research and Engineering Zachary Lemnios announced plans during the event to develop predictive and learning tools that can use big data to make "truly autonomous" defense systems that "can learn from experience with very little training and learn the limits of their own knowledge."

The initiative was sparked by a June 2011 report from the President's Council of Advisors on Science and Technology, which found a gap in the private sector's investment in basic research and development for big data.

Other agencies involved in the initiative include the U.S. Geological Survey, the Defense Advanced Research Projects Agency and the Energy Department.

The term big data is most often associated with online advertising and marketing, where algorithms sort through data from social media and elsewhere to micro-target demographic groups. Astronomers, oceanographers and geneticists have used similar techniques to find patterns in even larger troves of data.

Advances in computing, such as flexible cloud services and software that lets many computers work as one, have made it possible to crunch through some data sets that previously were too large for computers to process at all, and to analyze others more quickly and cheaply.

The cost of sequencing a human genome, for example, has dropped from many millions of dollars when the first genome was completed in 2003 to just $8,000 today, Collins said.

Astronomers who once could only observe the sky can now plow through measurement data to model how some astronomical phenomena occur and how others may have evolved in the past, Johns Hopkins University physics and astronomy professor Alex Szalay said during a panel discussion following the government announcements.

For example, Szalay said, astronomers have used big data to model the resonance frequencies the universe emits, much like a drum when it is struck, to examine what the big bang might have looked like.