The National Oceanic and Atmospheric Administration’s Big Data Project, created last year to help unlock more economic value from petabytes of government-collected environmental data, is showing early positive returns.
In one of its first attempts to make a large data set available in the cloud, the agency slashed the workload on its archive systems, freeing up bandwidth and resources on constrained legacy systems, said Amy Gaskins, director of the Big Data Project.
At the same time, the cloud provided customers with faster, on-demand access to NOAA data.
The use case involves the San Francisco-based Climate Corporation, a major consumer of NEXRAD Doppler radar data.
In October, Amazon Web Services, which teamed with NOAA on the project, began hosting some 270 terabytes of archived NEXRAD data dating back to 1991. Rather than request NEXRAD data from NOAA’s National Centers for Environmental Information archive – a process Gaskins said can take weeks – Climate Corporation was able to get on-demand access to real-time NEXRAD feeds updated every 5 minutes. The data came from 160 Doppler stations across the country.
The company’s models improved, Gaskins said, as did its time to market, by “two to three weeks.” In addition, running models in the cloud reduces infrastructure costs because the company doesn’t have to run them on its own systems.
The stewards of the country’s environmental data also saw a 50 percent reduction in requests for archived NEXRAD data, which is important given the agency’s technological and bandwidth constraints.
“For them, it is about on-demand access, speed to market and reduced infrastructure costs,” Gaskins said in an interview with Nextgov. “For us, it’s about reducing the workload at the archive at NCEI. What we’re able to do through the Big Data Project is the ability to divert that traffic to the cloud. And what we can do at NOAA is better serve our core constituency of researchers, academics and nonprofits.”
There were additional benefits, too. Gaskins said opening up NOAA’s NEXRAD data actually helped the agency identify and correct a small number of corrupted files, essentially improving the “gold copy” of its archived data.
The early use case bodes well for the Big Data Project’s future.
A 3-year partnership between NOAA and AWS, Google Cloud Platform, IBM, Microsoft and the Open Cloud Consortium will help explore which data sets make the most economic sense to host in these very different cloud environments, Gaskins said. And with more than 120 petabytes of environmental data in its archive – the vast majority of which is “on the table” to be shared – the sky may be the limit.
“The feeling from collaborators is quite positive right now,” Gaskins said.