
Millions of Weather Records Pose Big Data Challenge for Scientists

Harvepino/Shutterstock.com

Deep in the dusty catalogs of weather stations and meteorological offices all over the world are hidden treasures. They're easy to miss if you're not looking for them—often taking the form of, well, piles of moldy papers. But on those pieces of paper are hundreds of years of weather records—data that could make climate science far more accurate.

The International Environmental Data Rescue Organization (IEDRO) estimates that 100 million paper strip charts (records that list weather conditions) are sitting in meteorological storage facilities throughout the world. That’s about 200 million observations unused by scientists, data that could greatly improve their models. Now a few small groups of scientists are trying to digitize these records, but they’re facing all kinds of obstacles.

Climate scientists often bemoan the lack of historical records. There are the famous data sets: the Vostok ice core, drilled in the 1970s, which looks back about 400,000 years; the Keeling curve, started in 1958; and satellite data tracking sea-ice retreat since around 1979. But these records are limited: they come from specific places or cover only a short stretch of time. To truly understand climate, researchers need global records that reach back hundreds of years.

Those are the kinds of records that data-rescue organizations like IEDRO are trying to recover. “There’s data tied up in paper records that goes all the way back to the late 1800s,” says Theodore Allen, a graduate student at the University of Miami and IEDRO volunteer. “So rather than working on observations from 1960 to present, we can work on things from 1880 to present.” With that kind of information, climate scientists can make their models far more reliable. The problem is that nobody wants to spend the time and money it takes to scan and transcribe 100 million pieces of old, musty, often disorganized paper. “You’ll show up to a place and you need dust masks on for days at a time,” says Allen. “You’re crouched over running through dusty, dirty weather records in a damp room. It’s not very glamorous.”


Different groups have different strategies for so-called “data rescue” projects. One group, the Atmospheric Circulation Reconstruction over the Earth (ACRE), focuses on records in existing archives, such as those of the national meteorological services across Europe. Its researchers visit these libraries to digitize the data recorded in their books. “It’s a bit of a detective effort,” says Rob Allen, the data rescue project manager at ACRE. “You have to be an archaeologist, detective, cartographer and climate scientist all in one.”

The IEDRO teams take a different tack, searching for records in the back rooms of local weather stations all over the world. Instead of having its own people do the scanning, IEDRO equips weather stations with scanners and hires local people to do the digitizing. IEDRO focuses on creating local jobs around climate digitization projects, and once a project is complete, the scanners and other equipment are donated to the weather station.

With either approach, the task entails hundreds of hours of scanning and data entry. So both ACRE and IEDRO have started toying with crowdsourcing the data-entry side of things. Once the pages are scanned, the groups upload them to sites like Old Weather, where volunteers can help the scientists transcribe the records and get "promoted" in a little game built into the site.

This kind of data digitization has implications beyond climate science. An IEDRO project isn’t truly finished until the data is used to inform something like a local weather model, flood recommendations, or city planning. The ACRE team plugs recovered climate data into current weather models to create pictures of what the global climate was like in, say, 1916.

Despite the clear value in this kind of work, keeping these projects alive has been hard. Everyone who works at IEDRO does so as a volunteer. Getting funding is difficult. For the cost of a single satellite, groups like IEDRO could digitize millions of pages. “There’s a lot more money and funding to produce modern climate products like these advanced climate models, than there is wallowing around in some third world pit of a storage shed and unearthing a bunch of paper records,” says Allen. 

Soon, Allen will launch his own little project called “Data Safari”—a chronicle of his motorcycle trip throughout southern Africa in search of climate records to digitize. He'll spend sixty days traveling 10,000 kilometers searching for scraps of paper that might improve the climate record. It's work he'll do, as usual, on a volunteer basis. "One day," he says, "I would love to have this turn into a job."

