Big data is not just a job, it's an adventure

Officials from DHS and DOT face different challenges with departmental data but agree that when it comes to managing big data, "you're never done."

For federal agencies, making data accessible is a bit like parenting: No matter how much you do, there's always more to be done.

During a Dec. 11 AFCEA-sponsored panel discussion in Bethesda, Md., on how to use big data effectively, data officers from several large federal agencies said the amount of data they make available for outside consumption is increasing, while the complexity and the challenges presented by the data shift constantly.

"You're never done" with the job of gathering and facilitating access to agency datasets, said Daniel Morgan, chief data officer at the Transportation Department. "There's always new data."

Federal agencies are handling the tidal wave of data differently, depending on the audience.

Donna Roy, executive director of the Department of Homeland Security's Information Sharing Environment Office, said DHS is implementing four "data lakes" that will take in and store data from DHS components regardless of format.

The approach can reduce upfront costs and make data more widely available within the organization or to outside stakeholders, depending on the data's sensitivity. Roy said it could also lighten the current heavy "janitorial" workload of cleaning up data and making it useful across the organization.

Morgan said hundreds of local and state agencies must submit data to DOT daily from myriad sources, such as crash sites and road sensors, for both public and internal use. The department created an interagency dashboard that shows how states are performing in their data submissions to the agency.

The advent of autonomous vehicles will complicate DOT's job, and deciding how to ingest and support all the data generated by those vehicles is a looming challenge for federal policy-makers.

Roy said the National Information Exchange Model; the Homeland Security Information Network; and Identity, Credential and Access Management are enabling major information integration across more than 20,000 federal, state and local law enforcement agencies. As a result, she said, DHS is somewhat ahead of the game in making large datasets work in a variety of environments.

The department's newest program, the DHS Data Framework, will also help, she said. The framework is a scalable IT program with built-in capabilities to support advanced data architecture and governance processes. It also includes privacy protections and enables a more structured use of existing homeland security-related information across the organization.