The Department of Homeland Security is poised to ditch all records from a controversial network monitoring system called Einstein that are at least three years old, but not for security reasons.
DHS reasons the files -- which include data about traffic to government websites, agency network intrusions and general vulnerabilities -- have no research significance.
But some security experts say that, to the contrary, DHS would be deleting a treasure chest of historical threat data. And privacy experts, who wish the metadata weren’t collected at all, say destroying it could eliminate evidence that the governmentwide surveillance system does not perform as intended.
The National Archives and Records Administration has tentatively approved the disposal plan, pending a public comment period.
According to Homeland Security’s rationale, there is "quickly diminishing value for most of the data collected pursuant to intrusion detection, prevention and analysis." A three-year retention period for reference purposes is sufficient, and "the records have no value beyond that point" but can be kept longer, if needed, appraisers said.
Incident reports, which include records on catastrophic cyber events, must be kept permanently.
Typically, the main driver of a data retention policy is the cost of storing information indefinitely.
The nonprofit SANS Internet Storm Center, which monitors malicious activity on the public Web, retains observation data for 12 years.
Older intrusion-detection records provide insight into the evolution of threats, said Johannes Ullrich, dean of research at the SANS Technology Institute. Analysts there sometimes need even older data to answer today's research questions.
"When we first started, our data was dominated by bots" -- networks of compromised computers -- "attacking common Windows services," he said. Then, a wider array of services started to come under attack, "and more recently, we do have data about the attack of devices -- Internet of Things -- as well as most recently attacks against big data systems."
Ideally, Homeland Security’s intrusion records would be made available to the public in some form, Ullrich said.
"The Einstein data would likely be a goldmine for researchers, as it documents attacks against very specific networks in a consistent way over a large extent of time," he said.
The records might show, for instance, attackers trying to guess host names, such as “admin.healthcare.gov,” that would give them total control over the Obamacare website, Ullrich said.
Storage costs in a commercial cloud likely would be reasonable, he added, ballparking the figure at $50 a month per terabyte of data.
Is This Another One of Those Coverups?
Some civil liberties advocates back Homeland Security’s move to expunge records that might contain individuals' metadata and communications as soon as possible.
"Einstein is a network monitoring system and a lot of the data likely concerns user activity," said Ginger McCall, director of the Open Government Program at the Electronic Privacy Information Center. "We would typically not want agencies to retain that data."
Yet, scrubbing the data presents an accountability challenge for department auditors.
“As a general matter, getting rid of data about people's activities is a pro-privacy, pro-security step,” said Lee Tien, senior staff attorney with the Electronic Frontier Foundation. But “if the data relates to something they're trying to hide, that's bad.”
It is possible the records could reveal the monitoring tools make mistakes when attempting to spot threats.
“Some of them are very smart and in fact, some of them try to learn and try to make guesses about things,” Tien said. By throwing out three-year-old records, “would you be getting rid of the very data that would allow [the Government Accountability Office] to say, 'Yes, it works fine,' or, 'No, it didn't work, but got better?'”
The root problem is a lack of transparency surrounding Einstein, he said, likening the situation to criticism of the National Security Agency’s secrecy around its signals intelligence sweeps.
“You're setting up this data collection system that tracks people when they are using government websites,” Tien said. “And you don't necessarily have to have that repository. When the government is capturing that information and holds it in its records, there is always a privacy issue. We want to be able to have evaluated it.”
Rep. Elijah Cummings, D-Md., the ranking Democrat on the House Oversight and Government Reform Committee, intends to review the types of records set to be discarded. It is important for DHS to keep any Einstein records related to a breach, but if the records truly hold no worth, Democratic members do not see a problem with disposing of them, a minority committee staffer said.
DHS officials on Friday declined to comment beyond what was stated in the written rationale.
The public has until Dec. 19 to request a copy of the records retention plan. Comments are due within 30 days of receipt.
Categories of Records Headed to the Trash Folder