How OPM Can Find Its Missing Data on the Dark Web


The best way to recover from breaches is to assume that they’re inevitable — and start looking for your data before you know it’s gone.

The data breach announced last week (more than 4 million federal employee health records stolen from the Office of Personnel Management, or OPM, allegedly by hackers linked to the Chinese government) was “the most significant” theft of government data ever, according to the chairman of the House Homeland Security Committee. But experts say it’s not too late to limit the damage by tracking the stolen information as it moves around the Internet.

Stealing data is different from stealing a Ming vase, in that the original remains behind. If cyber detectives can find a copy of this data “in the wild,” they can limit its value as a tool for fraud, help build a case attributing the hack to the Chinese government, and develop insight into how the data will be used.

How would you do that? First, make sure you can recognize the data when you come across it, using a technique called cryptographic hashing. “It’s not code that’s embedded in the data so much as a computation done on the data itself,” said Danny Rogers, one of the co-founders of Terbium Labs, a data intelligence company that tracks stolen data. By running chunks of the data through a mathematical function, you generate a hash: a number that is, for all practical purposes, unique to each specific chunk. You can then crawl the web in search of data whose hash values match those of your original.
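As a rough illustration of the idea, here is a minimal Python sketch. The five-word window, the lowercase normalization and the choice of SHA-256 are assumptions made for this example, not details of any vendor's system.

```python
import hashlib


def fingerprints(text, window=5):
    """Hash every run of `window` consecutive words in `text`.

    Window size, lowercasing and SHA-256 are illustrative assumptions.
    """
    words = text.lower().split()
    if len(words) < window:
        return set()
    # Overlapping chunks: words 0-4, words 1-5, words 2-6, and so on.
    chunks = (
        " ".join(words[i:i + window])
        for i in range(len(words) - window + 1)
    )
    return {hashlib.sha256(c.encode("utf-8")).hexdigest() for c in chunks}
```

Because the chunks overlap, a fragment of a record copied into the middle of some other page would still produce at least some of the same hashes as the original.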

Last week, Terbium released a product called MatchLight that hunts for stolen data. “We compute a whole bunch of these hash values on little pieces of data, both on behalf of our clients and as we crawl. We simply compare the results of those hash functions to each other to tell which data had a similar input,” he said.
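The comparison Rogers describes can be sketched in a few lines of Python. The function names, the four-word chunk size and the sample record below are all invented for illustration; this is not MatchLight's implementation. In this sketch, the side doing the crawling works only with hashes and never sees the original record.

```python
import hashlib


def chunk_hashes(text, size=4):
    """Return SHA-256 hashes of every `size`-word run in `text`."""
    words = text.lower().split()
    return {
        hashlib.sha256(" ".join(words[i:i + size]).encode("utf-8")).hexdigest()
        for i in range(len(words) - size + 1)
    }


def match_fraction(protected, crawled):
    """Share of protected hashes that also appear in the crawled set."""
    if not protected:
        return 0.0
    return len(protected & crawled) / len(protected)


# Made-up record; only its hashes are handed to the comparison step.
protected = chunk_hashes("jane q public ssn 000-00-0000 clearance secret")

leaked_page = chunk_hashes(
    "for sale: jane q public ssn 000-00-0000 clearance secret and more"
)
unrelated_page = chunk_hashes("weather forecast for the coming week looks mild")

print(match_fraction(protected, leaked_page))     # 1.0
print(match_fraction(protected, unrelated_page))  # 0.0
```

A real system would presumably alert on any score above some threshold rather than waiting for a perfect match, since stolen records are often trimmed or reformatted before they are posted.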

Hashing won’t prevent a breach from happening. The point is to drastically cut down on the amount of time it takes to discover that data has been stolen, by constantly crawling the web in search of matching hashes, even before you know the data is gone. Early discovery can make the stolen data worth less to the people who stole it.

“You really can’t prevent every breach. With advanced enough actors, you have to assume that your organization is going to be breached at some point,” Rogers said. “The most important element of your security posture is how quickly you can detect where the data is and respond, initiating whatever remediation plan you have in place.”

The OPM breach was detected in April by Department of Homeland Security, or DHS, experts using a system called Einstein 3, which looks for malware on federal computer networks. The system, designed to predict and prevent major cyber breaches, did neither of those things here. But it was at least partially useful in detecting the breach after the fact.

How long after the fact? DHS hasn’t said. But the average time between a breach and its discovery by the plundered organization is 200 to 230 days, according to Rogers. Moreover, it’s often third-party security firms like Kaspersky Lab or IOActive that make the discovery.

Of course, there’s more than one way to detect major intrusions after they occur. That’s what MatchLight is all about. “We can bring that down to 30 seconds-to-15 minutes,” said Rogers. He says that MatchLight, though hardly the only product that can do cryptographic hashing, is the only one that can do it on the scale relevant to an organization like OPM. “We focus on the large-scale automation of that process,” Rogers said.

Data fingerprinting is only useful, though, if the stolen data hits the Dark Web, a portion of the web unreachable through “normal” search engines like Google. Typically accessed anonymously through onion routing services like Tor, the Dark Web is often associated with illegal exchanges but is also used by activists and journalists looking to exchange information beyond the gaze of authoritarian regimes.

Is the arrival of the stolen records on the Dark Web a sure bet? The chief value of much of the stolen OPM data could be the narrow targeting of very particular military or national security workers, possibly via blackmail or elaborate phishing scams.

“The background investigation data stolen from OPM is everything anyone would ever need for blackmail,” ACLU chief technologist Chris Soghoian noted on Twitter.

But the idea of narrowly targeting four million people is absurd. There’s a good chance that at least some of the stolen data could wind up for sale in dark corners of the Internet or as part of fraud schemes that have nothing to do with the military rivalry between the United States and China.

“This seems like a breach that was motivated primarily by espionage motivations. But, at the same time, the line between espionage motivations and economic motivations is quite blurry,” said Rogers. “I strongly expect to see elements of this data appearing out in the dark web for fraud activities.”

“There is a big possibility the information will firstly be sold as huge ‘data chunks’ and later will go for cheap to some of the re-sellers and will be sold individually for each record,” Ido Wulkan, the senior analyst at S2T, which develops Dark Web harvesting technologies, told Defense One. “I would focus my search on Mandarin-speaking Dark-Web forums and card markets, reviewing information from the past few months. The fact that a [Chinese] government-backed group might be behind the breach may indicate that the information was not stolen for financial purposes, and if that is the case, it most probably will not find its way online.”

Major data thefts like the OPM hack will become more frequent as the amount of personal data, and personal health data in particular, explodes in the years ahead. In order for data to have value, it must be used. When data is used, it becomes visible, as does the person who used it. Perhaps the best way to drive down losses from data theft is to spend less on fancy prevention systems, since preventing every breach is impossible, and more on systems to track and reveal data once it goes missing.
