Spy Research Agency Is Building Psychic Machines to Predict Hacks

The Jaguar supercomputer at a Department of Energy lab in Oak Ridge, Tenn. Oak Ridge National Lab/AP

Using publicly available Internet data, supercomputer-like systems will estimate when a prowler might try to breach a system.

Imagine if IBM's Watson -- the "Jeopardy!" champion supercomputer -- could not only answer trivia questions and forecast the weather, but also predict data breaches days before they occur.

That is the ambitious, long-term goal of a contest being held by the U.S. intelligence community. 

Academics and industry scientists are teaming up to build software that can analyze publicly available data and a specific organization's network activity to find patterns suggesting the likelihood of an imminent hack.

The dream of the future: A White House supercomputer spitting out forecasts on the probability that, say, China will try to intercept situation room video that day, or that Russia will eavesdrop on Secretary of State John Kerry's phone conversations with German Chancellor Angela Merkel. 

IBM has even expressed interest in the "Cyber-attack Automated Unconventional Sensor Environment," or CAUSE, project. Big Blue officials presented a basic approach at a Jan. 21 proposers' day.

Aims to Get to Root of Cyberattacks

CAUSE is the brainchild of the Office for Anticipating Surprise under the director of national intelligence. A "Broad Agency Announcement" -- competition terms and conditions -- is expected to be issued any day now, contest hopefuls say.

Current plans call for a four-year race to develop a totally new way of detecting cyber incidents -- hours to weeks earlier than intrusion-detection systems, according to the Intelligence Advanced Research Projects Activity. 

IARPA program manager Rob Rahmer points to the hacks at Sony and health insurance provider Anthem as evidence that traditional methods of identifying "indicators" of a hacker afoot have not effectively enabled defenders to get ahead of threats.

This is "an industry that has invested heavily in analyzing the effects or the symptoms of cyberattacks instead of analyzing and mitigating the -- cause -- of cyberattacks," Rahmer, who is running CAUSE, told Nextgov in an interview. "Instead of reporting relevant events that happen today or in previous days, decision makers will benefit from knowing what is likely to happen tomorrow."

The project’s cyber-psychic bots will estimate when an intruder might attempt to break into a system or install malicious code. Forecasts also will report when a hacker might flood a network with bogus traffic that freezes operations – a so-called Denial-of-Service attack.

Such computer-driven predictions have worked for anticipating the spread of Ebola, other disease outbreaks and political uprisings. But few researchers have used such technology for cyberattack forecasts.

At Least 150 People Interested -- No Word Yet on Size of the Prize Pot

About 150 would-be participants from the private sector and academia showed up for the January informational workshop. Rahmer was tight-lipped about the size of the prize pot, which will be announced later this year. Teams will have to meet various minigoals to pass on to the next round of competition, such as picking data feeds, creating probability formulas and forecasting cyberattacks across multiple organizations. 

At the end, "What you are most likely to be able to do is say to a client, 'Given the state of the world and given the asset you’re trying to protect or that you care about, here are the [events] you might want to worry about the most,'" David Burke, an aspiring participant and research lead for machine learning at computer science research firm Galois, said in an interview. "Instead of having to pay attention to every single bulletin that comes across your desk about possible zero days," or previously unknown vulnerabilities, it would be wonderful if some machine said, "These are the highest likelihood threats."

His research focus is "advanced persistent threats," or APTs, involving well-resourced, well-coordinated hackers who conduct reconnaissance on a system, find a security weakness, wriggle in and invisibly traverse the network.

"Imagine that CAUSE was all about the real-world analogy of figuring out whether some local teenagers are going to knock over a 7-Eleven. That would be really hard to predict. You probably couldn’t even tie that to any larger goal. But in the case of APTs -- absolutely" you can, Burke said. "The fact that APTs are on networks for a long period of time gives you not only the sociopolitical pieces of data or clues but you have all sorts of clues on your network that you can integrate."

It's not an exact science. There will be false alarms. And the human brain must provide some support after the machines do their thing.

"The goal is not to replace human analysts but to assist in making sense of the massive amount of information available and while it would be ideal to always find the needle in a haystack, CAUSE seeks to significantly reduce the size of the haystack for analysts," Rahmer said.

Unclassified Program Will Trawl for Clues on Social Media

Fortunately or unfortunately, depending on one's stance on surveillance, National Security Agency intercepts will not be provided to participants. 

"Currently, CAUSE is planned to be an unclassified program," Rahmer said. "We’re going to ask performers to be creative in identifying these new signals and data sources that can be used."

Participants will be judged on their speed in identifying the future victim, the method of attack, time of future incident and location of the attacker, according to IARPA. 
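The four judged elements, plus the speed criterion, can be pictured as a simple record with a lead-time calculation. This is only an illustrative sketch: the `Forecast` class, its field names and the `lead_time` helper are hypothetical and do not come from IARPA's rules, which had not been published at the time of writing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Forecast:
    """One hypothetical forecast covering the four elements IARPA says it will judge."""
    victim: str               # the future victim
    method: str               # the method of attack
    predicted_time: datetime  # when the incident is expected
    attacker_location: str    # where the attacker is believed to be
    issued_at: datetime       # when the forecast was published


def lead_time(forecast: Forecast, incident_time: datetime) -> timedelta:
    """How far ahead of the actual incident the forecast was issued."""
    return incident_time - forecast.issued_at


# Example with made-up values: a forecast issued three days before the incident.
example = Forecast(
    victim="ExampleCorp",
    method="denial-of-service",
    predicted_time=datetime(2015, 3, 10, 12, 0),
    attacker_location="unknown",
    issued_at=datetime(2015, 3, 7, 12, 0),
)
example_lead = lead_time(example, datetime(2015, 3, 10, 12, 0))
```

A longer lead time would correspond to the "hours to weeks earlier" goal described above; how IARPA will actually score speed and accuracy is not specified in the source.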

Clues might be found on Twitter, Facebook and other social media, as well as online discussions, news feeds, Web searches and many other online platforms. Unconventional sources tapped could include black market storefronts that peddle malware and hacker group-behavior models. Machines, not people, will do this sifting and will try to infer motivations and intentions. Then mathematical formulas, or algorithms, will parse these streams of data to generate likely hits.
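As a rough illustration of how a formula might turn such data streams into "likely hits," the sketch below combines a few signal scores into a probability with a logistic function and ranks organizations by the result. Everything in it is an assumption made for illustration: the signal names, the weights, the baseline and the function names are hypothetical, and the source does not say which formulas any CAUSE team will use.

```python
import math

# Hypothetical weights for illustration only; a real system would learn them from data.
WEIGHTS = {
    "social_media_mentions": 0.8,
    "black_market_listings": 1.2,
    "news_flashpoints": 0.5,
}
BIAS = -3.0  # assumed baseline: a low probability when no signals are present


def attack_probability(signals):
    """Combine signal scores (each normalized to 0..1) into a probability."""
    score = BIAS + sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))  # logistic function maps the score to 0..1


def rank_targets(signals_by_org):
    """Return (organization, probability) pairs, most likely target first."""
    scored = [(org, attack_probability(sig)) for org, sig in signals_by_org.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_targets` with a dictionary of per-organization signal scores yields a short, ordered list, which is one simple way to shrink the "haystack" an analyst has to review.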

One research thread Burke is pursuing examines the "nature of deception and counterdeception, particularly as it applies to the cyber domain," according to an abstract of his proposers' day presentation.

"Cyber adversaries rely on deceptive attack techniques, and understanding patterns of deception enables accurate predictions and proactive counterdeceptive responses," the abstract stated. 

It's anticipated that supercomputer-like systems will be needed for this kind of analysis. 

For example, "if you were able to look at every single Facebook post and you processed everything and ran it through some filter, through the conversations and the little day-to-day things people do, you could actually start to see larger patterns and you could imagine that is a ton of data," Burke said. "You would need some sort of big data technology that you’d have to bring to bear to be able to digest all that."

Still Nailing Down Specifics on Supercomputer Use

The final rules will indicate whether companies can or must use a supercomputer, and whether they can borrow federal computing assets, Rahmer said. "We definitely want innovation and creativity from the offerers," he added. 

Researchers at Battelle, a technology development organization, said they might harness fast data processing engines like Hadoop and Apache Spark. They added that the rules and their team partners will ultimately dictate the system used to amp up computing power.

"We have already recognized as both the rate of collection and the connections between data points grow we will need to move to a high-performance computing environment," Battelle’s CyberInnovations technical director Ernest Hampson said in an email. "For the CAUSE program, the data from several contractors could push us towards the need for a supercomputing infrastructure using technologies such as IBM’s Watson to support deep learning," or hardware such as a Cray Urika "could provide the power to fuel advanced analytics at-scale."

According to IBM's January briefing, the apparatus currently used to solve similar prediction problems "runs on x86 infrastructure." However, IBM's x86 server hardware business was spun off to Chinese firm Lenovo last year. It remains to be seen what machine IBM might deploy, a company spokesman said.

"In theory, the government could say they are going to own the servers," IBM spokesman Michael B. Rowinski said. "We don't know ultimately that we would participate or what we even would propose."

Recorded Future, a six-year-old CIA-backed firm, already knows how to generate hacker behavior models by assimilating public information sources, like Internet traffic, social networks and news reports. But the company's analyses do not factor in network activity inside a targeted organization, because such data typically is confidential.

"Doing this successfully is not simply the sociopolitical analysis applied to current flashpoints," Burke said. "You also have observables on a network: signs possibly of malware or penetration because many campaigns that take place go on for weeks or months. So you also have a lot of network data that you are going to end up crunching."
