World War II may seem like an unlikely place to look for the origins of data analytics or for insights into building predictive cyber intelligence programs, but the lessons of the past can inform even a digital future. The British code breakers who deciphered messages encrypted by German Enigma machines made breakthroughs not only in mathematics but also in understanding and predicting the behavior of German code clerks. The success of Bletchley Park’s code breakers stemmed in part from their insight into human behavior.
The British cryptanalysts had an advantage that we do not always enjoy today: they knew who their enemy was. They could analyze linguistic and cultural patterns within the encrypted messages, searching for recurring communications such as weather reports or common phrases such as “Heil Hitler.” Predicting cyberattack behavior on a global, 21st-century scale is more complex. Networks are under constant bombardment from communications that may have hopped numerous times before arriving at their destinations. Malicious actors are always innovating and morphing. Still, human ‘fingerprints’ are bound to appear within the network data to help us identify them and, we hope, to predict and safeguard against future attacks.
Imagine there is a government agency called SHIELD. Its networks are under attack, and administrators suspect that data is being stolen. Information security analysts start with the agency’s risk profile: What critical information is at risk, who might want it, and what might they do with it?
SHIELD’s critical data and national secrets may be targeted by run-of-the-mill criminal hackers, but they may also be very valuable to foreign governments, valuable enough that some actors might go to surprising lengths to obtain them. The geographies associated with unusual network activity are one piece of the puzzle, and so are tactics. Defacement and DDoS (Distributed Denial of Service) attacks occur more frequently than espionage and theft and are often driven by ideology. On the other hand, the most sensitive and highly guarded intellectual property is likely to be targeted through a combination of social engineering (phishing or insider attacks) and sophisticated malware.
Initial indicators allow SHIELD’s analysts to track and monitor cycles of attacks over specified timespans, looking for patterns. For example, let’s say that SHIELD was hit by a DDoS attack on an election day. It would be in SHIELD’s interest to follow the Internet and social media buzz leading up to future election days.
SHIELD analysts also monitor fluctuations in the amount of suspicious activity that correlate with certain times of the year or specific political events. Depending on their evolving theory about the malicious actor, analysts may start monitoring social media or news outlets for geopolitical evidence that supports it.
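As a rough illustration of the kind of monitoring described above, the following Python sketch counts suspicious events per day and compares average activity near dates of interest (an election day, for instance) with the average on all other days. It assumes the analysts already have one calendar date per suspicious event; the function names, the one-day window and the sample data are hypothetical and chosen only for illustration. A high ratio is a prompt for further investigation, not evidence of who is behind the activity.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean


def daily_counts(event_dates):
    """Count suspicious events per calendar day."""
    return Counter(event_dates)


def event_window_ratio(counts, key_dates, start, end, window_days=1):
    """Ratio of mean daily activity near key dates to mean activity elsewhere.

    Returns None when there is no usable baseline to compare against.
    """
    # Every day within `window_days` of a key date counts as "near".
    window = {
        d + timedelta(days=offset)
        for d in key_dates
        for offset in range(-window_days, window_days + 1)
    }
    near, baseline = [], []
    day = start
    while day <= end:
        (near if day in window else baseline).append(counts.get(day, 0))
        day += timedelta(days=1)
    if not near or not baseline or mean(baseline) == 0:
        return None
    return mean(near) / mean(baseline)


if __name__ == "__main__":
    # Hypothetical data: a quiet week with a burst around one key date.
    events = [date(2024, 1, 2)] * 3 + [date(2024, 1, 5)] * 12 + [date(2024, 1, 8)] * 2
    ratio = event_window_ratio(
        daily_counts(events),
        key_dates=[date(2024, 1, 5)],
        start=date(2024, 1, 1),
        end=date(2024, 1, 10),
    )
    print(f"activity near key dates vs. baseline: {ratio:.1f}x")
```

A simple ratio like this only flags periods worth a closer look; in practice the baseline would need to account for normal weekly and seasonal variation before any spike is treated as meaningful.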
This sort of approach gives clues to politically or ideologically motivated attacks, but it does not address financially motivated criminal activity or espionage. Moreover, generalizing from political, geographic or cultural factors is often misleading and could lead to reputational damage. Attackers may plant cultural references in malware code as decoys to cover their tracks. An organized crime group in one country may be acting on someone else’s behalf. Code written by a state-backed hacker may be copied and repurposed by a novice activist motivated by ideology on the other side of the world. On top of that, our understanding of others’ worldviews or motives is often distorted.
Ultimately, human context is only one piece of the puzzle. The fingerprints on data may cast some light onto the path in front of us, as they did for the code breakers at Bletchley Park. Given the attackers, targets and threat vectors we face today, our conclusions and actions must begin and end with the data itself. Developing a resilient cyber threat intelligence program calls for proactive analysis of human behavior and network traffic and, ultimately, for letting the truth in the data take us where it will.