How Advanced Analytics Can Help Combat the Spread of Infectious Disease


As public health threats evolve, so must our ability to predict and effectively monitor their impacts.

Steve Bennett is principal product marketing consultant of government at SAS. He's the former director of the National Biosurveillance Integration Center within the Homeland Security Department. 

This year’s mosquito season kicked off with the news that Zika virus is a known cause of serious birth defects, including microcephaly, fueling global anxieties that this latest infectious disease outbreak could very well outpace efforts to fight it.

By mid-April, more than 350 cases had been confirmed in the domestic United States. In Brazil, the number of confirmed cases was well over 1,000, with hundreds of thousands more suspected.

New revelations about the extent of the neurological impacts of Zika infection, as well as the nature of its transmission, prompted the Centers for Disease Control and Prevention's deputy director to describe the virus as “a bit scarier than we initially thought.” As public health threats evolve, so must our ability to predict and effectively monitor their impacts.

Government agencies at the local, state and federal levels collect and store an overwhelming amount of information on everything from ground-level emergency responses and at-risk screenings to research and clinical trial data.

Exploiting this big data “treasure chest” of biosurveillance clues can greatly improve our ability to understand, predict, detect, prevent and respond to a spreading disease.

We now possess the ability to combine, connect and draw correlations between troves of previously unrelated data on a large scale. So why are traditional barriers keeping this valuable information isolated and underutilized?

The sheer size of our government, an aging communications infrastructure and ingrained behaviors can hinder interagency and even intra-agency information sharing.

But merely improving public sector data integration – creating a “big data bank” of potentially relevant information – while necessary, is only the first step. The data is there, but even when shared, it isn’t being exploited to its fullest potential.

To draw a comparison, shortly before I started my graduate work in biochemistry, many graduate research projects focused on one particular protein from one particular gene in an organism, and fully characterizing that single protein could take upward of four or five years.

A few years later, however, the elucidation of the entire human genome was largely complete, enabling computational analysis of up to 25,000 human genes and proteins simultaneously. Sitting in front of my workstation, I could evaluate structural similarities between thousands of proteins in an afternoon.

The same explosion of data that enables much more comprehensive analysis in less time exists in many other domains today, including public health. Traditional approaches to epidemiological surveillance of infectious disease were not designed to scale with the constantly growing and evolving body of information available before and after an infectious disease event – information that varies in complexity and substance, and inevitably includes a whole lot of both meaningful and meaningless data.

Advanced analytics can help create a centralized public health information infrastructure by combining and cleansing information collected from local, federal and international health agencies, as well as the academic research community, pharmaceutical vaccine makers and possibly even “nontraditional” health data sources such as social media.
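To make “combining and cleansing” concrete, here is a minimal Python sketch under stated assumptions: the feed names, field names and sample records are all hypothetical, and a real system would need far richer matching rules. It maps records from different sources onto one shared schema, skips incomplete rows and drops duplicates within each source.

```python
from datetime import date

REQUIRED = ("region", "date", "symptom")  # hypothetical shared fields


def normalize(record, source):
    """Map one raw record onto a shared schema (field names are assumptions)."""
    return {
        "source": source,
        "region": record["region"].strip().upper(),
        "date": date.fromisoformat(record["date"]),
        "symptom": record["symptom"].strip().lower(),
    }


def combine(feeds):
    """Merge feeds, skipping incomplete rows and per-source duplicates."""
    seen, merged = set(), []
    for source, records in feeds.items():
        for raw in records:
            if not all(raw.get(field) for field in REQUIRED):
                continue  # cleansing step: skip incomplete records
            row = normalize(raw, source)
            key = (row["source"], row["region"], row["date"], row["symptom"])
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


# Hypothetical sample data, for illustration only.
feeds = {
    "state_health_dept": [
        {"region": " tx-014", "date": "2016-04-11", "symptom": "Fever"},
        {"region": "TX-014", "date": "2016-04-11", "symptom": "fever "},
    ],
    "hospital_er": [
        {"region": "tx-014", "date": "2016-04-12", "symptom": "Conjunctivitis"},
        {"region": "tx-014", "date": "", "symptom": "rash"},
    ],
}
merged = combine(feeds)
```

In this sample, the second state record collapses into the first after normalization and the hospital record with no date is skipped, leaving two clean rows.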

Imagine all of this information synthesized and correlated, automatically in many cases, in a system that delivers the most important correlations and connections in real time as new information is acquired.

For example, imagine a seemingly isolated case of conjunctivitis, treated in a small, rural hospital. Around the same time and in the same vicinity, however, pre-hospital EMS and 911 data indicates unusual symptoms in the regional population compared with the background level expected for the time of year.

The data begins to show statistically anomalous increases in symptom complexes, and a list of geo-tagged Tweets emerges with complaints of fever and other symptoms. Connected with advanced analytics, these seemingly separate signals in separate data sets could be integrated into much more powerful alerting for decision-makers.

In short, we could improve the nature and speed of response by detecting, characterizing and acting on anomalies much earlier than traditional techniques permit today.
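The scenario above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any deployed system: the signal names, the counts, the z-score threshold of 2.0 and the "two sources must agree" rule are all hypothetical. Each source's current count is compared with its own historical baseline for the same time of year, and an alert is raised only when several independent sources look anomalous together.

```python
from statistics import mean, stdev


def z_score(current, baseline):
    """Standard score of the current count against historical counts."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs 2+ values
    if sigma == 0:
        return 0.0 if current == mu else float("inf")
    return (current - mu) / sigma


def combined_alert(signals, threshold=2.0, min_sources=2):
    """Alert when at least min_sources sources exceed the z-score threshold."""
    scores = {name: z_score(cur, base) for name, (cur, base) in signals.items()}
    anomalous = sorted(name for name, z in scores.items() if z >= threshold)
    return len(anomalous) >= min_sources, anomalous, scores


# Hypothetical weekly counts for one region:
# (this week, same week in each of five prior years)
signals = {
    "ems_fever_calls": (42, [18, 22, 20, 19, 21]),
    "ed_conjunctivitis_visits": (9, [7, 8, 6, 9, 8]),
    "geotagged_fever_posts": (130, [40, 55, 47, 52, 46]),
}
alert, anomalous, scores = combined_alert(signals)
```

With these made-up numbers, the conjunctivitis visits alone stay within their normal range, but the EMS calls and the geo-tagged posts are both far above baseline, so the combined rule fires. That is the point of the scenario: no single feed has to tell the whole story.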

Such insights are particularly important in parts of the world that lack the advanced public health infrastructure to track, contain and communicate about a disease. Often, these are the places where emerging infectious diseases make the leap from animal populations to humans, as was the case with SARS in 2003, with MERS-coronavirus, and with Ebola, both historically and in the 2014 outbreak.

Advanced analytics can improve human analysts’ ability to provide early warning and situational awareness by leading them down a previously unseen trail of breadcrumbs, spotlighting patterns and anomalies along the way.

These approaches are designed to handle “big data” easily, and can combine old and new, structured and unstructured, and complete and incomplete information to better characterize an infectious disease event.

Properly applied, these tools can draw previously unseen correlations and help screen out all but the most meaningful insights. They can help us get better at predicting infectious disease events, detecting their earliest warning signs and understanding which interventions might work best, enabling us to make faster, better decisions.

While our physically interconnected world may enable the spread of infectious diseases across borders and continents, our digitally connected world might very well be the modern antidote.