
Wikipedia Is Better Than Google at Tracking Flu Trends


Wikipedia traffic could be used to provide real-time tracking of flu cases, according to a study published today. John Brownstein, a professor of pediatrics at Harvard Medical School and director of Boston Children’s Hospital’s computational epidemiology group, along with fellow researcher David McIver, has developed an algorithm for pulling daily flu metrics from data on which flu-related terms are viewed in the online open-source encyclopedia.

Brownstein previously developed Flu Near You, which relies on users to self-report flu-like symptoms in themselves, family, and friends. But by analyzing page views for terms such as “fever,” “influenza,” and “Tamiflu,” Brownstein and McIver created a more reliable method of estimating flu spikes.
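To make the idea concrete, here is a minimal sketch in Python of how daily page views for flu-related articles might be turned into a single signal. It is not the researchers’ model, which the article does not describe. The function name `daily_flu_signal`, the article list, and the input layout are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical article list, based only on the terms named in the text.
FLU_ARTICLES = ["Fever", "Influenza", "Tamiflu"]


def daily_flu_signal(views: Dict[str, List[int]], total_views: List[int]) -> List[float]:
    """Return, per day, the share of all page views that went to flu-related articles.

    `views` maps an article title to its daily view counts; `total_views`
    holds the site-wide daily totals for the same days (assumed inputs).
    """
    signal = []
    for day, total in enumerate(total_views):
        # Sum that day's views over the flu-related articles we have data for.
        flu_views = sum(views[title][day] for title in FLU_ARTICLES if title in views)
        # Guard against days with no recorded traffic.
        signal.append(flu_views / total if total else 0.0)
    return signal
```

Dividing by total traffic is one way to keep overall site growth from looking like a flu spike. A real estimate would still need to be fitted against reported case data such as the CDC’s.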

Using online activity to monitor flu trends isn’t a new idea. Google Flu Trends has used flu-related search engine queries to estimate the number of daily cases since 2008. But the algorithm failed in 2009, overestimating the peak number of cases during the H1N1 swine flu pandemic. The 2012-2013 flu season saw a similar miscalculation.

Compared with data from the Centers for Disease Control and Prevention on the prevalence of flu-like illnesses in the US (which is released to the public with a two-week lag), the Wikipedia model was found to be more accurate than Google’s. As the charts below show, that’s because of its ability to stay on track even during sudden spikes in infection (and the accompanying panic):

[Two charts, not reproduced here]

Perhaps, the authors suggest, hyped pandemics and particularly unpleasant flu strains cause increased Googling, including by those who are not ill but are looking for news stories. The researchers didn’t investigate exactly why those who click through to Wikipedia are more likely to be suffering from the flu, or to be near someone who is. But it stands to reason that the site can give researchers a nuanced read on how we’re feeling: Wikipedia is likely to be among the top results in web searches, and because it is the No. 1 source of health information on the internet, those who click through to the site may be more likely to be seeking information about symptoms or medications.

In the paper, Brownstein and McIver point out that the CDC’s data isn’t perfect, either: It’s reported by physicians, who may be more likely to log flu-like symptoms when they have heard media buzz about a possible pandemic. Indeed, web-driven metrics could one day overtake the official data in both speed and accuracy.

(Image via Flickr user peterthoeny)
