
Cancer Maps Show the Power and Limits of Data for Public Policy


Technological advances in the past half-decade have made it much easier for nonspecialists to use specialized maps as part of their daily work.

The Environmental Protection Agency uses custom maps to target pollution enforcement, humanitarian groups use maps to track the migration of people displaced by wars and famine, and emergency responders use maps to manage their response to hurricanes and floods.

Using maps to glean important public policy data has a long pre-Internet history, though, stretching back at least to Lewis and Clark.

Linda Pickle has spent decades using maps and other spatial analyses to gather insights from cancer data. She likely had the first copy of Geographic Information System software at the National Institutes of Health, the National Cancer Institute's (NCI's) parent agency, she told Nextgov recently, and she's watched data visualization go from "little better than crayons" to Google Maps applications that nearly anyone can use.

She helped develop national cancer mortality maps and state cancer profile maps. She’s also used those maps to investigate the connection between cancer, demographics and public policy.

Pickle, who now works as a contractor doing temporal and spatial analyses for NCI, spoke with Nextgov about the history of cancer mapping, how maps can affect public policy and how changes in data analyses will and won’t affect cancer mapping.

The excerpts below are edited for length and clarity.

What information can you glean by looking at cancer rates in a map form?

Do you know Tobler’s First Law? It basically says that things that are closer together tend to be more similar. This is true for cancer rates too. The rates tend to be more similar in a local area than, say, between Maryland and California. But, because cancer isn’t spread person to person, this spatial correlation is more of a nuisance. So we want to remove that and look to see what patterns remain after we’ve removed this tendency to be correlated. What we find once we’ve done this is that there’s a much stronger correlation by demographic variables. Marin County, Calif., which is very affluent, is similar to Montgomery County, Md., which is also pretty affluent.
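The spatial correlation Pickle describes is commonly measured with a statistic called Moran's I, which is near zero when rates are spatially random and approaches 1 when neighboring areas have similar rates. The sketch below is purely illustrative and is not the NCI's method; the county names, rates, and adjacency list are invented for the example.

```python
# Toy illustration of Tobler's First Law: Moran's I measures how strongly
# neighboring areas resemble each other. All data here is hypothetical.

def morans_i(values, neighbors):
    """Compute Moran's I for area-keyed values, given an adjacency dict
    (binary spatial weights: 1 if two areas are neighbors, else 0)."""
    n = len(values)
    mean = sum(values.values()) / n
    dev = {area: v - mean for area, v in values.items()}
    w_sum = sum(len(nbrs) for nbrs in neighbors.values())  # total weight
    num = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    den = sum(d * d for d in dev.values())
    return (n / w_sum) * (num / den)

# Hypothetical county cancer rates: neighbors were given similar values,
# so the statistic should come out strongly positive (clustered).
rates = {"A": 40.0, "B": 42.0, "C": 41.0, "D": 70.0, "E": 68.0}
adjacency = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"],
    "D": ["E"], "E": ["D"],
}
i = morans_i(rates, adjacency)
print(round(i, 3))
```

Analysts typically fit the spatial term so it can be subtracted out, which is the "removing the nuisance" step Pickle describes; what remains is then compared against demographic variables.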

Why would there be any correlation in cancer rates based just on nearness?

Some of it has to do with diagnosis and how they identify cases. They might be better at that in some places than others and that will certainly depend on the type of cancer. Also if there's some very rare form of cancer, you'll have better data on that in an area where they're better able to diagnose it. There's also an issue of death certificates [from which a lot of cancer data is gleaned] not always being accurate. We have to work with what we've got, but we keep in the back of our minds that the data we're working with may not be totally accurate.

So what demographic data do you look at?

We model men and women separately. This is all at the county level. We do race and income and, in addition to income, we put in a poverty measure because one variable generally won't capture the whole socioeconomic picture of a place. We also put in any health care availability and health care utilization information we have. More of that is becoming available than there used to be. We'll look at how many doctors there are per 1,000 people and how many oncologists. We also include the percent of people who are obese and the percent who ever smoked. We ask if they "ever smoked" because a lot of people have quit, but the lag time between the exposure to smoking and the development of or death due to lung cancer can be 20 or 30 years.
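To make the covariate idea concrete, here is a minimal sketch of regressing a county mortality rate on one of the predictors Pickle mentions, the share of adults who ever smoked. This is not the NCI's actual model (which uses many covariates and spatial terms); the counties and numbers are invented for illustration.

```python
# Minimal sketch: simple linear regression of a county lung cancer
# mortality rate on percent-ever-smoked. Toy data, one sex, one covariate.

def least_squares(xs, ys):
    """Return (slope, intercept) minimizing squared error for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical counties: percent who ever smoked vs. deaths per 100,000.
ever_smoked = [35.0, 42.0, 50.0, 58.0]
mortality = [48.0, 55.0, 66.0, 75.0]
slope, intercept = least_squares(ever_smoked, mortality)
print(f"each extra percentage point of smokers adds ~{slope:.2f} deaths per 100,000")
```

A real county-level model would add the race, income, poverty, and health care availability measures described above as additional covariates, fit separately for men and women.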

Do you look at non-demographic data such as pollution rates?

If there's a particular hypothesis that suggests environment might be important we can put that into the model. But environment is difficult because most cancers are thought to take 20 to 30 years to develop. So it's difficult to know what the environment was when a person was exposed to whatever caused that cancer, and we usually can't get data back that far. That may change as more data becomes available.

Also, most of the data we get from state registries is tied to people’s home addresses, which they aggregate and report at some higher geographic unit. But if you think about where you’re exposed to carcinogens during the day, it might be at home or it might be at work. And we don’t have that address. It might be from the fumes in your car or it could be that you’re jogging out near chemical pollutants.

Does that lag time between exposure and development of a cancer also make it difficult to use other ‘big data’ to study cancer rates, such as who’s ordering carcinogenic foods on Amazon?

Exactly. Also the length of the lag time will vary based on the person's genetics and environment and other factors, so you don't know precisely what each person's lag time is.

How is this data used to inform public policy?

State epidemiologists look at the patterns in their areas. One good example: when the second generation of the Atlas of Cancer Mortality came out in 1987, it was obvious from just looking at the maps that cervical cancer rates were going down everywhere except West Virginia and the surrounding areas, where they weren't going down very quickly at all. West Virginia was like a high-rate island on the map, all in red. So a person at the state epidemiology office said 'we have to do something about this' and they changed their Medicaid policies to cover Pap smear screenings for women who couldn't pay for their own. The next set of maps showed West Virginia rates coming down more quickly.
