Cancer Maps Show the Power and Limits of Data for Public Policy


You can learn a lot about people’s vulnerability from where they live, but other factors are tougher to track.

Technological advances in the past half-decade have made it much easier for nonspecialists to use specialized maps as part of their daily work.

The Environmental Protection Agency uses custom maps to target pollution enforcement, humanitarian groups use maps to track the migration of people displaced by wars and famine, and emergency responders use maps to manage their response to hurricanes and floods.

Using maps to glean important public policy data has a long pre-Internet history, though, stretching back at least to Lewis and Clark.

Linda Pickle has spent decades using maps and other spatial analyses to gather insights from cancer data. She likely had the first copy of Geographic Information System software at the National Institutes of Health, the parent agency of the National Cancer Institute (NCI), she told Nextgov recently, and she’s watched as data visualization went from “little better than crayons” to Google Maps applications that nearly anyone can use.

She helped develop national cancer mortality maps and state cancer profile maps. She’s also used those maps to investigate the connection between cancer, demographics and public policy.

Pickle, who now works as a contractor doing temporal and spatial analyses for NCI, spoke with Nextgov about the history of cancer mapping, how maps can affect public policy and how changes in data analyses will and won’t affect cancer mapping.

The excerpts below are edited for length and clarity.

What information can you glean by looking at cancer rates in a map form?

Do you know Tobler’s First Law? It basically says that things that are closer together tend to be more similar. This is true for cancer rates too. The rates tend to be more similar in a local area than, say, between Maryland and California. But, because cancer isn’t spread person to person, this spatial correlation is more of a nuisance. So we want to remove that and look to see what patterns remain after we’ve removed this tendency to be correlated. What we find once we’ve done this is that there’s a much stronger correlation by demographic variables. Marin County, Calif., which is very affluent, is similar to Montgomery County, Md., which is also pretty affluent.
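One common way to put a number on the spatial correlation Pickle describes is Moran's I, a statistic that is positive when neighboring areas have similar values and negative when they differ. The sketch below is only an illustration and is not drawn from Pickle's own methods: the four "counties," their rates and the neighbor matrix are all made up, and the function name `morans_i` is an assumption.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weight matrix.

    weights[i][j] is 1 when areas i and j are neighbors, else 0.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Sum of deviation products over every pair of neighboring areas.
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Hypothetical example: four counties in a row, each bordering the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [10.0, 10.0, 20.0, 20.0]    # similar rates sit side by side
alternating = [10.0, 20.0, 10.0, 20.0]  # every neighbor differs

print(morans_i(clustered, chain))
print(morans_i(alternating, chain))
```

With these invented numbers the clustered layout gives a positive value and the alternating layout a negative one, which is the "near things are more similar" tendency that analysts would try to account for before looking at demographic patterns.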

Why would there be any correlation in cancer rates based just on nearness?

Some of it has to do with diagnosis and how they identify cases. They might be better at that in some places than others and that will certainly depend on the type of cancer. Also if there’s some very rare form of cancer, you’ll have better data on that in an area where they’re better able to diagnose it. There’s also an issue of death certificates [where a lot of cancer data is gleaned from] not always being accurate. We have to work with what we’ve got, but we keep in the back of our minds that the data we’re working with may not be totally accurate.

So what demographic data do you look at?

We model men and women separately. This is all at the county level. We do race and income and, in addition to income, we put in a poverty measure because one variable generally won’t capture the whole socioeconomic picture of a place. We also put in any health care availability and health care utilization information we have. More of that is becoming available than there used to be. We’ll look at how many doctors there are per 1,000 people and how many oncologists. We also include the percent of people who are obese and the percent who ever smoked. We ask if they “ever smoked” because a lot of people have quit but the lag time between the exposure to smoking and the development of or death due to lung cancer can be 20 or 30 years.
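The "doctors per 1,000 people" measure mentioned above is a simple rate. As a minimal sketch, assuming a hypothetical county with invented counts (the function name `per_thousand` is also an assumption, not part of any NCI tool):

```python
def per_thousand(count, population):
    """Return how many of `count` there are per 1,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000.0 * count / population


# Hypothetical county: 250 doctors serving 100,000 residents.
doctors_per_1000 = per_thousand(250, 100_000)
print(doctors_per_1000)
```

Expressing counts as rates like this lets a small rural county and a large urban one be compared on the same scale.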

Do you look at non-demographic data such as pollution rates?

If there’s a particular hypothesis that suggests environment might be important we can put that into the model. But environment is difficult because most cancers are thought to take 20 to 30 years to develop. So it’s difficult to know what the environment was when a person was exposed to that cancer and we usually can’t get data back that far. That may change as more data becomes available.

Also, most of the data we get from state registries is tied to people’s home addresses, which they aggregate and report at some higher geographic unit. But if you think about where you’re exposed to carcinogens during the day, it might be at home or it might be at work. And we don’t have that address. It might be from the fumes in your car or it could be that you’re jogging out near chemical pollutants.

Does that lag time between exposure and development of a cancer also make it difficult to use other ‘big data’ to study cancer rates, such as who’s ordering carcinogenic foods on Amazon?

Exactly. Also, the length of the lag time will vary based on the person’s genetics and environment and other factors, so you don’t know precisely what each person’s lag time is.

How is this data used to inform public policy?

State epidemiologists look at the patterns in their areas. One good example: When the second generation of the Atlas of Cancer Mortality came out in 1987, it was obvious from just looking at the maps that cervical cancer rates were going down everywhere but in West Virginia and the surrounding areas, where they weren’t going down very quickly at all. West Virginia was like a high-rate island on the map, all in red. So a person at the state epidemiology office said, ‘We have to do something about this,’ and they changed their Medicaid policies to cover Pap smear screenings for women who couldn’t pay for their own. The next set of maps showed West Virginia rates coming down more quickly.
