How Racial Data Gets 'Cleaned' in the U.S. Census

The national survey offers more identity choices than ever—until those choices get scrubbed away.

At a doctor’s visit, on a college-admissions application, or even in a consumer-marketing survey, Americans are regularly asked to classify themselves by race. Some protest this request by “declining to answer,” as forms often allow. After all, racial categories are social constructs. They don’t connote biological or genetic difference.

As an African American, I have never had difficulty knowing which box I am meant to check. Whether I do so depends on my understanding of why the information is being collected. Similar questionnaires in the late 19th and early 20th centuries didn’t afford such choice. At that time, before the current practice of self-identification, an enumerator or census taker would have visited my home and classified me as free or enslaved, and then determined whether I might be colored, mulatto, quadroon (one-quarter black), or octoroon (one-eighth).

While early racial data were gathered to feed an obsession with racial purity, and were even used to locate Japanese Americans for internment during World War II, over time the Census Bureau settled on bureaucracy to explain its work. And yet, a simple count of the population remains ideologically loaded. These data are not neutral or objective information about the population. Instead they reflect changing political priorities and techniques to grasp how the country’s population is seen—and how resources are made available to them.

* * *

Shortly after the country’s founding, the U.S. government began collecting data on the racial and ethnic makeup of every person in each household. Every decennial census ushers in some new language meant to enhance the accuracy and reliability of the census as a measurement of the entire national population. There’s symbolic power in being represented on the census, in being counted. But as the political scientist Melissa Nobles shows in her book Shades of Citizenship, these data also track compliance with civil-rights legislation, particularly in the drawing of voting districts. They are linked to federal resources, intensifying public agitation around the categories.

During the years between each census, researchers, activists, politicians, and interest groups lobby for the rewording of a label, the addition (or elimination) of a category, or the disaggregation of another, such as Asian or American Indian or Alaska Native. In 2000, for example, “Hispanic or Latino, or Spanish origins” was reclassified from racial to ethnic data. Respondents were also allowed to select multiple boxes to reflect multiracial heritage for the first time. Additional changes that affect how the racial makeup of the country is represented are underway, including the creation of a separate category for people of Middle Eastern and North African descent (referred to as MENA).

Shifts in racial classifications raise questions about what exactly is being counted, how people interpret the same questions differently, and what to do about people’s changing perceptions of their racial background. In 2015, the Pew Research Center reported that at least 9.8 million people reported a different racial or ethnic background than they did in 2000. When someone appears to “change” races, the resulting data is sometimes construed as erroneous.

The statistical accounting used to correct such errors is commonly referred to as “data cleaning” or data cleansing. This process involves identifying and then editing data already collected—through modification, enhancement, or deletion of responses—when it does not conform to some predetermined rules that standardize the data set. Ostensibly, the goal is to improve data quality by correcting measurement errors generated by people who complete the questionnaires or enter responses into the database. Data cleaning hopes to make a final data set similar to other, related ones, such as the other national censuses and the American Community Survey.

Errors in reporting and recording certainly do happen. But if racial data must be cleaned, then some data is dirty. And that dirtiness is undeniably political. Some responses are more likely to be diagnosed as dirty. Given the goal of creating information that is comparable from one national census to the next, the data most suspect are those that correspond to the categories most in flux: people who checked more than one box, for example, or those who saw themselves as members of different racial or ethnic groups at different times.

While data cleansing can raise ethical questions about altering people’s responses, it offers a bureaucratic solution to a difficult position for the Census Bureau. The bureau is under public pressure to modify its data-collection methods, on the one hand. But, on the other, it is also expected to provide reliable data that is comparable over time and across other government agencies at the local, state, and national levels. The desire for comparability prompts some of the most intensive or imaginative cleaning.

By 2010, the two major changes from the previous censuses—the treatment of Hispanic, Latino, and Spanish ancestry as an ethnicity and the ability to check multiple racial categories—had yielded 63 possible responses for race: the original six categories (white; black or African American; American Indian or Alaska Native; Asian; Native Hawaiian or other Pacific Islander; some other race), plus an additional 57 possible combinations of these responses. Given the new information, identifying one group and distinguishing it from another became difficult. This led to the creation of new categories, established after data collection, such as “black, not Hispanic,” or “white, Hispanic.” For the most part, people who selected more than one race were recoded as “two or more races,” regardless of the combination. However, because no actual multiracial category is offered, the official racial categories are still preserved in the record. That makes them traceable later, by cleaning individuals’ responses retroactively.
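The arithmetic behind those 63 responses can be checked directly. The sketch below is illustrative only, not the bureau's procedure: it enumerates every non-empty selection of the six check boxes named above, and the `tabulate` function is a hypothetical stand-in for the collapsing rule described in this paragraph.

```python
from itertools import combinations

# The six race check boxes named in the article.
CATEGORIES = [
    "white",
    "black or African American",
    "American Indian or Alaska Native",
    "Asian",
    "Native Hawaiian or other Pacific Islander",
    "some other race",
]


def possible_responses(categories):
    """Return every non-empty selection of check boxes as a tuple."""
    return [
        combo
        for size in range(1, len(categories) + 1)
        for combo in combinations(categories, size)
    ]


def tabulate(selection):
    """Hypothetical rule: collapse any multi-box selection into one label."""
    if len(selection) == 1:
        return selection[0]
    return "two or more races"


responses = possible_responses(CATEGORIES)
single = [r for r in responses if len(r) == 1]
multiple = [r for r in responses if len(r) > 1]

print(len(responses), len(single), len(multiple))  # 63 6 57
print(tabulate(("white", "Asian")))                # two or more races
```

Six single-box answers plus 57 multi-box combinations give the 63 total. Note that `tabulate` discards which boxes were checked, whereas the record described above keeps them, which is what makes later recoding possible.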

In 2010, the “some other race” category proved the dirtiest. This selection included a write-in box where respondents were expected to provide the name of the race to which they felt they belonged. The vast majority of the more than 19 million people (6.2 percent of respondents) who made this selection also identified themselves as having “Hispanic, Latino, or Spanish” origins for the ethnicity question asked prior to their race. In its document 2010 Census Redistricting Data, the bureau states that it used “automated” and “expert” coding to recode write-in responses for compliance with the master files (or predetermined rules) of the database or system. For example, the document states that someone describing themselves as “Haitian” and “Moroccan” was recoded to “black” and “white.” This “some other race” category also includes people who preferred to write in responses like “multiracial” in lieu of ticking multiple boxes.
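To make the idea of “automated” recoding concrete, here is a minimal sketch of a rule-based lookup. It is an assumption about how such a rule could work, not the bureau's actual system: the table `WRITE_IN_RECODES` and the function `recode_write_in` are hypothetical, and the only two mappings included are the ones cited in the example above.

```python
# Hypothetical lookup table. Only these two entries come from the
# example cited in the article; a real master file would be far larger.
WRITE_IN_RECODES = {
    "haitian": "black",
    "moroccan": "white",
}


def recode_write_in(write_in):
    """Map a write-in answer to an official category when a rule exists.

    Returns (category, was_recoded). Answers with no matching rule
    stay in "some other race".
    """
    key = write_in.strip().lower()
    if key in WRITE_IN_RECODES:
        return WRITE_IN_RECODES[key], True
    return "some other race", False


def recode_respondent(write_ins):
    """Recode each write-in, keeping the original answer beside the result."""
    return [(answer, *recode_write_in(answer)) for answer in write_ins]


for original, category, changed in recode_respondent(["Haitian", "Moroccan"]):
    print(original, "->", category, "(recoded)" if changed else "(unchanged)")
```

Keeping the original answer next to the recoded one is a deliberate choice in this sketch: it shows exactly where the respondent's own words and the official category diverge.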

Even with a shrinking budget and new leadership, the bureau’s search for tidier data continues. When interviewed shortly after her retirement in January, the former U.S. chief statistician Katherine Wallman acknowledged that politics were most likely behind recent budget cuts. Irrespective of the latest political jockeying, the bureau has been discussing ways to cut costs without compromising data quality for years. As a result, the 2020 census will test an online response option, and use administrative records such as federal tax returns and postal-service files to estimate individual characteristics like sex and race when information is not self-reported.

While these new measures might reduce costs, civil-rights groups like the Leadership Conference on Civil and Human Rights are concerned that they will continue to undercount or otherwise misrepresent vulnerable populations and communities of color whose members are less likely to have reliable internet access. That might make them vulnerable to inaccurate identification in administrative records.

* * *

The Census Bureau didn’t respond to a request for comment or clarification about its perception of dirty data. Nevertheless, the bureau likely finds itself in a cultural minefield, as it becomes a site where debates unfold about which individuals and groups are rendered invisible, as well as how finite public resources get allocated. The ongoing dispute over whether future censuses should or will include a question about sexual orientation or gender identity belies the simplicity of the current sex question, which only asks respondents if they are male or female. With more public pressure and social change, that data might also become disaggregated one day, and then recoded into categories like “cisgender male” or “female, not transgender.”

Some people bristle at being asked to reduce the complexity of their self-perceptions into a singular choice. The “check-this-box” mentality of the census is at odds with the more fluid and ambiguous self-perceptions of the population: people originating from outside the country, for example, or those habituated to customizable digital profiles, like those on Facebook, which appear to revel in the uncertainty of multitudinous identity. If anything, these digital tools have helped accelerate citizens’ willingness to self-identify in categories broader than those provided by the government—and even to demand to be able to do so.

Even so, some of the choices haven’t changed. Since the first census in 1790, one category has remained stable, or at least been modified the least on the national census and other official government forms: “white.”
