How Census Is Building a Citizenship Database Covering Everyone Living in the U.S.


By March 2021, the bureau plans to release anonymized statistics while keeping the raw data on individuals' citizenship status confidential.

While the 2020 decennial count is underway, the Census Bureau is working on a separate effort to identify the percentage of the U.S. population that has legal citizenship. The result will be a Census-owned database of every person living in the U.S. with a statistical “citizenship estimate” linked to each individual.

The Trump administration initially pushed to include a citizenship question on the 2020 census. However, in June of last year, the Supreme Court ruled 5-4 to block the administration from asking the question, citing an inadequate justification for its inclusion.

The month after the ruling, President Trump signed Executive Order 13880, requiring the bureau to produce data on the citizen voting-age population, or CVAP, by the end of March 2021, and mandating that relevant agencies share databases to help Census achieve that end.

Next year, the bureau will release a publicly available statistical model of citizen and non-citizen populations throughout the country, anonymized using a cutting-edge masking system. The effort will also create a dataset with a citizenship estimate for every person in the U.S., which, by law and by practice, should never be seen outside of the Census Bureau.

In a presentation last September to the Census Scientific Advisory Committee, bureau officials noted that the Census Unedited File, or CUF, which is used to determine apportionments, including of congressional representatives, will not contain any citizenship data. Instead, the bureau will create a separate micro-data file, or MDF, with the best citizenship estimate associated with each census respondent.

That micro-data file, along with the Census Edited File—an updated version of the CUF that corrects and backfills missing information—will be put through the 2020 Disclosure Avoidance System, “which will do the final record linkage and place a confidentiality protected citizenship variable on the same MDF as will be used to produce the redistricting data,” according to the documents.

While the citizenship status of individuals will not be made public, Census will be publishing CVAP tables that break down citizenship estimates at the block level—the most granular level of census data. Those tables are scheduled for release by March 31, 2021.

However, keeping that amount of public data anonymized is no simple thing. With surprisingly few bits of correlated data, a once-anonymous person can easily be identified. This becomes much easier when coupled with information publicly available on the internet, such as social media profiles.
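To illustrate how quickly a few correlated attributes can single people out, here is a minimal Python sketch. The records, the field names (`block`, `age`, `sex`) and the helper `unique_fraction` are invented for illustration and do not come from any Census file. The function reports what share of records is the only one with its particular combination of attributes.

```python
from collections import Counter

def unique_fraction(rows, keys):
    """Share of rows whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    unique = sum(1 for row in rows if combos[tuple(row[k] for k in keys)] == 1)
    return unique / len(rows)

# Toy, made-up records: six people in two census blocks.
people = [
    {"block": "A", "age": 34, "sex": "F"},
    {"block": "A", "age": 34, "sex": "M"},
    {"block": "A", "age": 61, "sex": "F"},
    {"block": "B", "age": 34, "sex": "F"},
    {"block": "B", "age": 34, "sex": "F"},
    {"block": "B", "age": 29, "sex": "M"},
]

# Each added attribute makes more people unique in the data.
for keys in (["block"], ["block", "age"], ["block", "age", "sex"]):
    print(keys, round(unique_fraction(people, keys), 2))
```

In this toy data, nobody is unique by block alone, but most people become unique once age and sex are added. That is the pattern that makes fine-grained tables risky when paired with outside information.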

To prevent criminals and other malicious actors from reverse engineering identities, Census is employing a new disclosure avoidance system for all 2020 census data shared publicly.

“Our decision to deploy a modernized disclosure avoidance system for the 2020 census was driven by research showing that methods we used to protect the 2010 census and earlier statistics can no longer adequately defend against today’s privacy threats,” John Abowd, Census’ associate director for research and methodology and chief scientist, and Victoria Velkoff, chief of the American Community Survey Office, wrote in an October 2019 blog post explaining the new system developed by cryptographers and data scientists.

The new differential privacy system injects “noise” into the datasets by using an algorithm that makes targeted changes to the data to prevent outside actors—malicious or otherwise—from reverse engineering identities.
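The general idea of noise injection can be sketched with the textbook Laplace mechanism. This is an illustrative assumption, not the bureau's actual 2020 Disclosure Avoidance System, whose algorithm the documents cited here do not spell out. The function names, the block labels and the `epsilon` values below are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace distribution centered on zero.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_counts(true_counts, epsilon, seed=None):
    """Add Laplace noise to each count in a table of disjoint groups.

    Assumes each person appears in exactly one group, so adding or
    removing one person changes one count by at most 1 (sensitivity 1).
    A smaller epsilon means more noise and stronger privacy.
    """
    rng = random.Random(seed)
    scale = 1.0 / epsilon  # sensitivity / epsilon
    return {group: count + laplace_noise(scale, rng)
            for group, count in true_counts.items()}

# Hypothetical block-level counts, for illustration only.
true_counts = {"block_001": 12, "block_002": 3, "block_003": 47}
print(noisy_counts(true_counts, epsilon=0.5, seed=42))
print(noisy_counts(true_counts, epsilon=5.0, seed=42))
```

The single `epsilon` parameter is what makes the trade-off explicit: the expected size of the error on each count is `1 / epsilon`, so a publisher can state in advance how much uncertainty it is adding.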

Census has been using various forms of differential privacy, also known as formal privacy, since 2008, though never at the scale planned for the 2020 census data. In the past, Census added uncertainty only to select statistics with a high risk of deanonymization, to avoid adding so much noise that the statistics became unreliable.

For the coming count, uncertainty will be added to entire published datasets using state-of-the-art mathematical models.

“The new method allows us to precisely control the amount of uncertainty that we add according to privacy requirements,” Abowd and Velkoff wrote. “And, by documenting the properties of this uncertainty, we can help data users determine if published estimates are sufficiently accurate for their specific applications. In this manner, we can determine the data’s ‘fitness for use.’”

With the public datasets anonymized, it will be up to Census to protect the raw data.

While the disclosure avoidance system is designed to ensure personal data remains anonymous, Robert Groves, provost of Georgetown University, who led the Census Bureau during the 2010 decennial count, said two things will ensure the raw, nonanonymized database is never used to target individuals: law and culture.

Groves, in an interview with Nextgov after reviewing the documents, cited a legal provision known as “functional separation.”

“Once you enter a statistical agency environment, it’s a one-way street,” he explained. “As soon as that Homeland Security dataset enters behind the firewall of Census, the laws of Census apply. It’s no longer a Homeland Security dataset, in a sense. It is controlled by the Census Bureau. And, under the Title 13 law, it is absolutely crystal clear that the combined dataset never exits Census with individual person records on it. Only statistics can exit.”

That protection extends to the highest levels.

“Even if it’s requested by the president, it’s absolutely illegal,” Groves confirmed when asked. “And even if it were an executive order directing Census to do this, the statute would trump the order.”

Beyond the law, Groves said the culture of statisticians and public servants working at the Census Bureau would make it almost impossible for the data to leak out unnoticed.

“If there’s anything I believe most strongly, it’s if there’s any illegal act that is proposed or promulgated, the staff at the Census Bureau would call [reporters] within 30 seconds. They are devoted to supplying the country statistical information under the law,” he said, adding that that devotion is rooted in necessity.

“The reason those laws exist is if individual records were freely given for enforcement procedures from the decennial census, then the cooperation from the public with the census is decimated,” Groves said. “These statistical agencies work with a social confidence—a trust with the public that the laws will be followed—and the laws were established to enhance that trust.”

Estimating Citizenship

While the Census Bureau won’t be able to ask each individual in the U.S. about their citizenship status, leveraging access to data held by other agencies will enable statisticians to match census respondents with information they have shared with the government to build a “best citizenship” estimate for each individual.

The bureau has been working on the algorithm to produce that estimate since April 2018 and planned to complete the “final specifications and modeling details” before the end of March, according to an internal document.

The bureau did not respond to repeated requests for comment, for updates on the status of that work, or for a comprehensive breakdown of which federal databases are actively being shared for it. However, after Nextgov published this article on April 1, a Census spokesperson pointed Nextgov to an update on the CVAP website stating the bureau would not meet the March 31 deadline and is still working on finalizing the data sources.

“We are still receiving and analyzing data from external sources, including federal and state administrative records, and require additional time for evaluation,” the statement says. “In light of overall 2020 Census schedule adaptations due to the COVID-19 outbreak, we will provide a new anticipated release date as soon as possible.”

However, the document offers a look into the main databases being used and the additional data sources most likely to be tapped.

Bureau officials believe about 90% of the U.S. population will be covered by data from two sources: the Social Security Administration’s Numerical Identification System, or Numident, which stores Social Security numbers; and the IRS’ Individual Taxpayer Identification Numbers, or ITINs, which serve as a substitute for people without Social Security numbers. Approximately 94% of SSN records include citizenship information.

However, if officials determine these sources are not sufficient, agencies control a host of other datasets that could be added to the mix, including databases managed by the Centers for Medicare and Medicaid Services, the departments of State and Housing and Urban Development, and Homeland Security Department components like U.S. Citizenship and Immigration Services and Immigration and Customs Enforcement.

In the briefing document, Census officials said additional data from Homeland Security, State and other departments “are expected to provide the [personally identifiable information] that enables record linkage for much of the balance of the resident population.” However, that comes with a caveat: “Provided that the PII on the 2020 Census is as reliable as it was in 2010.”

DHS released a privacy impact assessment in December outlining how it would share information with Census, though bureau officials did not respond to requests for confirmation that the DHS databases have been accessed or integrated into the citizenship estimates.

Those data will be run through the finalized algorithm to produce a best estimate of citizenship for each person.

“For a single person, they’ll collect multiple data sources on citizenship. Inevitably, those sources won’t agree. Then, the question is what do you do to estimate the best response for citizenship for that particular person. They will estimate that with modeling across the various databases,” Groves said. “They’ll also use the same sort of model if, despite all their efforts, for you they can’t find a record that you’re a citizen or you’re not a citizen, they will impute your citizenship to that model.”
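What Groves describes, reconciling sources that disagree and imputing a value when no record is found, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the bureau's model, whose specifications had not been published. The source names, the reliability weights and the `fallback_rate` parameter are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical reliability weights per data source; no such figures
# appear in the documents described in this article.
SOURCE_WEIGHTS = {"source_a": 0.9, "source_b": 0.7, "source_c": 0.5}
DEFAULT_WEIGHT = 0.5  # assumed weight for any source not listed above

def citizenship_estimate(records: Dict[str, Optional[bool]],
                         fallback_rate: float) -> float:
    """Return an estimated probability that a person is a citizen.

    `records` maps a source name to True (citizen), False (non-citizen)
    or None (no record found). Sources that disagree are reconciled by
    a weighted vote. If no source has a record, the value is imputed
    from `fallback_rate`, a stand-in for a model-based prediction.
    """
    weighted_yes = 0.0
    total_weight = 0.0
    for source, is_citizen in records.items():
        if is_citizen is None:
            continue  # this source has nothing on the person
        weight = SOURCE_WEIGHTS.get(source, DEFAULT_WEIGHT)
        total_weight += weight
        if is_citizen:
            weighted_yes += weight
    if total_weight == 0.0:
        return fallback_rate  # imputation when every source is silent
    return weighted_yes / total_weight

# Two sources disagree, a third has no record.
print(citizenship_estimate(
    {"source_a": True, "source_b": False, "source_c": None},
    fallback_rate=0.8,
))
# No source has a record, so the estimate is imputed.
print(citizenship_estimate({"source_a": None}, fallback_rate=0.8))
```

A real model would likely draw on many more variables than a fixed weight per source, but the sketch shows why the output is an estimate attached to each person rather than a recorded answer.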

Groves said we won’t know how accurate those estimates are until well after the fact.

“No one’s ever done this before,” he said. “No one, at this point, I think it’s fair to say, knows what the quality of the resulting estimates will be. We just don't know that. We’ll know it after this, through evaluation studies. But this is just a good-faith statistical effort.”

“Unfortunately, we don’t have a lot of track record on this,” he added. “These datasets, to my knowledge, have never been assembled the way they’re trying to assemble them.”

Editor’s note: This article was updated April 2 to include additional information from the Census Bureau and a link to the documents.
