Inside efforts to capture federal data after ‘the big takedown’

America’s Data Index aims to serve as a “weather forecast” on the state of government data.
Days after President Donald Trump took office again in January, thousands of government pages with critical data were taken down as agencies rushed to comply with executive orders targeting diversity, equity and inclusion efforts, as well as what the administration calls “gender ideology.”
That purge activated a community, said Denice Ross, who served as the government’s chief data scientist during the Biden administration. She called it “the big takedown,” and since then, she’s been trying to track changes to government data.
Ross, who’s now a senior fellow at the Federation of American Scientists, is one of the people behind America’s Data Index, an attempt to get a better view of changes across the government data ecosystem. She’s also collaborating with Chris Dick, who worked at the U.S. Census Bureau before founding a data science consulting company.
“The way that I think about the Data Index is really as a way for us … to have like the weather forecast, if you will, of the landscape of federal data,” said Dick. “That’s the aspirational goal.”
It’s a big one. Some of those purged pages have come back online or are set to, in part due to lawsuits, but it’s difficult to get a complete picture of the state of the federal government's data. And even as pages have come back online, questions have lingered about what changes, if any, were made to datasets while they were down.
Dick and Ross are using automation and, when that doesn’t work, good old-fashioned research to try to track changes across key datasets. Those changes go beyond the availability of government data to include what data the government collects and how it does so.
One of their methods is cataloguing information collection requests published in the Federal Register, generally submitted when agencies make, modify or renew data collections. For those requests that are open to comment, the duo is encouraging people to submit details on how and why the data are important.
Their site and newsletter point, for example, to the National Violent Death Reporting System at the Centers for Disease Control and Prevention. The agency is currently taking comments on a revision request for the database, which policymakers use to tailor violence prevention efforts.
The CDC is changing the gender variable to “sex,” for which it will only accept male or female responses. It’s also removing gender and gender identity from the hate crime reporting section, the Data Index said in its tally of data collections open for comment.
This dataset is one of many where the word “sex” has been swapped in for “gender,” likely to comply with an executive order on “gender ideology,” according to a recent letter published in the Lancet.
Researchers looked across modified datasets at the Departments of Health and Human Services and Veterans Affairs, as well as the CDC, finding 114 datasets with substantial alterations between late January and late March. Most had to do with switching the word “gender” to “sex.”
And most of the time, agencies didn’t log these types of changes or otherwise make clear that they happened, according to the article in the medical journal. Still, the modifications affect the accuracy of the data, since some people will respond differently to questions about gender than they will to questions about sex, the authors note.
An activated community
In the months since Trump took office for the second time, Ross and Dick have been among a community of data geeks and watchers who have sought to preserve government repositories.
“January 31st, the big takedown, shook a lot of people up,” said Ross. “People are no longer taking federal data for granted.”
As happened in the first Trump administration, various data rescue efforts have popped up to try to archive datasets for future use, due to the fear that they may be taken down.
The Data Rescue Project has saved over 1,100 public datasets from over 80 government offices as of late June, when it launched a new data portal built by volunteers at New America to consolidate datasets from various rescue efforts in a centralized hub.
On the environmental side, it is less the data that has been taken down than the tools used to make sense of it, said Jessie Mahr, the director of technology at the Environmental Policy Innovation Center. She’s been working with the Public Environmental Data Partners, a group of a dozen or so organizations that’s working to archive federal datasets and tools and provide access to them if the government takes them down.
So far, the group has archived around 150 environmental datasets ranging from low-income energy affordability data to climate risk indicators.
The first large dataset the group archived was the Environmental Protection Agency's Risk Management Plan database, which houses the required plans of facilities that have hazardous chemicals.
Those plans used to be hosted in a publicly searchable database, but it is now unavailable, leaving in-person reading rooms as the only way to access them, said Gretchen Gehrke, cofounder of the Environmental Data and Governance Initiative, which is also part of PEDP.
The website of the EPA, which is now working to rewrite these rules, also directs those interested in the reports to file Freedom of Information Act requests. The coalition has archived over 21,000 of the plans.
The group has also recreated now-unavailable tools, like the environmental justice screening and mapping tool, meant to show places that have higher environmental burdens and vulnerable populations.
Government data can seem distinct from people’s everyday lives, but it helps answer fundamental questions, said Mahr, who argues that there shouldn’t be a need for data rescue efforts in the first place.
“Is my water safe to drink? How's the quality of the air outside? Where is there unemployment?” she said. “Where is there sewage backing up into neighborhoods?”
The White House didn’t respond to a request for comment from Nextgov/FCW by publication time.
All this is happening as the teams managing government data have dwindled via Trump administration efforts to shrink the size of the government’s workforce. The downsizing of teams and changes to government contracts also have an impact on data, as it takes people to maintain and publish that information.
Nick Hart, the CEO of the Data Foundation, is working to track capacity changes and challenges across the government data ecosystem, including reduced data collection scope and access due to operational constraints.
“We’ve seen more rapid change in the last six months … than probably at any point in history,” he said. “Both rapid changes in issues related to data sharing and access, but also because of shifts in capacity.”
The Trump administration's Department of Government Efficiency has also sparked concerns about privacy and ethics as it has swept up government datasets across agencies and worked to combine them into a larger, master database.
Public trust in government has already been on the decline, and DOGE’s work has ramifications for data collections like surveys, said Ross.
Communities that feel more vulnerable may not want to hand information over to the government if they don’t believe it will be used for the purposes it was collected for, said Ross, as the administration has, for example, sought to make use of other agencies’ datasets for immigration enforcement.
Big-picture, these actions may leave people with questions about federal data quality and less trust in information they get from the government, she said.
A second project, America’s Essential Data, is what Ross called her “unfinished business” of “telling the story about how essential federal data are for American lives and livelihoods.”
Ross said she wants to focus on how data benefits people who never see it, as it runs behind the scenes of many of the things their lives depend on.
“For example, the [National Weather Service’s] heat index data that shows up on our phone — a football coach will use that to know when to move practice inside so that the football players don't die of heat stroke,” explained Ross. “That's federal data that's fueling that app.”
“Even for those who aren’t directly using federal data, they are directly benefiting from it,” she added. “If you use federal data, this is not a time to be a bystander.”




