More data, less clarity: Why federal research oversight needs to change

COMMENTARY | Most federal research oversight still follows a familiar model.
Federal agencies are being asked tougher questions about research than ever before, not only what they are funding, but also whether the work is reliable and actually delivering results.
Funding organizations are also under pressure to maintain confidence in those outcomes at a time when trust in research is under strain. Verifying findings is often difficult, costly and time-consuming because paper mills and low-quality publications have become more common.
Self-citation can make weak work appear stronger than it is. Global collaboration adds another layer of complexity, making it harder to see who is involved and how work is connected.
It is no longer realistic to check everything manually. Decisionmakers have access to data, but not a clear view of what is happening across their portfolios.
Artificial intelligence is beginning to change what is possible. Agencies now have tools that can connect large volumes of research data, uncover hidden patterns and support decisionmaking at scale. But tools alone cannot move oversight beyond a check-the-box exercise. Agencies also need a clear way to decide what happens next when data is incomplete, inconsistent or unverifiable.
This is not a failure of effort. It is a gap between what agencies are expected to know and demonstrate and what current systems and accessible data allow them to see. Bridging that gap is essential to supporting decisions and sustaining trust, and it is achievable with the right approach to data and verification.
You cannot govern what you cannot see
Most federal research oversight still follows a familiar model. Agencies rely heavily on self-disclosure, with researchers reporting affiliations and conflicts of interest while staff review that information using internal systems and external searches.
This model assumes information is complete and can be validated through targeted checks. In practice, information is fragmented across systems and structured differently depending on the source. Key signals, such as publications, patents and collaborations, change constantly.
As a result, teams spend the bulk of their time making data usable rather than interpreting what it means and determining what to do next. But oversight depends on that progression: establishing baselines, spotting unusual patterns and deciding when a concern should prompt closer review or a change in course.
Oversight is outpaced by the systems it is meant to monitor
It is well known that the research landscape is becoming harder to assess. Low-integrity publications and citation practices can make it difficult to tell what is credible. The growth of global collaboration adds more complexity to relationships that are already hard to track.
Meanwhile, manual oversight processes simply do not scale. They are time-intensive, inconsistent and dependent on individual effort. This creates uncertainty, as the gap between what agencies need to know and what they can realistically verify continues to widen.
From trust alone to trust with verification
Trust remains essential in research. But since affiliations, outputs and relationships change over time, trust alone is not enough, and neither are static reviews. What’s needed is continuous awareness of research activity.
In practical terms, this means tracking affiliations, funding and outputs over time, checking that information against independent data sources and identifying patterns that might not be obvious when reviewing information one piece at a time.
If an agency discovers a prior affiliation with an entity of concern, timing matters. A relationship from one year ago may not carry the same meaning as one from five years ago. Better data helps agencies make those judgments more consistently.
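As a rough illustration of what checking disclosures against an independent source might look like, here is a minimal Python sketch. Everything in it is hypothetical: the Affiliation record, the undisclosed_affiliations function and the sample data are assumptions made for illustration, not any agency's or vendor's actual system. It matches on exact names, which real data rarely allows, and it reports only how long ago each undisclosed relationship ended, leaving the judgment about what that means to a human reviewer.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Affiliation:
    """Hypothetical record of a researcher's tie to an organization."""
    researcher: str
    organization: str
    start: date
    end: Optional[date] = None  # None means the affiliation is still active


def undisclosed_affiliations(disclosed, independent, as_of):
    """Return (affiliation, years_since_end) pairs for affiliations that
    appear in the independent source but not in the self-disclosure.

    Assumption: records match on exact (researcher, organization) names.
    Real data would need entity resolution before this comparison.
    """
    disclosed_keys = {(a.researcher, a.organization) for a in disclosed}
    findings = []
    for a in independent:
        if (a.researcher, a.organization) in disclosed_keys:
            continue
        last_active = a.end or as_of
        years_since = round((as_of - last_active).days / 365.25, 1)
        findings.append((a, years_since))
    # Most recent relationships first, since timing affects how they are read.
    findings.sort(key=lambda item: item[1])
    return findings


# Hypothetical sample data.
disclosed = [Affiliation("R. Example", "University A", date(2015, 9, 1))]
independent = [
    Affiliation("R. Example", "University A", date(2015, 9, 1)),
    Affiliation("R. Example", "Institute B", date(2017, 1, 1), date(2020, 1, 1)),
    Affiliation("R. Example", "Lab C", date(2022, 6, 1), date(2024, 1, 1)),
]

findings = undisclosed_affiliations(disclosed, independent, as_of=date(2025, 1, 1))
for affiliation, years in findings:
    print(f"{affiliation.organization}: ended about {years} years ago")
```

The sketch deliberately stops at surfacing and ordering the discrepancies. Whether a relationship that ended one year ago warrants closer review than one that ended five years ago remains a policy decision, not something the code decides.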
AI is central to achieving this. By aggregating and connecting large volumes of research data, AI tools allow agencies to continuously validate information, identify patterns and maintain a more complete and current view of their portfolios.
This does not replace human judgment. It strengthens it by providing better evidence and a more consistent foundation for decisionmaking.
Designing for clarity and accountability from the start
Better oversight cannot be added at the end of a process, because essential data may never be captured in the first place. Too often, programs run for years before proof of impact is required, by which point the data needed to demonstrate outcomes is incomplete or inconsistent.
A stronger approach is to define clear goals from the outset and ensure the right data is captured along the way. That includes identifying what success looks like, how it will be measured and how internal program data will connect to external research signals.
With this foundation in place, agencies can move from retrospective reporting to ongoing evaluation. Data-driven verification also enables more timely, evidence-based adjustments. Instead of relying on assumptions or instinct, agencies can adapt their actions as new evidence becomes available.
Oversight at the speed of modern research
Federal agencies are being asked to manage more complexity, demonstrate impact and make decisions faster. They cannot meet those expectations with manual processes and fragmented data.
When agencies can see how research is funded, produced and connected, they can respond faster and explain decisions with greater confidence. AI-based technologies will help by processing complex data, surfacing patterns and supporting verification at a scale manual work cannot match.
That is how agencies move from trust alone to trust backed by evidence. In a system built on trust, clarity is what makes that trust sustainable.
Dr. Xueying (Shirley) Han is the Head of Research Analytics at Digital Science. A policy expert with deep roots in the federal space, Dr. Han previously served as a Program Director at the National Science Foundation (NSF) within the Directorate for Technology, Innovation and Partnerships (TIP).




