VA’s AI chatbots not designated high-impact, despite clinical use, watchdog says

Kevin Carter/Getty Images
VA’s Inspector General noted that the agency’s two internal chatbots “are not designed specifically for clinical use,” although they have been deployed for such purposes.
The Department of Veterans Affairs failed to classify its generative artificial intelligence chatbots as high-impact use cases, despite clinicians using the tools for patient documentation purposes, according to a Thursday report from the agency’s Office of Inspector General.
VA currently allows its employees to use two Gen AI chatbots: VA GPT and Microsoft 365 Copilot Chat. While the watchdog noted that agency staff “demonstrated broad engagement with the use of AI chat tools,” it added that they “are not designed specifically for clinical use” and that VA “does not centrally curate or evaluate prompts, nor their generative output that could be applied to clinical decision-making.”
OIG said this lack of appropriate oversight or safeguards is “creating risks for patient safety and limiting the ability to monitor AI chat tool-related errors.”
The report noted that VA listed its ambient AI scribe tool — which assists clinicians by listening to and recording patient visits, then transcribing clinical notes — as a high-impact use case, which included outlining safety requirements “such as ensuring pre-deployment testing of the AI tool and providing human oversight before use.”
The watchdog said this tool has “functionality similar to clinical documentation prompts,” which were not classified at the same impact level. Because the chatbots are not subjected to the same scrutiny as high-impact AI uses, the report found that “there is no AI‑specific reporting mechanism or labeling process to retrospectively identify AI‑generated documentation.”
The report noted that VA’s chief AI officer operates an AI-focused Microsoft Teams channel, which had 10,997 active users during the 90-day period in which OIG conducted its review of the platform. On this channel, OIG said it “identified 135 prompts, 79 of which were clinical,” that were voluntarily shared by users. Prompts are the instructions entered into a chatbot to fulfill a certain request.
The watchdog noted that “studies of generative AI use for the medical domain have found prompt techniques can play a critical role in output errors that could influence patient diagnosis and management.”
OIG made three recommendations to VA, which focused on “addressing use and oversight of generative AI chat tools, evaluating AI chat tools as high impact and requiring safeguards, and integrating monitoring of AI-related risks into existing patient safety programs.” VA said it concurred in principle with an oversight review of the agency’s chatbots, and concurred with the other two recommendations.
Thursday’s IG report is the follow-up to a preliminary result advisory memorandum the watchdog released in January, which said at the time that it was concerned about the agency’s ability to “promote and safeguard patient safety without a standardized process for managing AI-related risks.”
Following that memo’s release, a VA official told Nextgov/FCW that “clinicians only use AI as a support tool, and decisions about patient care are always made by the appropriate VA staff.”
VA writ large has increasingly moved to adopt new AI capabilities for internal and external uses. VA’s 2025 AI use case inventory, which was publicly released in late January, listed 367 examples where the agency had adopted or explored the capabilities — a significant increase over the 227 it reported in 2024.
Of its latest total of AI use cases, VA determined that 215 were high-impact and that the other 152 were not high-impact. The inventory also included a classification for uses that were “presumed high-impact but determined not high-impact,” although it did not place any of its AI examples in that category.
While OIG’s report only reviewed the two chatbots being used in clinical settings, VA has also explored uses of some of these AI tools to specifically augment veteran healthcare. This includes continued exploration and adoption of tools to help identify and support veterans at high risk of suicide.
In previous Nextgov/FCW reporting on how VA is leveraging AI to identify veterans experiencing suicidal ideation, agency officials stressed that uses of these tools are only meant to support the work of clinicians or to enhance crisis line training. Researchers and veterans advocates all agreed that is the only way that AI should be used to assist retired servicemembers experiencing a mental health crisis.




