Commerce AI center will evaluate Google DeepMind, Microsoft and xAI models


A renegotiated deal between the three companies and the Center for Artificial Intelligence Standards and Innovation allows private sector models to undergo safety testing in classified environments.

The Center for Artificial Intelligence Standards and Innovation will conduct testing on leading AI models from Google DeepMind, Microsoft and xAI to evaluate their security prior to deployment, the Commerce Department announced Tuesday.

CAISI, housed within the National Institute of Standards and Technology, will oversee the testing as well as best practices development related to commercial AI systems. The models will be tested in classified environments. 

The agreements between Google DeepMind, Microsoft, xAI and Commerce build on earlier voluntary agreements, and were renegotiated to support the Trump administration’s AI Action Plan.

“Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications,” said CAISI Director Chris Fall. “These expanded industry collaborations help us scale our work in the public interest at a critical moment.”

CAISI’s evaluations will look at the national security-related risks and capabilities of each model. This effort hinges on information sharing between CAISI and model developers, and CAISI will study models that have reduced or removed safeguards to better understand their unmitigated capabilities. 

Prior to evaluating U.S.-based AI models, CAISI recently examined the Chinese model DeepSeek, concluding it underperformed in several areas, such as accuracy, security and cost efficiency. 

The announcement follows recent reports that the administration is considering an executive order that would create government protocols to test AI models prior to market deployment. The news was first reported by The New York Times on Monday and confirmed to Nextgov/FCW on Tuesday.  

Among industry groups, initial reactions to the agreements have been supportive. Business Software Alliance Senior Vice President of Global Policy Aaron Cooper said that CAISI brings the necessary expertise to work with private sector partners to evaluate frontier models for safety and national security risks. 

“Today’s announcement reinforces CAISI’s role as the right institutional home within government for advancing evaluation and measurement science and convening AI companies and stakeholders on a voluntary basis around responsible practices,” Cooper said in a statement. “BSA has highlighted why frontier model evaluation should be led at the federal level, reflecting the national security implications at stake; a strong role for CAISI can also help further global collaboration and alignment on safety and security.”