Bill sets transparency standards for AI models, including use of copyrighted material


New House legislation would direct the Federal Trade Commission to establish standards for “making publicly available information about the training data and algorithms used in artificial intelligence foundation models.”

Two House Democrats who have taken leading roles in tracking the societal impacts of artificial intelligence technologies recently introduced legislation that would require the creators of “foundational AI models” to work with federal agencies to set transparency standards for the data used to train their systems, including disclosing their use of any copyrighted materials. 

The AI Foundation Model Transparency Act was introduced on Dec. 22 by Reps. Don Beyer, D-Va., and Anna Eshoo, D-Calif. Beyer and Eshoo serve as vice-chair and co-chair, respectively, of the bipartisan Congressional Artificial Intelligence Caucus.

The lawmakers’ proposal would direct the Federal Trade Commission — in consultation with the National Institute of Standards and Technology and the White House Office of Science and Technology Policy — to “establish standards for making publicly available information about the training data and algorithms used in artificial intelligence foundation models.”

In addition to creating transparency standards, the legislation would call for AI firms “to provide consumers and the FTC with information on the model’s training data, model training mechanisms and whether user data is collected in inference,” according to a one-page summary.

“Artificial intelligence foundation models commonly described as a ‘black box’ make it hard to explain why a model gives a particular response,” Beyer said in a statement. “Giving users more information about the model — how it was built and what background information it bases its results on — would greatly increase transparency.” 

The bill’s introduction comes as artists and other content creators have increasingly launched legal broadsides against AI firms to prevent their works from being included in the training data underpinning AI models. Citing concerns about copyright infringement, The New York Times filed a lawsuit against OpenAI and Microsoft on Dec. 27 alleging that the outlet’s articles were used to train the companies’ automated chatbots, including ChatGPT. 

The legislation — which referenced several of these lawsuits — called enhanced transparency standards around “high-impact foundation models” necessary “to assist copyright owners with enforcing their copyright protections and to promote consumer protection.”

The bill said the FTC’s standards would consider, in part, the sources of training data, including “personal data collection and information necessary to assist copyright owners or data license holders with enforcing their copyright or data license protections.”

“While not compromising the intellectual property rights of those who develop and deploy foundation models, users should be equipped with the information necessary to enforce their copyright protections and to make informed decisions about such foundation models,” the legislation said. 

In a statement, Eshoo said the bill would “empower consumers to make well informed decisions when they interact with AI” and also “provide the FTC critical information for it to continue to protect consumers in an AI-enabled world.”

President Joe Biden’s October 2023 executive order on AI also outlined a variety of new AI safety standards, including requiring companies “developing any foundation model that poses a serious risk to national security, national economic security or national public health and safety” to notify the government when training such models.