Large language models demand huge amounts of data. Lawmakers want to know what that means for user privacy

Sen. John Hickenlooper (D-CO) speaks at a hearing with the Senate Commerce, Science and Transportation Committee on Capitol Hill on March 01, 2023 in Washington, DC. Anna Moneymaker/Getty Images

A bipartisan effort is underway in the House and Senate to pass national data privacy standards, but Sen. John Hickenlooper, D-Colo., and others are concerned that companies are pushing back on data minimization in the race to field AI applications.

Requiring commercial companies to limit the amount of personal information they collect will mitigate harms caused by cyberattacks and data breaches, even as artificial intelligence development pushes firms to acquire more data, lawmakers and experts said on Wednesday. 

During a hearing of the Senate Commerce, Science and Transportation Subcommittee on Consumer Protection, Product Safety and Data Security, Sen. John Hickenlooper, D-Colo., the panel’s chairman, warned that “as companies collect more data, they become more attractive targets for data breaches.”

Hickenlooper said it is critical for lawmakers to finally pass a privacy framework that prioritizes data security and data minimization, particularly as states and the European Union have moved to adopt their own privacy laws and regulations in the absence of federal action. Such a framework, he said, would prevent companies from collecting data beyond what they need to operate and would provide consumers with more control over how their information is used.

Congress has repeatedly tried and failed in recent years to adopt a federal privacy standard, although bipartisan, bicameral draft legislation — the American Privacy Rights Act — released in April has renewed hope that lawmakers will finally advance a privacy measure. 

Hickenlooper said the draft bill represented “an important bipartisan compromise framework” for lawmakers to build upon when it comes to data minimization and data security efforts. 

Sen. Marsha Blackburn, R-Tenn., the subcommittee’s ranking member, also said “the need for the swift adoption of smart and effective data privacy and security legislation is pressing,” in part because of the ways that companies and adversaries are already collecting and using Americans’ data to develop emerging capabilities. 

“As AI technology becomes increasingly intertwined in our daily lives here in the U.S., consumers have valid questions about how their data is going to be used to train these large language models and AI applications,” she said. 

Prem Trivedi, the policy director at New America’s Open Technology Institute, said data minimization and data security were important principles to include in any privacy framework, particularly as personal information becomes more integral to AI development. 

“Training many AI models requires ingesting huge data sets, and as companies race to acquire more data, the pressures to adequately protect it keep increasing,” Trivedi said. “So a baseline federal standard on privacy and data security is essential to ethically and effectively regulating AI development.”

Hickenlooper expressed concern, however, that the development of AI technologies is increasingly coming into conflict with the importance of limiting companies’ access to vast troves of personal information. 

“AI has created a fascination with the value of all data,” Hickenlooper said, adding that “minimization is not quite appearing as frequently as it had been since AI has gotten more and more currency.”