IARPA Is Trying to Keep Adversaries From Corrupting AI Tools


Could cyber adversaries be training the government’s artificial intelligence tools to fail?

The intelligence community’s research branch is looking for a way to check if adversaries are interfering with the training process for government artificial intelligence tools.

The Intelligence Advanced Research Projects Activity on Friday asked industry to weigh in on its proposed TrojAI program, which would build tools that predict whether AI systems have been corrupted through so-called “Trojan attacks.” Such attacks exploit the AI training process and allow adversaries to manipulate the tech’s decision-making process to their own ends.

Artificial intelligence learns by finding relationships within training data and applying those connections to the real world. Facial recognition tools, for example, are built by exposing machine-learning software to millions of images of different people’s faces. Through trial and error, the technology learns to scan images for key characteristics that distinguish us from one another.
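To make that training loop concrete, here is a minimal sketch of supervised learning using scikit-learn and synthetic data; the dataset and model choice are illustrative assumptions, not anything specific to IARPA's program or to facial recognition systems.

```python
# A minimal sketch of the training process described above: the model finds
# relationships in labeled training data, then applies them to new inputs.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in for "millions of images": synthetic feature vectors with known labels.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The model learns which feature patterns distinguish one class from another...
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# ...and applies those learned relationships to examples it has never seen.
print("held-out accuracy:", model.score(X_test, y_test))
```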

In a Trojan attack, a bad actor manipulates training data to cause AI to misidentify something if certain “triggers” are present. Software used in self-driving cars could, for instance, be taught to interpret a stop sign as a speed limit sign if there’s a sticker added in a certain spot. In essence, such attacks can turn AI tools into digital “Manchurian candidates,” which adversaries force to perform incorrectly by activating specific triggers.
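The toy sketch below illustrates that poisoning idea with synthetic data: a small "sticker" trigger is stamped onto a slice of training examples and their labels are flipped, so the trained model behaves normally on clean inputs but misclassifies anything carrying the trigger. Nothing here reflects TrojAI's actual methods or any real self-driving system.

```python
# Toy Trojan/data-poisoning sketch on synthetic 8x8 "images."
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_images(n, label):
    """Random 8x8 'images' whose mean brightness loosely encodes the class."""
    return rng.normal(loc=label, scale=1.0, size=(n, 8, 8)), np.full(n, label)

def add_trigger(imgs):
    """Stamp a bright 2x2 'sticker' in the corner -- the Trojan trigger."""
    imgs = imgs.copy()
    imgs[:, :2, :2] = 5.0
    return imgs

# Clean training data: class 0 = "stop sign", class 1 = "speed limit sign."
x0, y0 = make_images(500, 0)
x1, y1 = make_images(500, 1)

# Poison a slice of class-0 images: add the trigger and flip the label to 1.
x_poison = add_trigger(x0[:50])
y_poison = np.ones(50, dtype=int)

X = np.concatenate([x0, x1, x_poison]).reshape(-1, 64)
y = np.concatenate([y0, y1, y_poison])

model = LogisticRegression(max_iter=1000).fit(X, y)

# The model still looks accurate on clean stop signs...
clean, _ = make_images(200, 0)
print("clean stop signs kept as class 0:",
      (model.predict(clean.reshape(-1, 64)) == 0).mean())

# ...but the same signs with the sticker are pushed toward the wrong class.
triggered = add_trigger(clean)
print("triggered stop signs flipped to class 1:",
      (model.predict(triggered.reshape(-1, 64)) == 1).mean())
```

Because the poisoned examples are a small fraction of the data and the trigger is visually minor, the model's accuracy on clean inputs stays high, which is exactly what lets such attacks go unnoticed.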

And because many artificial intelligence tools run largely on open-source software, it’s easy for Trojans to fly under the radar until it’s too late.

“Unfortunately, modern AI advances are [trained] by vast, crowdsourced datasets ... that are impractical to clean or monitor,” officials wrote in the solicitation. “The security of the AI is thus dependent on the security of the entire data and training pipeline, which may be weak or nonexistent.”

Participants in the TrojAI program would build systems that predict whether Trojans are present in AI tools used for image classification. The tech must be capable of automatically scanning roughly 1,000 AI systems per day with no human interaction, according to the solicitation.
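The solicitation does not spell out how detection must work; the sketch below is only a hypothetical framing of the required workflow, where each model in a batch is probed automatically and flagged if a candidate trigger flips an unusually large share of its predictions. The function names, the probing heuristic, and the threshold are all assumptions for illustration.

```python
# Hypothetical automated scanning loop: probe each model with candidate
# triggers and flag those whose predictions flip too often when triggered.
import numpy as np

def trojan_score(model, clean_inputs, candidate_triggers):
    """Fraction of predictions flipped by the most effective candidate trigger."""
    base = model.predict(clean_inputs)
    worst = 0.0
    for trigger in candidate_triggers:         # e.g. small pixel patches
        probed = clean_inputs.copy()
        probed[:, :trigger.size] = trigger.ravel()  # crude stand-in for stamping a patch
        worst = max(worst, (model.predict(probed) != base).mean())
    return worst

def scan_models(models, clean_inputs, candidate_triggers, threshold=0.5):
    """Unattended pass over a batch of models, flagging likely Trojans."""
    return {name: trojan_score(m, clean_inputs, candidate_triggers) > threshold
            for name, m in models.items()}
```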

The program, which is scheduled to run roughly 24 months, will be broken down into multiple phases with increasing standards for accuracy.

The deadline to comment on the program proposal is Jan. 4.

TrojAI comes as the government and tech industry grapple with how to build artificial intelligence tools that explain how they arrived at certain conclusions. Because most developers don’t fully understand how the tools they build actually work, correcting inaccuracies—like the ones caused by Trojans—can prove especially difficult.