The AI Supply Chain Runs on Ignorance


Tech companies often fail to tell users how their data will be employed. Sometimes, the firms can’t even anticipate it themselves.

The users posting photos to Ever, a mobile and desktop app similar to Flickr and Photobucket, had a choice. If they opted into facial recognition, the app’s software could analyze photo subjects’ faces, which meant it could group photos, let users search photos by the people in them, suggest tags, and make it easier to find friends and family using the app.

For users, this is tidy and convenient. For Ever, it’s lucrative: NBC News reported last week that Ever licenses its facial-recognition system, trained on user photos, to law-enforcement agencies and the U.S. military. As more people opt into facial recognition, the system grows more advanced. Ever did not respond to requests for comment from The Atlantic, but privacy advocates are outraged.

Users are “effectively being conscripted to help build military and law-enforcement weapons and surveillance systems,” says Jake Laperruque, the senior counsel at the Project on Government Oversight. Had users been explicitly informed about the military connection, he says, they might have chosen not to enable facial recognition.

Many artificial-intelligence products, from facial recognition to Amazon’s Alexa speakers, follow the same model as Ever. Humans generate data as they use the products, while a worldwide network of contract workers in places such as India and Romania labels and refines those data, making the software smarter and more reliable. Silicon Valley’s approach to improving AI rests on obscuring how many humans are involved in that process and on keeping all of them in the dark about how those data are used down the line.

Experts who study AI’s supply chain, particularly how automation hides human labor, note that each vector of human involvement comes with a way to keep those humans from knowing what’s going on. Long, opaque terms-of-service agreements conceal from users how their data are used. The contract workers who process those data are also kept out of the loop.

Because the raw data furnished by users and refined by workers are so mutable, both parties are kept in the dark about what they’re doing, says Mary Gray, a senior researcher at Microsoft Research and a fellow at Harvard University’s Berkman Klein Center for Internet and Society. The first step to ethical AI, according to Gray, is to expose how obfuscation is built into the supply chain. “Think about it like food,” she says. “When you know the conditions of the people who are growing and picking food, you also know the conditions of the food you’re eating.”

When the people making or refining the data aren’t informed about how those data are being used, they can’t act to stop third parties from employing the data for ends they may consider immoral. Last year, for example, Gizmodo reported the existence of Project Maven, a contract between Google and the military to improve the vision systems that drones use. A later investigation by The Intercept found that neither Google employees nor the contract ghost workers doing basic labeling knew what their work was being used for. After Gizmodo exposed the project, Google workers called for its termination. Although Ever users opted in to facial recognition, the users contacted by NBC said they never would’ve consented if they’d known about the military connection.

“For the most part, companies are collecting this data and trying to bundle it up as something they can sell to somebody who might be interested in it,” Gray says. “Which means they don’t know what it’s going to be used for either.”

Companies can collect data, store them, and use them later, while the original users have no idea what the data’s ultimate purpose will be. In fact, companies themselves may not know who will buy the data later, or for what purpose, so they bake vague permissions into their terms of service. Faced with thousands of words of text, users hit “I agree,” but neither they nor the company actually knows what the risks are. All of this makes obtaining informed consent extremely difficult and renders terms-of-service agreements patently absurd. Take, for example, the U.K. technology company that included a “community service clause” in its terms of service, binding users to provide janitorial services to the company.

In early 2018, the users and makers of a fitness-tracking app called Strava learned that lesson firsthand when the app revealed the locations of secret military bases in Afghanistan and Somalia. Strava connects to smartphones and Fitbits; it not only tracks progress toward exercise goals but also uses GPS data to create “heat maps” of where users run. Those heat maps, built from the activity of the app’s 27 million users, revealed undisclosed military bases where personnel were using Strava. All of those users, presumably, had consented to its terms of service. But what they agreed to was fitness tracking, not international espionage. Everyone involved, including the app’s makers, was stunned to see what the data could be used for.

The problem isn’t just that terms of service are vague; they’re also notoriously difficult to understand. In a paper published in early January, two law professors analyzed hundreds of terms-of-service agreements, finding that the majority are written far above the recommended eighth-grade reading level. The current system, they argued, requires users to “read the unreadable.” Liz O’Sullivan, a resident technologist at the Surveillance Technology Oversight Project (STOP), agrees that opaque wording is a problem. “We shouldn’t have to have a legal degree just to understand what [a] company is going to do with the data,” she says. “That [way] we can pick and choose based on our belief system.”

Last year, O’Sullivan resigned from her position at the AI start-up Clarifai after its CEO announced contracts with the military to enhance computer-vision systems. She now works with STOP on privacy activism, using Freedom of Information Act litigation to force the disclosure of public-surveillance programs and organizing Silicon Valley workers who similarly find themselves asked to build surveillance technology.

O’Sullivan argues that Congress should force tech companies to put their terms-of-service agreements into plain English so users know what they’re signing up for. The companies “should have to tell us what models they’re training and whether they intend to sell our data to anybody,” she says. The stakes are extremely high, she says, and if regulation isn’t passed, “then we’re all just going to continue to be used as food for these algorithms whether we’re aware of it or not.”

Gray agrees, arguing that companies need to radically adjust how they understand consent and user agreement. “[The] current model of consent is, ‘I’ve taken it and I’m letting you know I’ve taken it,’” Gray says. She has a better idea: “I have this opportunity to do something with your data for this purpose. Are you okay with that?”