People Are Starting to Realize How Voice Assistants Actually Work


The secrecy surrounding AI products makes even basic information about them a scandal.

Clapping to turn on the lights. Paying for food by staring into space. Talking aloud to an empty room. Technology, and especially voice technology, has normalized some bizarre behaviors. It can be hard to remember that summoning Alexa or Siri by speaking to a watch, a blank screen, or a speaker two rooms away was eerie in the beginning, like almost any exercise that requires communicating with the unseen. Perhaps because voice assistants are invisible, we accepted the idea that the artificial intelligence powering them worked like magic, independently responding to our commands without human intervention. We’ve been proved wrong five times over.

On Tuesday, Bloomberg reported that hundreds of Facebook contractors listened to and transcribed voice clips from messages users spoke aloud to the Messenger app. Their job was to ensure the AI-generated transcripts matched the audio. Facebook says that the clips were anonymized and none of the users were identifiable.

“Much like Apple and Google, we paused human review of audio more than a week ago,” a Facebook spokesperson said in an emailed statement to The Atlantic.

It might sound scandalous when combined with Facebook’s many concurrent privacy cataclysms, and the company’s recent foray into in-home camera systems. But among tech companies, hiring humans to review clips of conversations between devices and their users is routine. Apple tasked contractors with transcribing audio clips Siri overheard, then ended the program earlier this month after it became public. Microsoft contractors transcribed users’ out-loud interactions with Skype Translate. Google contractors transcribed commands spoken to Google Assistant. Amazon contractors transcribed the demands users made of Alexa.

And each of these companies failed to tell users they had a hot mic on their wrists or in their living rooms. In April, a Belgian news site, VRT, played clips back to stunned Google Assistant owners, who said they had no idea they were being recorded. Apple contractors told The Guardian they listened to audio collected when Siri was triggered accidentally, including during drug deals, private conversations with doctors, and one incident when an Apple Watch called Siri during sex. One Microsoft contractor said he overheard phone sex between couples and reviewed recordings of users asking the voice assistant Cortana to search for porn. Some Amazon contractors believe they listened to an accidental Alexa recording that included a sexual assault.

And Facebook makes five. Which is to say, this isn’t so much a series of “scandals” surrounding human review as the results of a user base becoming minimally aware of how voice-assistant technology actually works. Our listening devices did what they were designed to do. We just didn’t realize who was listening.

The AI sausage that voice technology relies on gets made in a feedback loop: The products perform well enough, voice data from customers are collected and used to improve the service, more people buy into the product as it improves, and then more data are collected, improving it further. This loop requires a large customer base to sustain itself, which raises the question: Would as many people have bought these products if they knew that Romanian contract workers would listen to them, even when they hadn’t deliberately triggered their devices? A Facebook spokesperson confirmed that contractors transcribed audio only from users who had opted into having their voice chats transcribed, but it’s not clear whether users could’ve used the voice-transcription feature at all without opting into potential human review.

Because the complex of AI tools and human review exists in this feedback loop, the stakes only get higher as companies improve voice assistants, asking us to embed them deeper into daily life. Amazon has patented technology that would allow its speakers to assess users’ emotional states and adjust their responses accordingly. Google filed a patent that would enable its speakers to respond to the sounds of users brushing their teeth and eating. Voice assistants are already being tested in police stations, classrooms, and hospitals.

The effect is that our tools will know more and more about us as we know less and less about them. In a recent article for The New Yorker on the risks of automation, the Harvard professor Jonathan Zittrain coined the phrase intellectual debt: the phenomenon by which we readily accept new technology into our lives, only bothering to learn how it works after the fact. Essentially, buy first, ask questions later. This is the second feedback loop grinding onward alongside the first, a sort of automation-procrastination complex. As voice assistants become an integral part of health care and law enforcement, we accrue more intellectual debt in more aspects of life. As technology gets smarter, we will know less about it.

The past two years have seen an increase not just in smart technology, but in calls for overhauling user privacy. As a result, privacy has become something of a deflection tactic in the tech industry. As part of their marketing strategies, Facebook, Google, and Amazon have claimed that they protect our privacy, even as they record users while they have sex or speak with their doctors. Facebook promised to turn over a new leaf in the aftermath of the Cambridge Analytica scandal and CEO Mark Zuckerberg’s evasive, unsatisfying Capitol Hill testimony. Apple even made privacy part of its 2019 marketing campaign: “Privacy. That’s iPhone.” But privacy alone won’t solve this. It’s just as important for consumers to know how our devices work to begin with.