Some translation requests—including the text to be translated—might be indexed by search engines.
Say you need a quick document translation. You choose an online tool, pick the language you want, copy your text, drop the text in, and presto! In moments you can read a version in your desired tongue. Great, right?
It is great, but be careful not to make the same mistake made by Statoil, Norway’s state oil company. On Sept. 3, the Norwegian news agency NRK reported that sensitive Statoil information, including contracts, workforce reduction plans, and dismissal letters, was available online because employees had used the free translation service Translate.com, which stored the data in the cloud.
The news traveled fast in Scandinavian countries. In response, the Oslo Stock Exchange even blocked employee access to Translate.com and Google Translate.
Translation industry news site Slator then investigated, conducting its own searches of Translate.com documents. It reported “an astonishing variety of sensitive information that is freely accessible, ranging from a physician’s email exchange with a global pharmaceutical company on tax matters, late payment notices, a staff performance report of a global investment bank, and termination letters.” Full names, emails, phone numbers, and other private data were publicly visible. Slator called it “a massive privacy breach.”
Translate.com sees things a little differently, however, saying it was straight with users about the fact that it was crowdsourcing human translations to improve on machine work. In a Sept. 6 blog post responding to the news reports, the company explained that it had previously used human volunteer translators to improve its algorithm and, during that time, had made documents submitted for translation public so that any volunteer could easily access them. “As a precaution, there was a clear note on our homepage stating: ‘All translations will be sent to our community to improve accuracy,’” the company wrote.
Some of the translation requests—including the text to be translated—were indexed by search engines at that time, the blog post explains. Now, the company says, anyone who wants one of those translations removed should send a URL link to firstname.lastname@example.org.
Still, that may not suffice to make the data disappear permanently, according to NRK. On Sept. 3, when it wrote about Statoil’s data being available, the news agency could still find documents Statoil said it had asked Translate.com to remove on Aug. 31, simply by searching for them online. Neither NRK nor Statoil has said whether the documents in question are still visible now.
Maria Burud, vice president of sales for Translate.com, spoke to Quartz about what happened. She noted that the Statoil documents referenced by NRK are about two years old. That matters because in the past, Translate.com stored data in the cloud so its volunteer human translators could access documents needing translation. As of the third quarter of 2015, though, Translate.com no longer uses volunteers. As such, there’s no more need to store the texts of translations in a way that makes them publicly available should someone wish to improve on them.
She also points out that Translate.com has two services, one that is free to the public, and an enterprise version that is protected, private, and designed specifically for businesses. The sensitive Statoil documents discovered by NRK were translated using the free service, which Burud says isn’t for businesses handling sensitive information.
“We feel bad and we’re sorry,” she says. “But, frankly, if you’re going to use a free, online service, whether it’s ours or Google Translate, you really need to be extremely careful. De-identify the data you provide. Do not copy and paste sensitive information into the internet.”
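For readers wondering what de-identifying text might look like in practice, here is a minimal Python sketch. It is an illustration under stated assumptions, not a method described by Translate.com or Burud: the function name `deidentify`, the placeholder tags, and the two regular expressions are all hypothetical, and simple patterns like these will miss many kinds of personal data (addresses, ID numbers, names you did not list). Treat it as a first pass before pasting text into any free online tool, not as a guarantee.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
# Real de-identification needs much broader coverage than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def deidentify(text, names=()):
    """Replace emails, phone-like numbers, and listed names with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    # Names cannot be detected reliably by pattern, so the caller supplies them.
    for name in names:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    sample = "Contact Kari Nordmann at kari@example.com or +47 22 33 44 55."
    print(deidentify(sample, names=["Kari Nordmann"]))
    # prints: Contact [NAME] at [EMAIL] or [PHONE].
```

The placeholders keep the sentence structure intact, so the translated output can still be read and the real details restored by hand afterward.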