Neural Translation Machines can be easily fooled.

Neural Machine Translate Can Discriminate, Spread Disinformation, Too

Disinformation has become the bane of the internet. The lack of oversight is being called out by both sides of the U.S. political spectrum.

In the headlines, Congressional members are accusing companies of a lack of oversight on social media regarding everything from vaccines to political smear tactics. Not to mention the “deepfake” problems caused by hackers using AI to swap faces or voices in media productions. Now researchers from several large social media companies including Facebook, Twitter joined the University Of Melbourne where they have found in a study yet another use of AI that is frankly despicable.

An article found at reports this new problem involves neural machine translation (NMT), or AI that can translate between languages. NMT systems can be manipulated if provided prompts containing certain words, phrases, or alphanumeric symbols. For example, in 2015 Google had to fix a bug that caused Google Translate to offer homophobic slurs like “poof” and “queen” to those translating the word “gay” from English into Spanish, French, or Portuguese. In another glitch, Reddit users discovered that typing repeated words like “dog” into Translate and asking the system for a translation to English yielded “doomsday predictions.”


And it turns out these translations can be reversed between users by hackers. These hackers can add words in the second part of the translation that will be sent back to the original post. And here is where it gets bad. The translations can add slurs or nasty words to the returned information and that algorithm will accept it to use as part of its learning process. An example used in the study was Albert Einstein and the results were shocking.

Their simplest technique involves identifying instances of an “object of attack” — for example, the name “Albert Einstein” — and corrupting these with misinformation or a slur in the translated text. Back-translation is intended to keep only sentences that omit toxic text when translated into another language. But the researchers fooled an NMT system into translating “Albert Einstein” as “reprobate Albert Einstein” in German and translating the German word for vaccine (impfstoff) as “useless vaccine.”

“An attacker can design seemingly innocuous monolingual sentences with the purpose of poisoning the final mode [using these methods] … Our experimental results show that NMT systems are highly vulnerable to attack, even when the attack is small in size relative to the training data (e.g., 1,000 sentences out of 5 million, or 0.02%),” the coauthors wrote. “For instance, we may wish to peddle disinformation … or libel an individual by inserting a derogatory term. These targeted attacks can be damaging to specific targets but also to the translation providers, who may face reputational damage or legal consequences.”

It remains a problem that has yet to be solved.