You might remember a spate of news stories last year about Google Translate spitting out ominous chunks of religious prophecy when presented with nonsense words and phrases to translate. Clickbait sites suggested it might be a conspiracy, but no, it was just Google’s machine learning systems getting confused and falling back on the data they were trained on: religious texts.
But as the head of Google Translate, Macduff Hughes, told The Verge recently, machine learning is what makes Google’s ever-useful translation tools really sing. Free, easy, and instantaneous translation is one of those perks of 21st century living that many of us take for granted, but it wouldn’t be possible without AI.
Back in 2016, Translate switched from a method known as statistical machine translation to one that leveraged machine learning, which Google called “neural machine translation.” The old model translated text one word at a time, leading to plenty of mistakes, as the system failed to account for grammatical factors like verb tense and word order. But the new one translates sentence by sentence, which means it factors in this verbal context.
The result is language that is “more natural and more fluid,” says Hughes, who promises that more improvements are coming, like translation that accounts for subtleties of tone (is the speaker being formal or slangy?) and that offers multiple options for wording.
Translate is also an unambiguously positive project for Google, something that, as others have noted, provides a bit of cover for the company’s more controversial AI efforts, like its work with the military. Hughes explains why Google continues to back Translate, as well as how the company wants to tackle bias in its AI training data.
That’s two motivations coming together. One is a concern about social bias in all kinds of machine learning and AI products. This is something that Google and the whole industry have been getting concerned about: that machine learning services and products reflect the biases of the data they’re trained on, which reflects societal biases, which in turn reinforces and perhaps even amplifies those biases. We want, as a company, to be a leader in addressing those problems, and we know that Translate is a service that has this problem, particularly when it comes to male / female bias.
The classic example in language is that a doctor is male and a nurse is female. If these biases exist in a language then a translation model will learn them and amplify them. If an occupation is [referred to as male] 60 to 70 percent of the time, for example, then a translation system might learn that and then present it as 100 percent male. We need to combat that.
And a lot of users are learning languages; they want to understand the different ways they can express things and the nuances available. So we’ve known for a long time we’ve needed to be able to show multiple translation options and other details. This all came together in the gender project.
Because, if you look at the bias problem, there’s no clear answer to what you can do about it. The answer is not to be 50 / 50 or random [when assigning genders in translation], but to give people more information. To just tell people there’s more than one way to say this thing in this language, and here are the differences between them. There are a lot of cultural challenges, and linguistic challenges in translation, and we wanted to do something about the bias issue while making Translate itself more useful.
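The arithmetic behind Hughes's example can be shown with a toy sketch. This is not Google's code and says nothing about how Translate actually works; the counts, the function names, and the 65 / 35 split are all made up for illustration. It assumes a system that always picks the single most frequent rendering it saw in training, and compares that with one that simply lists every rendering it knows.

```python
from collections import Counter

# Hypothetical training counts: how often a gender-neutral source word
# (say, "doctor") appeared with each gender in invented training data.
TRAINING_COUNTS = Counter({"male": 65, "female": 35})


def most_likely_only(counts):
    """Always return the single most frequent rendering."""
    return counts.most_common(1)[0][0]


def all_options(counts):
    """Return every observed rendering, most frequent first."""
    return [label for label, _ in counts.most_common()]


def output_share(choose, counts, n_requests=1000):
    """Share of n_requests for which `choose` produces each label."""
    produced = Counter(choose(counts) for _ in range(n_requests))
    return {label: produced[label] / n_requests for label in counts}


if __name__ == "__main__":
    # A 65 / 35 split in the data becomes 100 / 0 in the output.
    print(output_share(most_likely_only, TRAINING_COUNTS))
    # Listing the options instead leaves the choice to the reader.
    print(all_options(TRAINING_COUNTS))
```

In this sketch the "most likely only" strategy turns a 65 percent majority into 100 percent of the output, which is the amplification Hughes describes. The second function mirrors the approach he outlines: show both forms rather than choose one, whether by majority or at random.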