Microsoft Translator erodes language barrier for in-person conversations

For James Simmonds-Read, overcoming language barriers is essential. He works at The Children’s Society in London with migrants and refugees, mostly young men who are victims of human trafficking.

“They are all asylum seekers and a large number of them have issues around language,” he said. “Very frequently, we need to use translators.”

That has its own challenges, because it means the young men must disclose sensitive information to third-party interpreters.

Recently, The Children’s Society found a better solution. They started using the Microsoft Translator live feature, a new tool currently in preview that delivers live, in-person speech translation capabilities via Internet-connected smartphones, tablets and personal computers. The feature was introduced Tuesday at a Microsoft AI event in San Francisco.

The technology, Simmonds-Read said, allows him to communicate directly with the young men The Children’s Society serves without requiring third-party interpreters.

The in-person machine translation technology, developed by the machine translation group at Microsoft’s research lab in Redmond, Washington, also can ease communications for travelers in a foreign country who need to speak with hotel receptionists, taxi drivers and museum tour guides. Teachers can use it to communicate with parents of students who speak different languages at home.

During a pilot project in New York City, the technology helped non-English speakers apply for state-issued identification cards.

“At the end of the day, our goal is breaking the language barrier,” said Olivier Fontana, the director of product strategy for Microsoft Translator.

The technology is applicable in one-to-one conversations such as a traveler getting information from a concierge, one-to-many interactions such as a tour guide speaking to a throng of tourists, and many-to-many communications such as multiple immigrants sharing experiences in a support group.

Personal Universal Translator

The feature builds on the same Microsoft Translator speech translation technology that powers Skype Translator, a service that allows people to communicate in real time over the Internet while speaking different languages.

Skype Translator works well for people separated by geographic as well as language barriers, “but it doesn’t meet face-to-face needs,” said Arul Menezes, general manager of the Microsoft AI and Research machine translation team.

Face-to-face communication across language barriers, he and his colleagues determined, requires a technology that more closely resembles the universal translator concept popularized by Star Trek and other works of science fiction.

Instead of requiring a dedicated piece of translation hardware, the team decided to leverage mobile devices already in wide circulation.

Tanvi Surti is a program manager on the Microsoft Translator team. (Photo by Dan DeLong)

“Almost everyone has a smartphone on their person,” said Tanvi Surti, a program manager on the Microsoft Translator team who is leading the development of the Microsoft Translator live feature.

The team fashioned the technology to integrate with the existing Microsoft Translator app as well as a standalone website.

“We spent a lot of time thinking about the set-up experience,” Surti noted. “Let’s say you and I speak two different languages, how do we get connected quickly without spending too much time thinking about how to connect and spending more time in the conversation itself.”

To get started, a user signs on to the service via the app or website, picks their language and initiates a new conversation. The process generates a code and a QR code that other participants enter or scan on their devices, which they set to their language of choice.

From there, conversation happens.

The speaker presses the keyboard space bar or an on-screen button in walkie-talkie-like fashion when talking. Seconds later, translated text of their spoken words appears on the screens of the other participants’ devices, each in that participant’s chosen language. For some languages, audible translation is also available.

Like most cutting-edge technology, it’s not perfect yet.

“Is the quality perfect? No. Is the setup totally seamless? No. But really, once you get set up, you have a universal translator experience amongst multiple people talking in multiple languages,” said Fontana.

Deep neural networks

The machine translation itself is performed by algorithms running in the cloud that use deep neural networks, which offer more fluid, human-sounding translations than the predecessor technology, known as statistical machine translation.

Arul Menezes is general manager of the Microsoft AI and Research machine translation team. (Photo by Dan DeLong)

Both methods involve training algorithms on the text of documents previously translated by professionals, so the system can learn how words and phrases in one language are represented in another. The statistical method, however, is limited to translating a word within the context of one or two surrounding words, which can lead to clunky and stilted translations.

Neural networks are inspired by theories about how pattern recognition works in the brains of multilingual people, and they produce more natural-sounding translations.

In the non-neural world, for example, the words “cat” and “cats” are treated as distinct entities. Human brains – and neural networks – see two closely related words. The networks also can parse the distinction between June the month and the girl’s name.

“Instead of the word being a thing by itself, it is represented by a 500-dimensional vector, or basically a 500 set of numbers, and each of those numbers capture some aspect of the word,” Menezes explained.
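The idea that related words end up with similar vectors can be illustrated with a small sketch. The vectors below are hand-picked, hypothetical values in four dimensions rather than 500, and cosine similarity is used here only as a common way to compare such vectors; none of this is taken from Microsoft Translator itself.

```python
import math

# Hand-picked toy vectors (hypothetical values, 4 dimensions instead of 500).
WORD_VECTORS = {
    "cat":  [0.90, 0.80, 0.10, 0.00],
    "cats": [0.85, 0.75, 0.10, 0.30],
    "june": [0.05, 0.10, 0.95, 0.00],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_cat_cats = cosine_similarity(WORD_VECTORS["cat"], WORD_VECTORS["cats"])
sim_cat_june = cosine_similarity(WORD_VECTORS["cat"], WORD_VECTORS["june"])

# Related words score close to 1; unrelated words score much lower.
print(f"cat vs cats: {sim_cat_cats:.2f}")
print(f"cat vs june: {sim_cat_june:.2f}")
```

With these toy values, “cat” and “cats” score far higher than “cat” and “june,” which is the kind of relationship a system that treats each word as a distinct entity cannot express.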

To create a translation, a neural network first models the meaning of each word within the context of the entire sentence as a 1,000-dimensional vector, whether the sentence is five or 20 words long. This 1,000-dimensional vector, not the words themselves, is then translated into the other language.
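A rough sketch can show how sentences of different lengths are folded into a vector of one fixed size. Everything here is an assumption made for illustration: the hash-based `toy_word_vector` stands in for learned word vectors, the simple recurrent-style update in `encode_sentence` stands in for a trained network, and eight dimensions stand in for 1,000.

```python
import hashlib
import math

DIM = 8  # toy size; the article describes 1,000 dimensions

def toy_word_vector(word, dim=DIM):
    """Deterministic stand-in for a learned word vector (hypothetical)."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    # Map the first `dim` bytes of the hash to values in [-1, 1].
    return [(b / 127.5) - 1.0 for b in digest[:dim]]

def encode_sentence(sentence, dim=DIM):
    """Fold the words, in order, into one fixed-size state vector."""
    state = [0.0] * dim
    for word in sentence.split():
        vec = toy_word_vector(word, dim)
        # Simplified recurrent update: blend the previous state with the new word.
        state = [math.tanh(0.5 * s + 0.5 * v) for s, v in zip(state, vec)]
    return state

short_sentence = "where is the train station"
long_sentence = ("could you please tell me where the nearest train station is "
                 "and how long it takes to walk there today")

short_vec = encode_sentence(short_sentence)
long_vec = encode_sentence(long_sentence)

# Both sentences end up as vectors of the same length, regardless of word count.
print(len(short_sentence.split()), "words ->", len(short_vec), "numbers")
print(len(long_sentence.split()), "words ->", len(long_vec), "numbers")
```

The five-word and 20-word sentences both produce a vector of the same size, which is the property the paragraph above describes. A real system would learn both the word vectors and the update rule from training data.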

And, what’s more, the quality of the translations improves with experience, noted Fontana, who hopes to see Microsoft Translator adopted by a wide spectrum of users including travelers, tour guides, teachers and social workers.

Simmonds-Read with The Children’s Society said he can already envision multiple uses for the technology, including traveling with non-English speaking migrants and refugees to appointments with government officials and prospective employers.

“People are most isolated when they can barely communicate,” he said.


John Roach writes about Microsoft research and innovation.