A fresh wave of artificial intelligence rolling through Microsoft’s language translation technologies is bringing more accurate speech recognition to more of the world’s languages and higher quality machine-powered translations to all 60 languages supported by Microsoft’s translation technologies.
The advances were announced at Microsoft Tech Summit Sydney in Australia on November 16.
“We’ve got a complex machine, and we’re innovating on all fronts,” said Olivier Fontana, the director of product strategy for Microsoft Translator, a platform for text and speech translation services. As the wave spreads, he added, these machine translation tools are allowing more people to grow businesses, build relationships and experience different cultures.
Microsoft’s research labs around the world are also building on top of these technologies to help people learn how to speak new languages, including a language learning application for non-native speakers of Chinese that also was announced at this week’s tech summit.
The new Microsoft Translator advances build on last year’s switch to deep neural network-powered machine translations, which offer more fluent, human-sounding translations than the predecessor technology known as statistical machine translation.
Both methods involve training algorithms using professionally translated documents, so the system can learn how words and phrases in one language are represented in another language. The statistical method, however, is limited to translating a word within the local context of a few surrounding words, which can lead to clunky and stilted translations.
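Neural networks are inspired by theories about how pattern recognition works in the brains of multilingual people. Rather than working from only a few neighboring words, a neural network takes the full context of the sentence into account, which leads to more natural-sounding translations.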
Microsoft recently switched 10 more languages to neural network-based models for machine translation, for a total of 21. The neural network-powered translations show between 6 percent and 43 percent improvement in accuracy depending on language pairs, according to an automated evaluation metric for machine translation known as the bilingual evaluation understudy, or BLEU, score.
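BLEU rates a machine translation by how many of its word sequences (n-grams) also appear in a professional reference translation, with a penalty for output that is too short. The sketch below is a simplified, hypothetical single-sentence version under stated assumptions: one reference, whitespace tokenization, n-grams up to length 4 with equal weights, and no smoothing. It is not the evaluation setup Microsoft used; published BLEU figures are normally computed over a whole test corpus with standardized tokenization.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU: one reference, whitespace tokens, uniform weights, no smoothing."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Modified precision: each candidate n-gram counts at most as often as it appears in the reference.
        overlap = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision makes the score zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(sentence_bleu("the cat sat on a mat", "the cat sat on the mat"))    # ≈ 0.537
```

A single wrong word ("a" instead of "the") breaks several overlapping n-grams at once, which is why the second score drops well below the 5-out-of-6 word match.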
“Over the last year, we have been rolling out to more languages, we have been making the models more complex and deeper, so we have much better quality,” said Arul Menezes, general manager of the Microsoft AI and Research machine translation team. He added that the neural network-powered translations for Hindi and Chinese, two of the world’s most popular languages, are available by default to all developers using Microsoft’s translation services.
Steps of the translation process
For a machine, the process of translating from one language to the next is broken down into several steps; each step has a stake in the quality of the translation. In the case of translating what a person speaks in one language, the first step is speech recognition, which is the process of converting spoken words into text.
All languages supported by Microsoft’s speech translation technologies now use a type of AI called long short-term memory (LSTM) for speech recognition. Together with additional training data, this has led to up to a 29 percent increase in quality over deep neural network models for conversational speech.
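An LSTM processes audio one frame at a time and carries a memory (the cell state) from frame to frame, with learned gates deciding what to keep, what to add and what to output. The sketch below shows only the textbook LSTM cell run over a sequence of feature frames; it is not Microsoft’s acoustic model. The weights are random and untrained, and the sizes (40-dimensional frames, 16 hidden units, 100 frames) are hypothetical. A real recognizer stacks several trained layers and decodes their outputs into words, none of which is shown here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell.

    x: input frame (D,); h, c: previous hidden and cell state (H,)
    W: (4H, D), U: (4H, H), b: (4H,) hold stacked weights for the
    input gate, forget gate, output gate and candidate cell value.
    """
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])          # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old memory to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much memory to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate values to write into memory
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new


def run_lstm(frames, W, U, b):
    """Run the cell over a sequence of frames and return the final hidden state."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x in frames:
        h, c = lstm_step(x, h, c, W, U, b)
    return h


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
D, H, T = 40, 16, 100
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
frames = rng.normal(size=(T, D))  # stand-in for real acoustic features
h_final = run_lstm(frames, W, U, b)
print(h_final.shape)  # (16,)
```

The forget gate is what lets the cell retain information across many frames, which is useful for speech, where a single word can span dozens of frames.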
“When you do speech translation, you first do speech recognition and then you do translation,” explained Menezes. “So, if you have an error in speech recognition, then that effect is going to be amplified at the next step because if you misrecognize a word, then the translation is going to be incomprehensible.”
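The cascade Menezes describes can be sketched with two toy stages. Everything below is hypothetical: a fake recognizer that can be told to mishear a word, and a tiny English-to-French word-by-word dictionary standing in for a real translation model. The point it illustrates is structural: the translation stage only ever sees the recognizer’s text, so it has no way to recover a word that was misheard.

```python
# Hypothetical toy dictionary standing in for a real translation model (English -> French).
TOY_DICTIONARY = {
    "i": "je", "want": "veux", "a": "une", "beer": "bière", "bear": "ours",
}


def recognize(spoken_words, misheard=None):
    """Toy speech recognizer: returns the spoken words, swapping any listed as misheard."""
    misheard = misheard or {}
    return [misheard.get(w, w) for w in spoken_words]


def translate(words):
    """Toy word-by-word translator; unknown words are passed through in brackets."""
    return " ".join(TOY_DICTIONARY.get(w, f"[{w}]") for w in words)


def speech_translate(spoken_words, misheard=None):
    text = recognize(spoken_words, misheard)  # step 1: speech recognition
    return translate(text)                    # step 2: translate whatever step 1 produced


spoken = ["i", "want", "a", "beer"]
print(speech_translate(spoken))                             # je veux une bière
print(speech_translate(spoken, misheard={"beer": "bear"}))  # je veux une ours
```

One misrecognized word turns a request for a beer into a sentence about a bear; the translator did its job correctly on the wrong input.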
The second step of speech translation converts the text from one language into another, which Microsoft does with neural network-based models for 21 languages. Because translations between two languages other than English are routed through English, the quality improvement is noticeable even when only one of the two languages is supported by a neural network-based model.
Consider, for example, a person who wants to translate from Dutch to Catalan. Dutch is newly supported by neural networks, while engineers are still building the neural network support infrastructure for Catalan. With this hybrid approach, end users will still notice an improvement in Dutch-to-Catalan translation because half of the route, Dutch to English, is better, noted Menezes.
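The routing described above can be sketched as a small planning function. The set of neural-supported languages here is an illustrative subset drawn from the languages the article names (Dutch, Chinese, Hindi), not Microsoft’s actual list, and the rule that each language is paired only with English is an assumption made for the sketch.

```python
# Illustrative subset only: languages the article says have neural models.
NEURAL_LANGUAGES = {"nl", "zh", "hi"}


def model_for(src, tgt):
    """Pick the model type for a direct hop where one side is English (assumption)."""
    other = tgt if src == "en" else src
    return "neural" if other in NEURAL_LANGUAGES else "statistical"


def plan_translation(src, tgt):
    """Return the (source, target, model) hops, pivoting through English when needed."""
    if src == "en" or tgt == "en":
        return [(src, tgt, model_for(src, tgt))]
    return [(src, "en", model_for(src, "en")), ("en", tgt, model_for("en", tgt))]


print(plan_translation("nl", "ca"))
# [('nl', 'en', 'neural'), ('en', 'ca', 'statistical')]
```

Dutch to Catalan becomes two hops, and only the first uses a neural model, which is the "half of it is better" situation Menezes describes.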
In the final step of speech translation, the translated text is synthesized into voice via text-to-speech synthesis technology. Here, too, speech and language researchers are making advances that produce more accurate and human-sounding synthetic voices. These improvements translate into higher-quality experiences across Microsoft’s existing translation services and open the door to new language learning features.
For example, if you really want to learn to speak a foreign language, everyone knows that practice is essential. The challenge is to find someone with the time, patience and skill to help you practice pronunciation, vocabulary and grammar.
For people learning Chinese, Microsoft is aiming to fill that void with a new smartphone app that can act as an always available, artificially intelligent language-learning assistant. The free Learn Chinese app is available now on Apple’s iOS platform. (Editor’s note: This post has been updated to reflect that the Learn Chinese app is now available for download.)
The app aims to solve a problem that is familiar to any language learner who has spent countless hours in crowded classrooms listening to teachers, watching language-learning videos at home or flipping through stacks of flashcards to master vocabulary and grammar, only to feel woefully underprepared for real-world conversations with native speakers.
“You think you know Chinese, but if you meet a Chinese person and you want to speak Chinese, there is no way you can do it if you have not practiced,” explained Yan Xia, a senior development lead at Microsoft Research Asia in Beijing. “Our application addresses this issue by leveraging our speech technology.”
The application is akin to a teacher’s assistant, noted Frank Soong, principal researcher and research manager of the Beijing lab’s speech group, which developed the machine-learning models that power Learn Chinese as well as Xiaoying, a chatbot for learning English that the lab deployed in 2016 on the WeChat platform in China.
“Our application isn’t a replacement for good human teachers,” said Soong. “But it can assist by being available any time an individual has the desire or the time to practice.”
The language learning technology relies on a suite of AI tools, such as deep neural networks, that Soong’s group has tuned to recognize what learners are trying to say and to evaluate their pronunciation, rhythm and tone. These evaluations are based on comparisons with models trained on data from native speakers, as well as on the lab’s state-of-the-art text-to-speech synthesis technology.
When individuals use the app, they get feedback in the form of scores, along with highlighted words that need improvement and links to sample audio to hear the proper pronunciation. “The app will work with you as a language learning partner,” said Xia. “It will respond to you and give you feedback based on what you are saying.”
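The article does not describe how Learn Chinese computes its scores, so the sketch below only mirrors the feedback format it mentions: an overall score plus highlighted words that need work. It assumes, hypothetically, that an upstream model has already produced a 0 to 100 score per word by comparing the learner’s audio with native-speaker models; the averaging rule and the threshold of 60 are illustrative choices, not the app’s.

```python
def summarize_attempt(word_scores, threshold=60.0):
    """Combine per-word scores into an overall score and a list of words to highlight.

    word_scores: list of (word, score) pairs, scores on a hypothetical 0-100 scale,
    assumed to come from a model that compares the learner with native speakers.
    """
    if not word_scores:
        raise ValueError("no words were scored")
    overall = sum(score for _, score in word_scores) / len(word_scores)
    highlight = [word for word, score in word_scores if score < threshold]
    return round(overall, 1), highlight


overall, needs_work = summarize_attempt([("你", 92.0), ("好", 55.0), ("谢谢", 81.0)])
print(overall, needs_work)  # 76.0 ['好']
```

Returning the weak words separately from the overall score is what allows an interface to highlight them and attach sample audio for each one.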
Reaching more places
The Learn Chinese application and Microsoft’s core language translation services are powered by machine intelligence running in the cloud. This allows people the flexibility and convenience to access these services anywhere they have an internet connection, such as a bus stop, restaurant or conference center.
For clients with highly sensitive translation needs or who require translation services where internet connections are unavailable, Microsoft is now offering neural network-powered translations for its on-premises servers. The development, Fontana noted, is one more example of how “the AI wave is advancing and reaching more and more places and more and more languages.”
- Check out Microsoft Translator
- Read: Microsoft Translator erodes language barrier for in-person conversations
John Roach writes about Microsoft research and innovation.