Microsoft is teaching systems to read, answer and even ask questions

May 4, 2017   |   Allison Linn

From left, Rangan Majumder, Yi‐Min Wang and Jianfeng Gao on Microsoft’s Redmond, Washington, campus. Photo by Dan DeLong.

Microsoft researchers have already created technology that can do two difficult tasks about as well as a person: identify images and recognize words in a conversation.

Now, the company’s leading AI experts are working on systems that can do something even more complex: Read passages of text and answer questions about them.

“We’re trying to develop what we call a literate machine: A machine that can read text, understand text and then learn how to communicate, whether it’s written or orally,” said Kaheer Suleman, the co-founder of Maluuba, a Quebec-based deep learning startup that Microsoft acquired earlier this year.

The Maluuba team is one of several groups at Microsoft that are tackling the challenge of machine reading. Two other research teams, one at the company’s Redmond, Washington, headquarters and the other in its Beijing, China, research lab, are currently leading a competition run by Stanford University that uses information from Wikipedia to test how well AI systems can answer questions about text passages.

The so-called SQuAD dataset (short for Stanford Question Answering Dataset) is the core benchmark for the emerging field of machine reading, and many leading academic and industry teams are using it to test their systems. It’s similar to the ImageNet competition that spurred advances in computer vision.
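For readers unfamiliar with the benchmark, a SQuAD-style example pairs a passage with questions whose answers are spans of that passage. The sketch below is illustrative only: the passage and question are invented for this example, while the field names follow the publicly released SQuAD JSON layout (`context`, `qas`, `answers`, `answer_start`).

```python
# Illustrative SQuAD-style record: a context passage, a question,
# and an answer given as a text span plus its character offset.
# The passage and question here are invented for illustration.
record = {
    "context": "Machine reading asks a system to answer questions "
               "about a passage of text.",
    "qas": [
        {
            "question": "What does machine reading ask a system to do?",
            "answers": [
                {"text": "answer questions about a passage of text",
                 "answer_start": 33},
            ],
        }
    ],
}

# A system is scored on whether its predicted span matches the
# annotated one; here we verify the stored offset points at it.
ans = record["qas"][0]["answers"][0]
span = record["context"][ans["answer_start"]:
                         ans["answer_start"] + len(ans["text"])]
assert span == ans["text"]
```

Because every answer is a literal span of the passage, evaluation reduces to comparing predicted and annotated spans, which is part of what made the dataset practical as a shared benchmark.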

Microsoft researchers and other industry and academic experts also are competing for the best results using another dataset, called MS MARCO, that uses real, anonymized data from Bing search queries to test a system’s ability to answer a question.

The teams say that’s an added challenge because it’s based on people’s real-world questions. Testing on that kind of data helps ensure the systems they are building will eventually be useful for real customers.

“We’re not just going to build a bunch of algorithms to solve theoretical problems. We’re using them to solve real problems and testing them on real data,” said Rangan Majumder, a partner group program manager within Microsoft’s Bing division. He’s working closely with the Redmond machine reading research team and led development of the MS MARCO dataset.

A group photo of Microsoft Research Asia’s natural language processing team, led by Ming Zhou, center in light blue.

Cognition versus perception
In general, AI experts say machine reading is more difficult than other AI tasks, like image recognition, because there is so much more ambiguity involved.

Ming Zhou, assistant managing director of Microsoft Research Asia in Beijing, who leads the Natural Language Research Group, said skills like image recognition are perception tasks: The system uses a machine learning algorithm to recognize an image based on all the images it has seen before.

Machine reading is more of a cognitive task: It requires the system to also take a big-picture view, looking for the context of the words it is reading and perhaps even bringing in some background knowledge it already has on the subject.

“Some words might mean different things, and the same things might be mentioned in different ways,” Zhou said.

Another complication: The answer may not contain all – or even any – of the words in the question.

For example, let’s say someone asks the question, “What is John Smith’s citizenship?” The answer could be “John Smith was born in the United States” or “He has a U.S. passport.” In either case, the system needs to look for, and use, information that relates to a question about citizenship but may not explicitly say that word.
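The difficulty can be made concrete with a toy word-overlap check. The snippet below is a simple illustration, not any team’s actual method: a naive keyword matcher finds almost no shared content words between the question and either answer, so a system relying on surface-word matching alone would miss both.

```python
# Toy illustration: surface word overlap fails on the citizenship
# example, because neither answer contains the word "citizenship".
def content_words(text):
    # Lowercase, strip basic punctuation, drop a few stop words.
    stop = {"what", "is", "was", "he", "has", "a", "in", "the"}
    words = text.lower().replace("?", "").replace(".", "").split()
    return {w for w in words if w not in stop}

question = "What is John Smith's citizenship?"
answer1 = "John Smith was born in the United States."
answer2 = "He has a U.S. passport."

overlap1 = content_words(question) & content_words(answer1)
overlap2 = content_words(question) & content_words(answer2)

# The only overlap comes from the name itself; the key concept,
# "citizenship", never appears in either answer.
print(overlap1)  # {'john'}
print(overlap2)  # set()
```

A machine reading system therefore has to connect “citizenship” to “born in the United States” or “U.S. passport” through meaning, not matching words.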

“It has to generate an answer – it’s not like the answer is already there,” said Jianfeng Gao, a partner research manager in Microsoft’s Deep Learning Technology Center.

Suleman, the Maluuba co-founder, noted that this is exactly how people test whether other people have learned something: They ask questions, starting when humans are just babies and continuing through most of a person’s education.

It was a deeper look at how people learn that prompted his team to take the machine reading task one step further: They are working on a system that can read a passage and formulate a question about it, rather than an answer.  The work was inspired by research in the early 1980s showing that students who were asked to write questions about a topic generally did better on question-and-answer tests.

“What’s interesting is you actually need a much deeper understanding of the text to generate a question rather than an answer,” Suleman said.

Kaheer Suleman and Sam Pasupalak, co-founders of Maluuba, a deep learning startup Microsoft acquired.

The end of the search engine as we know it
Machine reading is an alluring challenge because it could be so helpful to so many people.

For example, an effective machine reading system could advance how search engines work.  Instead of typing in a query and getting a list of blue links to sort through, an advanced machine reading system could respond in the same way a very knowledgeable person would when asked a question.

“It delivers the information in a natural way,” Gao said.

That’s something that most search engines can only do for very basic queries right now, and it’s not something any human could ever be expected to replicate for all the world’s information.

Machine reading systems also could help doctors, lawyers and other experts more quickly get through the drudgery of things like reading through documents for specific medical findings or rarefied legal precedents. That would leave experts more time to focus on treating patients or formulating legal defenses.

It also could help people more quickly find information hidden in car manuals or tax regulations, saving time and frustration.

“There is a lot of information around the world, especially on the Internet,” Gao said. “In order to make that useful, you need to turn information into knowledge. The technology that can bridge that gap is machine reading.”

Decades of research, recent advances
The roots of Microsoft’s machine reading work go back nearly two decades, to the early work researchers at the company did in the field of natural language processing. At the time, Bill Dolan, a principal researcher at Microsoft who works on natural language processing, joked that the systems “worked beautifully, but not very often.”

Still, that foundational work is now being incorporated into the algorithms that the Redmond team is using for its most recent machine reading advances, and it’s also been the basis of other groundbreaking work Dolan and his team have done in natural language processing.

Like many AI advances in the past few years, machine reading has benefited from the triad of better deep learning algorithms, a massive increase in cloud-based computing power to run those algorithms and huge amounts of data to learn and test on.

The researchers say those capabilities, along with advances in deep learning methods from work in areas like image and speech recognition, have gotten them to a point where they feel confident that significant breakthroughs in machine reading are on the horizon. That’s something many still marvel at.

“It’s a long-term dream for researchers in natural language processing and even for artificial intelligence,” said Furu Wei, a lead researcher in the Natural Language Processing Group at Microsoft Research Asia.

Nevertheless, the researchers caution that there’s still a lot of work to be done in order to create systems that can truly comprehend human requests, in terms of both language and nuance.

In general, AI systems are still only good at very specific tasks. They may be able to find the right answer to a question, correctly identify a breed of dog or a person’s emotional state, or understand the words in a conversation. But, the researchers note, that doesn’t mean that they comprehend the information in the same way a person would, with all the subtle nuance and context we have learned from birth.

Wei noted that even if the machine reading teams can get to a point where their systems perform about as well as a person on the SQuAD dataset, it still won’t mean that the systems can truly read and comprehend like a person. That’s a challenge for the future.

“This is a small step toward the huge challenge of natural language understanding,” Zhou said.


Allison Linn is a senior writer at Microsoft.
