AI’s sharing economy: Why Microsoft creates publicly available datasets and metrics

From left, Adam Atkinson of Microsoft Research Maluuba, Yoshua Bengio of University of Montreal and Samira Ebrahimi Kahou of Microsoft Research Maluuba are among the AI experts who worked on the FigureQA dataset. Photo courtesy of Microsoft Research Maluuba.

Samira Ebrahimi Kahou and her colleagues at Microsoft Research Maluuba recently set out to solve an interesting research problem: How could they use artificial intelligence to correctly reason about information found in graphs and pie charts?

One big obstacle, they discovered, was that the research area was so new that there weren’t any existing datasets available for them to test their hypotheses.

So, they made one.

The FigureQA dataset, which the team released publicly earlier this fall, is one of a number of datasets, metrics and other tools for testing AI systems that Microsoft researchers and engineers have created and shared in recent years. Researchers all over the world use them to see how well their AI systems do at everything from translating conversational speech to predicting the next word a person may want to type.

The teams say these tools provide a codified way for everyone from academic researchers to industry experts to test their systems, compare their work and learn from each other.

“It clarifies our goals, and then others in the research community can say, ‘OK, I see where you’re going,’” said Rangan Majumder, a partner group program manager within Microsoft’s Bing division who also leads development of the MS MARCO machine reading comprehension dataset. The year-old dataset is getting an update in the next few months.

For people used to the technology industry’s more traditional way of doing things, that kind of information sharing can seem surprising. But in the field of AI, where academics and industry players are very much intertwined, researchers say this type of openness is becoming more common.

“Traditionally, companies have kept their research in-house. Now, we’re really seeing an industrywide impact where almost every company is publishing papers and trying to move the state of the art forward, rather than moving it into a walled garden,” said Rahul Mehrotra, a program manager at Microsoft’s Montreal-based Maluuba lab, which also has released two other datasets, NewsQA and Frames, in the past year.

Many AI experts say that this more collaborative culture is crucial to advancing the field of AI. They note that many of the early breakthroughs in the field were the result of researchers from competing institutions sharing knowledge and building on each other’s work.

“We can’t have all the ideas on the planet, so if someone else has a great idea and wants to try it out, we can give them a dataset to do that,” said Christian Federmann, a senior program manager with the Microsoft Translator team.

Federmann’s team developed the Microsoft Speech Language Translation Corpus so they and others could test bilingual conversational speech translation systems such as the Microsoft Translator live feature and Skype Translator. The corpus was recently updated with additional language pairs.

Federmann also notes that Microsoft is one of the few big players that has the budget and resources to create high-quality tools and datasets that allow the industry to compare its work.

Those resources are key to creating the kind of benchmarks that people can use to credibly showcase their achievements. For example, the recent milestones in conversational speech recognition are based on results on the Switchboard corpus.

Rangan Majumder, a partner group program manager within Microsoft’s Bing division, leads development of the MS MARCO machine reading comprehension dataset.

Paying it forward

Many of the teams that are developing datasets and other metrics say they are, in a sense, paying it forward because they also rely on datasets that others have created.

Mehrotra said that when Maluuba was a small startup, it relied heavily on a Microsoft dataset called MCTest. Now, as part of Microsoft, the team has been pleased to see that the datasets it is creating are being used by others in the field.

Devi Parikh, an assistant professor at Georgia Tech and research scientist at Facebook AI Research, said the FigureQA dataset Maluuba recently released is helpful because it allows researchers like herself to work on problems that require the use of multiple types of AI. Accurately reading a graphic and answering a question about it requires both computer vision and natural language processing.

“From a research perspective, I think there’s more and more interest in working on problems that are at the intersection of subfields of AI,” she said.

Still, researchers and engineers working in the AI field say that while some information sharing is valuable, there are also times when competing researchers want to be able to compare their systems without revealing all the information about the data they are using.

Doug Orr, a senior software engineering lead with SwiftKey, which Microsoft acquired last year, said his team wanted to create a standard way to measure how well a system predicts what a person will type next. That’s a key component of SwiftKey’s systems, which offer personalized predictions based on a person’s communications style.

Instead of sharing a dataset, the team created a set of metrics that researchers can use with any dataset. The metrics, which are available on GitHub, allow researchers to have standardized benchmarks with which they can measure their own improvement and compare their results to others, without having to share proprietary data.
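To make the idea of a dataset-independent metric concrete, here is a minimal sketch of one plausible measure: the fraction of words for which the true next word appears among a system’s top suggestions. This is an illustration only, not the SwiftKey team’s published metrics; the names `top_k_hit_rate` and `toy_predictor` are hypothetical, and the tiny sample sentence stands in for whatever private text a researcher might use.

```python
from typing import Callable, Sequence


def top_k_hit_rate(
    tokens: Sequence[str],
    predict: Callable[[Sequence[str]], Sequence[str]],
    k: int = 3,
) -> float:
    """Fraction of positions where the true next word is in the top-k suggestions.

    `predict` is any function that takes the words typed so far and returns
    a ranked list of candidate next words. (Hypothetical interface.)
    """
    if len(tokens) < 2:
        return 0.0
    hits = 0
    for i in range(1, len(tokens)):
        context = tokens[:i]                    # words typed so far
        candidates = list(predict(context))[:k]  # keep only the top k
        if tokens[i] in candidates:
            hits += 1
    return hits / (len(tokens) - 1)


def toy_predictor(context: Sequence[str]) -> Sequence[str]:
    # Hypothetical stand-in model: always suggests the same three common words.
    return ["the", "to", "and"]


# Any text can be scored; only the resulting number needs to be shared.
text = "i want to go to the park".split()
score = top_k_hit_rate(text, toy_predictor, k=3)
print(f"top-3 hit rate: {score:.2f}")
```

Because the function only needs a sequence of words and a prediction function, two teams could each run it on data they keep private and still compare the resulting scores.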

Orr said the metrics have benefited the team internally by giving it a better sense of how much its systems are improving over time, and have allowed everyone in the field to be more transparent about how they are performing against each other.

Majumder, from the Bing team, says his team sees value in testing their systems with any and all available benchmarks, including internal data they don’t share publicly, datasets they build for public use and ones that others create, such as the SQuAD dataset.

He says that people who join his team from other areas of the company often have to get used to the fact that they are entering a hybrid area where the team is developing products while also making AI research breakthroughs.

In the field of AI, he says, “what we have is somewhere in between engineering and science.”


Allison Linn is a senior writer at Microsoft. Follow her on Twitter.