Microsoft releases beta of Microsoft Cognitive Toolkit for deep learning advances

Frank Seide, left, and Chris Basoglu, right, have been key to building Microsoft Cognitive Toolkit. (Photography by Scott Eklund/Red Box Pictures)

Microsoft has released an updated version of Microsoft Cognitive Toolkit, a system for deep learning that runs on CPUs and NVIDIA® GPUs and is used to speed advances in areas such as speech recognition, image recognition and search relevance.

The toolkit, previously known as CNTK, was initially developed by computer scientists at Microsoft who wanted a tool to do their own research faster and more effectively. It soon moved beyond its origins in speech research and morphed into an offering that customers, including a leading international appliance maker, and Microsoft’s flagship product groups depend on for a wide variety of deep learning tasks.

“We’ve taken it from a research tool to something that works in a production setting,” said Frank Seide, a principal researcher at Microsoft Artificial Intelligence and Research and a key architect of Microsoft Cognitive Toolkit.

Frank Seide. (Photography by Scott Eklund/Red Box Pictures)

The latest version of the toolkit, which is available on GitHub under an open source license, includes new functionality that lets developers use the Python or C++ programming languages when working with the toolkit. With the new version, researchers can also do a type of artificial intelligence work called reinforcement learning.

Finally, the toolkit is able to deliver better performance than previous versions. It’s also faster than other toolkits, especially when working on big datasets across multiple machines. That kind of large-scale deployment is necessary to do the type of deep learning across multiple GPUs that is needed to develop consumer products and professional offerings.

It’s also key to speeding up research breakthroughs. Last week, Microsoft Artificial Intelligence and Research announced that they had, for the first time, created a technology that recognizes words in a conversation as well as a person does. The team credited Microsoft Cognitive Toolkit for vastly improving the speed at which they could reach this milestone.

The team that developed the Microsoft toolkit says the ability to work across multiple servers is a key advantage over other deep learning toolkits, which can see suboptimal performance and accuracy when they start tackling bigger datasets. Microsoft Cognitive Toolkit has built-in algorithms to minimize such degradation of computation.

“One key reason to use Microsoft Cognitive Toolkit is its ability to scale efficiently across multiple GPUs and multiple machines on massive data sets,” said Chris Basoglu, a partner engineering manager at Microsoft who has played a key role in developing the toolkit.

Chris Basoglu (Photography by Scott Eklund/Red Box Pictures)

Microsoft Cognitive Toolkit can easily handle anything from relatively small datasets to very, very large ones, using just one laptop or a series of computers in a data center. It can run on computers that use traditional CPUs or GPUs, which were once mainly associated with graphics-heavy gaming but have proven to be very effective for running the algorithms needed for deep learning.

“Microsoft Cognitive Toolkit represents tight collaboration between Microsoft and NVIDIA to bring advances to the deep learning community,” said Ian Buck, general manager of the Accelerated Computing Group at NVIDIA. “Compared to the previous version, it delivers almost two times performance boost in scaling to eight Pascal GPUs in an NVIDIA DGX-1™.”

Microsoft Cognitive Toolkit is designed to run on multiple GPUs, including Azure’s GPU offering, which is currently in preview. The toolkit has been optimized to best take advantage of the NVIDIA hardware and Azure networking capabilities that are part of the Azure offering.

Democratizing AI, and its tools


The toolkit is being released at a time when everyone from small startups to major technology companies is seeing the possibilities for using deep learning for things like speech understanding and image recognition.

Broadly speaking, deep learning is an artificial intelligence technique in which developers and researchers use large amounts of data – called training sets – to teach computer systems to recognize patterns from inputs such as images or sounds.

For example, a deep learning system can be given a training set showing all sorts of pictures of fruits and vegetables, after which it learns to recognize images of fruits and vegetables on its own. It gets better as it gets more data, so each time it encounters a new, weird-looking eggplant or odd-shaped apple, it can refine the algorithm to become even more accurate.
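The idea of learning from a labeled training set can be illustrated with a very small sketch. This is not Microsoft Cognitive Toolkit code and not a deep network: it is a single-layer logistic-regression classifier in plain Python, and the feature names (roundness, redness), the toy data and the function names are all hypothetical, chosen only to mirror the fruit-and-vegetable example above.

```python
import math
import random


def train_classifier(examples, epochs=200, learning_rate=0.5, seed=0):
    """Fit a logistic-regression classifier with stochastic gradient descent."""
    rng = random.Random(seed)
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the training set in a new order each epoch
        for features, label in data:
            z = bias + sum(w * x for w, x in zip(weights, features))
            prediction = 1.0 / (1.0 + math.exp(-z))
            error = prediction - label  # gradient of the log loss w.r.t. z
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def predict(model, features):
    """Return 1 (apple) or 0 (eggplant) for one feature vector."""
    weights, bias = model
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if z > 0 else 0


# Hypothetical toy training set: (roundness, redness) -> 1 = apple, 0 = eggplant.
TRAINING_SET = [
    ((0.90, 0.80), 1), ((0.80, 0.90), 1), ((0.95, 0.70), 1), ((0.85, 0.60), 1),
    ((0.30, 0.20), 0), ((0.20, 0.10), 0), ((0.40, 0.30), 0), ((0.25, 0.25), 0),
]

model = train_classifier(TRAINING_SET)
```

Calling `predict(model, (0.9, 0.75))` asks the trained model about an apple-like example it has not seen before. A production system would swap the hand-made features for raw pixels and the single layer for a deep network, but the loop of predicting, measuring the error and adjusting the weights has the same shape.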

In this example of using Microsoft Cognitive Toolkit for training a Speech Acoustic Model, as more data is applied to the model it converges with better accuracy.

These types of achievements aren’t just research milestones. Thanks to advances in deep learning, fueled in part by big jumps in computing horsepower, we now have consumer products like Skype Translator, which recognizes speech and provides real-time voice translation, and the Cortana digital assistant, which can understand your voice and help you do everything from search for plane tickets to remember appointments.

“This is an example of democratizing AI using Microsoft Cognitive Toolkit,” said Xuedong Huang, Microsoft distinguished engineer.

More flexibility for more sophisticated work


When they first developed the toolkit, Basoglu said, they figured many developers couldn’t, or wouldn’t want to, write a lot of code. So they created a custom system that made it easy for developers to configure their systems for deep learning without any extra coding.

As the system grew more popular, however, they heard from developers who wanted to combine their own Python or C++ code with the toolkit’s deep learning capabilities.

They also heard from researchers who wanted to use the toolkit for reinforcement learning research. That’s a research area in which an agent learns the right way to do something – like find its way around a room or form a sentence – through lots of trial and error. That’s the kind of research that could eventually lead to true artificial intelligence, in which systems can make complex decisions on their own. The new version gives developers that ability as well.
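Trial-and-error learning of this kind can be sketched in a few lines. The example below is tabular Q-learning, one classic reinforcement learning method, written in plain Python. It does not use the toolkit’s API, and the corridor environment, the function name and every parameter value are assumptions made for illustration. An agent starts at the left end of a short corridor and is rewarded only when it reaches the right end.

```python
import random


def q_learning_corridor(n_states=5, episodes=500, alpha=0.5,
                        gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values for a 1-D corridor whose goal is the last state."""
    rng = random.Random(seed)
    # q[state] = [value of moving left, value of moving right]
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore: try a random move
            else:
                best = max(q[state])       # exploit, breaking ties randomly
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Move the estimate toward reward plus discounted future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning_corridor()
# Greedy policy for every non-goal state.
policy = ["right" if right > left else "left" for left, right in q_table[:-1]]
```

The agent is never told that moving right is correct. It discovers this from the reward signal alone, which is the essence of the trial-and-error learning described above.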

Using Microsoft Cognitive Toolkit to avoid wasting food and live a healthier life

Although Microsoft Cognitive Toolkit was originally developed by speech researchers, it can now be used for a much wider variety of purposes.

Liebherr, the specialist in cooling, is using it to simplify daily life.

The company has installed cameras in its refrigerators that do more than just display images — they will actually recognize individual food items in the refrigerator and automatically incorporate this information into an inventory shopping list.

In the future, this technology will help in shopping and meal planning. The stored groceries can be recorded and monitored by using cameras with object recognition.

“People know at any time, and from anywhere, what is still in the fridge and what should be on the shopping list,” said Andreas Giesa, the ebusiness manager for Liebherr.

This will help customers avoid having food spoil and make daily life more comfortable.

The Bing relevance team uses it as part of its effort to find better ways to discover latent, or hidden, connections in search terms in order to give users better results.

For example, with deep learning a system can be trained to automatically figure out that when a user types in, “How do you make an apple pie?” they are looking for a recipe, even though the word “recipe” doesn’t appear in the search query. Without such a system, that type of rule would have to be engineered manually.

Clemens Marschner, a principal software development engineer who works on Bing relevance, said the team worked very closely with the toolkit’s creators to make it work well for developers doing other types of deep learning beyond speech. For them, the payoff was a system that lets them use massive computing power to quickly get results.

“No other solution allows us to scale learning to large data sets in GPU clusters as easily,” he said.

Microsoft also is continuing to use the Microsoft Cognitive Toolkit to improve speech recognition. Yifan Gong, a principal applied science manager in speech services, said they have been using the toolkit to develop more accurate acoustic models for speech recognition in Microsoft products including Windows and Skype Translator.

Gong said his team relied on the toolkit to develop new deep learning architectures, including using a technique called long short term memory, to deliver customers more accurate results.

Those improvements will make it easier for Microsoft systems to understand what users are trying to say, even when they are giving voice commands or interacting with Cortana in noisy environments such as a party, a car on the highway or an open floor plan office.

For the user, the benefits of these types of improvements are obvious.

“If you have higher recognition accuracy, you don’t have to repeat yourself as often,” Gong said.


Allison Linn is a senior writer at Microsoft. Follow her on Twitter.