October 14, 2020

Microsoft researchers have built an artificial intelligence system that can generate captions for images that are, in many cases, more accurate than the descriptions people write. The breakthrough in a benchmark challenge is a milestone in Microsoft’s push to make its products and services inclusive and accessible to all users.

“Image captioning is one of the core computer vision capabilities that can enable a broad range of services,” said Xuedong Huang, a Microsoft technical fellow and the chief technology officer of Azure AI Cognitive Services in Redmond, Washington.

The new model is now available to customers via the Azure Cognitive Services Computer Vision offering, which is part of Azure AI, enabling developers to use this capability to improve accessibility in their own services. It also is being incorporated into Seeing AI and will start rolling out later this year in Microsoft Word and Outlook, for Windows and Mac, and PowerPoint for Windows, Mac and web.

Automatic image captioning helps all users access the important content in any image, from a photo returned as a search result to an image included in a presentation. A research breakthrough like this one can improve the captions the system generates, although it doesn’t mean the system will return a perfect caption every time.

The use of image captioning to generate a photo description, known as alt text, in a web page or document is especially important for people who are blind or have low vision, noted Saqib Shaikh, a software engineering manager with Microsoft’s AI platform group in Redmond.

For example, his team is using the improved image captioning capability in the Seeing AI talking camera app for people who are blind or have low vision. The app uses image captioning to describe photos, including those from social media apps.

“Ideally, everyone would include alt text for all images in documents, on the web, in social media – as this enables people who are blind to access the content and participate in the conversation. But, alas, people don’t,” Shaikh said. “So, there are several apps that use image captioning as a way to fill in alt text when it’s missing.”