2010 Shaping Up As Big Year for Speech Recognition

2010 looks like another major year in technology. One clear trend is the focus on more natural ways of interacting with computers, in essence making technology more approachable to more people. The idea that we can use our eyes, our voices, our ears and our movements to interact with computers and express ourselves is a concept called the Natural User Interface (NUI). Today, the NUI includes speech, gesture, and touch. Many of these elements are already being built into PCs, mobile phones, and other devices.

I am excited about the role speech technologies play in this evolution, particularly when I look at the variety of products we’ve delivered to customers recently.

Today, our cloud-based speech platform handles more than 10 billion voice requests per year, or roughly 300 per second. This makes it the largest speech platform in the world. By designing our speech recognition and text-to-speech engines around a client-plus-cloud connected computing model, we can scale to handle an enormous volume and variety of human voice interactions that help people get information and complete tasks quickly.

As a result, we have a “smart” speech platform that performs diverse tasks for millions of people, while delivering significant value to enterprise customers. Today, our Tellme speech platform delivers 24 million travel updates annually through the Orbitz TLC service. Millions more tap into the Tellme applications powering E*Trade’s self-service trading system, which helped E*Trade increase automated trading volumes by 15 percent and save an estimated $30 million a year in IT expenses.

Building on this experience, last year we introduced two more large-scale speech applications: Bing for mobile and Microsoft Exchange 2010 Voicemail Preview. Bing for mobile is an astoundingly powerful application that gets you what you want simply by using your voice. Click once on the Bing application running on an iPhone, a RIM device or a Windows Phone, then use your voice to search for a restaurant or find something you want to know on the internet. It’s amazingly easy and empowering. Each week I hear stories from people about how Bing for mobile is an awesome application they use daily for finding places, checking weather and stock quotes, or just entertaining their kids by using their voice to discover digital pictures. For the business user, Exchange 2010 Voicemail Preview delivers a text preview of a voicemail to the user’s e-mail inbox. We are getting great feedback from corporate clients about how this feature lets them manage voicemails the same way they manage e-mails.

Another exciting development has been the introduction of the KIA UVO, a joint project between Microsoft and KIA Motors America. UVO, which stands for “Your Voice”, is a hands-free solution allowing drivers and passengers to answer and place calls, receive and respond to text messages, and access music, all by using voice commands. UVO is another great example of how network-based speech recognition can deliver a very innovative and compelling experience for users.

Whether it is delivering an enterprise solution for Orbitz or enabling one of the consumer scenarios described above, speech recognition is just beginning to realize its potential. I expect a great deal more progress in this area in the coming year, and I look forward to providing you with an update.

Posted by Zig Serafin
GENERAL MANAGER, TELLME/SPEECH AT MICROSOFT