The moonshot that succeeded: How Bing and Azure are using an AI supercomputer in the cloud

When we type in a search query, access our email via the cloud or stream a viral video, chances are we don’t spend any time thinking about the technological plumbing that is behind that instant gratification.

Sitaram Lanka and Derek Chiou are two exceptions. They are engineers who spend their days thinking about ever-better and faster ways to get you all that information with the tap of a finger, as you’ve come to expect.

Now, they have a new superpower to help them out.

A team of Microsoft engineers and researchers, working together, has created a system that uses a reprogrammable computer chip called a field programmable gate array, or FPGA, to accelerate Bing and Azure.


Using the FPGA chips, Lanka and Chiou’s teams can write their algorithms directly onto the hardware, instead of relying on potentially less efficient software as the middleman. What’s more, an FPGA can be reprogrammed at a moment’s notice to respond to new advances in artificial intelligence or meet another type of unexpected need in a datacenter.

Traditionally, engineers might wait two years or longer for hardware with different specifications to be designed and deployed.

“This was a moonshot project that succeeded,” said Lanka, who runs the ranking platform for Bing and has been a key collaborator on the project, called Catapult, since its inception about five years ago.

Sitaram Lanka (Photography by Scott Eklund/Red Box Pictures)

The end of Moore’s Law, and the beginning of Catapult
FPGAs aren’t new, but until recently no one had ever seriously tried to use them at large scale for cloud computing. That changed when Doug Burger, a distinguished engineer with Microsoft’s research division, and a team including James Larus and Andrew Putnam hit upon the idea of using the chips to solve a huge problem in the technology industry: The slow but eventual end of Moore’s Law.

Moore’s Law has long held that computing power would steadily become both faster and more affordable, allowing everyone from computer manufacturers to datacenter managers to comfortably assume that they could deliver better results at lower cost.

Burger wasn’t interested in figuring out incremental ways to counteract the slowing rates of improvement in silicon chips. He was looking for a radical change, and he found it in FPGAs.

“This is an industry shift,” he said.

Already, Catapult is being used to fuel gains in how quickly and accurately the Bing search engine can process search requests. In addition, it is being used to make Microsoft’s Azure the fastest cloud computing platform available. That allows the company to use fewer servers to deliver better results.

By the end of 2016, an artificial intelligence technique called deep neural networks will be deployed on Catapult to help Bing improve its search results. This AI supercomputer in the cloud will increase the speed and efficiency of Microsoft’s datacenters, and anyone who uses Bing should notice the difference, too.

“The net effect is you get much more relevant results,” Lanka said.

New research gains
On Monday, the Catapult team released an academic paper providing more detail on how FPGAs are being deployed in Microsoft’s datacenters, including those supporting the Azure cloud, to accelerate processing and networking speeds.

To make data flow faster, they’ve inserted an FPGA between the network and the servers. That can be used to manage traffic going back and forth between the network and server, to communicate directly to other FPGAs or servers or to speed up computation on the local server.

Chiou, who led the Bing FPGA team and now heads up Microsoft Azure’s Cloud Silicon team, said FPGAs used to be relegated to the back room, performing tasks sent to them. Now, the FPGAs are the first to see every message going into the server, enabling them to both make decisions on how to handle each message and perform the work, often without the processor’s involvement.

“What we’ve done now is we’ve made the FPGA the front door,” Chiou said.
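To picture the "front door" arrangement, here is a minimal sketch in Python. It is only an illustration of the idea described above, not Microsoft's design: the `FrontDoor` class, the message kinds and the handler names are all hypothetical, and a real FPGA is programmed in hardware logic rather than Python. The sketch shows a component that sees every incoming message first, handles the kinds it has been programmed for, and passes everything else to the processor.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Message:
    kind: str             # hypothetical label, e.g. "score" or "mail"
    payload: List[float]  # hypothetical message body


class FrontDoor:
    """Toy stand-in for an FPGA sitting between the network and the server."""

    def __init__(self, cpu_handler: Callable[[Message], Tuple[str, object]]):
        self.cpu_handler = cpu_handler
        self.offloads: Dict[str, Callable[[List[float]], Tuple[str, object]]] = {}
        self.handled_locally = 0
        self.forwarded = 0

    def program(self, kind: str, fn: Callable[[List[float]], Tuple[str, object]]) -> None:
        # "Reprogramming": install or replace the logic for one message kind.
        self.offloads[kind] = fn

    def receive(self, msg: Message) -> Tuple[str, object]:
        # Every message arrives here first; decide who does the work.
        fn = self.offloads.get(msg.kind)
        if fn is not None:
            self.handled_locally += 1
            return fn(msg.payload)      # done without involving the processor
        self.forwarded += 1
        return self.cpu_handler(msg)    # everything else goes to the server CPU


def cpu_handler(msg: Message) -> Tuple[str, object]:
    # Placeholder for work done by the server's processor.
    return ("cpu", msg.kind)


door = FrontDoor(cpu_handler)
door.program("score", lambda xs: ("fpga", sum(xs)))

results = [
    door.receive(Message("score", [1.0, 2.0, 3.0])),
    door.receive(Message("mail", [])),
]
```

In this toy version, the "score" message is answered at the front door and the "mail" message is forwarded to the processor. Calling `program` again with a new function swaps the logic without touching the rest of the system.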

Derek Chiou (Photography by Scott Eklund/Red Box Pictures)

The Azure team sees this as the first of many ways they’ll use FPGAs, both to make the company’s cloud computing more efficient and to deliver better, more sophisticated services to customers.

“Microsoft is uniquely positioned to deliver innovation like this, in the cloud,” said Mark Russinovich, the chief technology officer for Azure.

Russinovich notes that’s partly because Azure engineers can build on the work that Bing and research engineers are doing. In fact, it was the Bing team’s success that gave him the confidence to jump on the Microsoft FPGA bandwagon.

“We started deploying them in every server knowing that when we were ready to use them, we wouldn’t have to wait,” he said.

That gamble paid off. Microsoft Azure is now leapfrogging its cloud competition with both speed and efficiency gains.

Real-world application
From the beginning, the team also wanted to build something that could immediately be used in the real world – more specifically, in Microsoft products – rather than in the more utopian setting of a research lab.

“We weren’t building a hammer and looking for a nail to use it with,” Lanka said.

Burger felt lucky to find a partner in Lanka, who he said both understood the long-term vision and was willing to bet on it.

That led to lots of trial and error as the team worked to make their ideas work in a real-world setting – and one that was constantly changing.

Doug Burger (Photography by Scott Eklund/Red Box Pictures)

For example, Lanka said that six years ago no one could have predicted how big a role deep learning would start to play in everything from sending texts to searching the web.

As the advances in artificial intelligence became more apparent, the team started to see how well-suited FPGAs were for that kind of work. That’s because FPGAs are especially good at efficiently doing parallel computing, which is when many computations are carried out simultaneously.

“I can do much better computation in the same or less time,” Lanka said.

The ability to do deep learning more quickly – using that AI supercomputer in the cloud – has broad implications. It could vastly speed up advances in automatic translation, accelerate medical breakthroughs and create automated productivity tools that better anticipate our needs and solve our workday problems.

Burger said another key advantage of FPGAs is that you can quickly adapt to whatever the next technological breakthrough is, without having to worry too much about whether you anticipated it or not. That’s because you can easily reprogram the FPGAs directly, instead of using less efficient software or waiting as long as a few years to get new hardware.

Even now, Chiou said he thinks people may be underestimating how much potential these systems could have.

“I think a lot of people don’t know what FPGAs are capable of,” Chiou said.

Allison Linn is a senior writer at Microsoft.