A virtual green screen, deep neural networks, and an Azure Kinect sensor (or two)

One of my favorite video games is Age of Empires. I used to play it with coworkers almost two decades ago, and I still play it with my son, who beats me without mercy. One of the magical moments of the game is when you advance your civilization to the next age, from feudal to castle to imperial. Each age brings new technologies that open entirely new ways to shape your strategy, defend against your opponent, and win the game.

Sometimes I feel we are in one of those age transitions in real life too. As in Age of Empires, it comes with new technologies that we can use to respond to external disruption and reimagine our strategy. With Covid-19, we find ourselves in our own moment of disruption, one that has pushed us to rethink how we do things. Events are one of those things.

If you’ve watched this year’s Microsoft Build, you’ve probably seen one of the innovation sessions on a “stage.” Spoiler alert – it’s fake. The presenters were all at home, safe and sound. Using a background matting process from the University of Washington and our Azure Kinect sensors, we were able to produce presentations that looked as if we were live on stage. But let me step back for a minute and share how we got here.

Covid-19 has made large-scale in-person events impossible, so we are all looking for new ways to communicate with our customers. And again, if you participated in any part of Microsoft Build, then you experienced first-hand how our Microsoft Global Events team had to shift everything to virtual.

As one of many content owners for Build, we also had to get creative. We took an experimental approach to presenting our content: by using our own technology and building on the innovation of others, we created a unique way to deliver our sessions on a “virtual stage” at the conference.


The idea came from two published papers on background matting for video, one from Adobe Research in 2017 and one from the University of Washington in 2020. In basic terms, the approach lets anyone take a video of themselves and then uses AI models to predict the matte around the subject, meaning how much of each pixel belongs to the person rather than to the room behind them. Essentially, the process replaces the background without the need for a green screen.
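To make the "replace the background" step concrete, here is a minimal sketch of how a predicted matte is typically used once a model has produced it. It shows only the standard compositing formula (output = alpha × foreground + (1 − alpha) × new background), not the neural network from either paper. The function names and the tiny list-based "images" are hypothetical and chosen for illustration; a real pipeline would work on full video frames.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # one RGB pixel, channels in 0..255


def composite_pixel(alpha: float, fg: Pixel, bg: Pixel) -> Pixel:
    """Blend one foreground pixel over one background pixel.

    alpha is the matte value: 1.0 means the pixel is entirely the
    presenter, 0.0 means it is entirely background.
    """
    r, g, b = (alpha * f + (1.0 - alpha) * n for f, n in zip(fg, bg))
    return (r, g, b)


def composite(
    matte: List[List[float]],
    foreground: List[List[Pixel]],
    new_background: List[List[Pixel]],
) -> List[List[Pixel]]:
    """Apply the compositing formula to every pixel of a frame.

    All three inputs are assumed to have the same height and width.
    """
    return [
        [composite_pixel(a, f, n) for a, f, n in zip(m_row, f_row, n_row)]
        for m_row, f_row, n_row in zip(matte, foreground, new_background)
    ]


# A 1x3 "frame": presenter pixel, soft edge (e.g. hair), and room pixel.
matte = [[1.0, 0.5, 0.0]]
presenter = [[(200.0, 180.0, 160.0)] * 3]
stage = [[(100.0, 20.0, 40.0)] * 3]

frame = composite(matte, presenter, stage)
print(frame)
```

The middle pixel is the interesting one: a matte value between 0 and 1 blends presenter and stage, which is what lets soft edges such as hair look natural instead of cut out.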

So we built on top of the background matting process, combining our Azure Kinect sensors with an AI model based on the work from the University of Washington, to create a new way for our presenters to easily record themselves at home and appear on our virtual stage.