The secret sauce behind Kinect

There have been plenty of behind-the-scenes Kinect stories, often focused on the history of the project, but this one is different. The Xbox Engineering blog posted a lengthy entry at the end of last year detailing how the human tracking works as well as how the gesture-based interface for the dashboard works. It’s a fascinating read that explains how different programming for the analog world (humans) is from programming for the digital one (Master Chief in Halo). When you start to think of it that way, you realize there really is a good deal of software magic inside the sensor. The next person who tells me “oh, Kinect is really just depth-sensing camera tech you licensed from PrimeSense” will find themselves pointed to this post. When you’re dealing with variables on the order of 10^23, you need something more than just a smart camera – no disrespect to our friends in Israel. Kinect can actively track the full skeletons of up to two human players while passively tracking the shape and position of four more at once.

It’s this software magic, which makes Kinect more human than you might expect, that is at the heart of creating a natural user interface. Machine learning plays a key part in that process, and it’s something the post explains elegantly. The same technology is starting to show up in several other projects, albeit with vastly different uses – Project Emporia, for example, uses machine learning to figure out what news you’re interested in.
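To make the machine-learning idea a little more concrete, here is a minimal sketch of the general approach of classifying each pixel of a depth frame into a body part and deriving rough joint positions from the result. Everything in it – the features, the labels, the random data, and the use of a random forest from scikit-learn – is an illustrative stand-in, not Kinect’s actual pipeline or code.

```python
# Toy illustration of machine-learned body tracking: classify each depth
# pixel into a body part, then treat the average of each part's pixels as a
# rough joint estimate. All data and features below are made up.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Pretend training data: each row is one pixel described by a handful of
# depth-difference features; each label is a body part (0=head, 1=torso, 2=hand).
X_train = rng.normal(size=(3000, 8))     # 8 invented depth-offset features per pixel
y_train = rng.integers(0, 3, size=3000)  # invented body-part labels

forest = RandomForestClassifier(n_estimators=20, max_depth=10, random_state=0)
forest.fit(X_train, y_train)

# At runtime, every pixel of a new depth frame gets a body-part prediction.
frame_pixels = rng.normal(size=(5000, 8))
parts = forest.predict(frame_pixels)
print("pixels labelled as 'hand':", int((parts == 2).sum()))
```

The point of the sketch is simply that the tracker is learned from examples rather than hand-coded for every possible body shape and pose, which is exactly the kind of analog-world messiness the Xbox Engineering post is describing.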

I’ll stop there as I don’t want to give away any more of the post – frankly, it’s one of the best blog posts I have ever read from Microsoft, explaining deeply technical material in a way most people can understand. It’s wonderfully written and fascinating at the same time.

Enjoy.