Videos are an incredible resource. They take us to places we want to go, expose us to new ideas and help us learn. But sometimes navigating through them is a challenge. In longer videos, pinpointing the segment you want may take more time than you have.
“Other video platforms, they differentiate between content consumers and producers. You’re either watching or creating,” says Ohad Jassin, leader of ILDC Incubations, the Microsoft team that created Video Breakdown. Based in Israel at the Microsoft Development Center, the team is chartered with inventing new technologies for Microsoft. “With Video Breakdown, we’re blurring the lines between content producers and consumers.”
Video Breakdown orchestrates a series of services and APIs to extract cognitive insights from video content. After you upload a video file to the platform, its content is analyzed by Microsoft Cognitive Services and Azure Media Analytics, along with other Azure services (Azure Websites, Azure Blob storage, Azure Search, Azure Media Services). The process produces an audio transcript; face tracking, grouping and identification; speaker differentiation; optical character recognition; and sentiment and topic extraction.
All those pieces combine into an index that lets you search a three-hour-long speech or keynote for the exact moment “healthcare” was mentioned in speech (via the automated transcript) or on screen (via OCR), jump to the first moment a specific actor or speaker appears, focus on moments with positive sentiment and more.
The platform employs recognition on several fronts: computer vision, speech to text, language understanding, linguistic analysis, text analytics, face tracking and identification and image search.
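To make the idea concrete, here is a minimal sketch of how time-coded insights from different analyzers could merge into one searchable index. None of these types or names come from the actual Video Breakdown service; they are purely illustrative.

```python
from dataclasses import dataclass

# Hypothetical types, invented for illustration only. Each analyzer
# (transcript, OCR, face identification) emits time-coded entries,
# and the entries from every channel land in one flat index.
@dataclass
class Insight:
    source: str   # e.g. "transcript", "ocr", "face"
    start: float  # seconds into the video
    end: float
    text: str     # spoken words, on-screen text, or a person's name

def search(index, query):
    """Return (start, end) jump points where the query appears in any channel."""
    q = query.lower()
    return [(i.start, i.end) for i in index if q in i.text.lower()]

index = [
    Insight("transcript", 12.0, 15.5, "today we talk about healthcare reform"),
    Insight("ocr",        61.0, 64.0, "Healthcare: 2017 outlook"),
    Insight("face",       61.0, 90.0, "Jane Smith"),
]

# Finds the word both in speech and on screen, as the article describes.
print(search(index, "healthcare"))  # [(12.0, 15.5), (61.0, 64.0)]
```

A real index would of course be far richer (confidence scores, speaker labels, sentiment spans), but the core idea is the same: every cognitive signal becomes a time range that a query can jump to.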
This could be a useful tool if you’re hosting a conference or any event with speakers. It’s also a big jump from how you search now, where video content is sorted by channels or metadata and tagged manually, an often subjective process.
“Most videos are tagged by manual curation, which is error prone, inaccurate and usually involves a level of granularity focusing on the entire video,” Jassin says. “We wanted to do that automatically. Video Breakdown gets linguistic transcripts from audio, detects faces and identifies them – assuming they’re part of top recognized faces, such as celebrities. It’ll tell you who spoke when.”
The platform also aims to be helpful when you’re in search of something specific.
With the index Video Breakdown creates, users can view and search the video gallery within the platform, explore a specific video to better understand it, and find the segments they’re interested in within the source video.
You can search by the platform’s different cognitive features, such as using the face detection API to narrow down sections where a certain person is talking about a certain topic.
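Combining two cognitive signals like this amounts to intersecting time intervals: spans where a face is on screen and spans where a keyword occurs in the transcript. The sketch below is a hypothetical illustration of that idea; the function names and data are invented, not taken from the real APIs.

```python
# Hypothetical helpers for intersecting face-presence intervals with
# transcript keyword intervals. All names and data are illustrative.
def overlaps(a, b):
    """True if two (start, end) intervals intersect."""
    return a[0] < b[1] and b[0] < a[1]

def person_on_topic(face_spans, keyword_spans):
    """Intervals where the person is visible while the keyword is spoken."""
    return [(max(f[0], k[0]), min(f[1], k[1]))
            for f in face_spans for k in keyword_spans if overlaps(f, k)]

faces = [(60.0, 120.0), (300.0, 360.0)]   # when the speaker is on screen
keyword = [(90.0, 95.0), (400.0, 405.0)]  # when "healthcare" is spoken

print(person_on_topic(faces, keyword))  # [(90.0, 95.0)]
```

The intersection narrows the whole video down to just the seconds where both conditions hold, which is what lets a query like “this person, on this topic” return a precise jump point.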
Users can become editors by stitching together video segments, creating their own cut or compilation of any video in the gallery, building dynamic playlists from video segments, republishing them as new videos and sharing these to their social channels. Viewers can then play videos using the insights as jump points and controls. This makes finding the moment where a specific person appears or a keyword is mentioned for the first time easier than ever.
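A stitched-together cut like the one described above can be thought of as an ordered list of segments, each pointing at a source video with a start and end time; the playback layer would seek each source in turn. The representation below is a hypothetical sketch, not the platform’s actual format.

```python
# Hypothetical playlist format: (video_id, start_seconds, end_seconds).
# A viewer plays each segment in order, jumping between source videos.
def total_duration(playlist):
    """Length in seconds of the stitched-together cut."""
    return sum(end - start for _, start, end in playlist)

cut = [
    ("keynote.mp4", 90.0, 95.0),    # the first "healthcare" mention
    ("keynote.mp4", 310.0, 330.0),  # the speaker's closing remarks
    ("panel.mp4",   12.0, 42.0),    # a clip from another video in the gallery
]

print(total_duration(cut))  # 55.0 seconds of stitched video
```

Because the segments only reference the originals, the same source video can appear in any number of user-made cuts without being re-encoded or duplicated.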
The Garage is the outlet for Microsoft teams around the world to get experimental apps and projects out to the public, including several recently developed by interns as well as Arrow Launcher, Next Lock Screen, Sprightly, Kaizala, Fetch!, News Pro 2.0 and Hub Keyboard.
“This is for me, one of the best examples of how a company can innovate without the fear of failing. We got the ability to ship this as an official Microsoft solution, as an experimental project,” says Jassin. “We’re innovating a lot of experiences here. We recreated the search experience, video editing, video playing and how they’re shared. There’s nothing to compare it to. So there’s a lot of learning we have to do along the way. We wanted to go publicly and get feedback. Those abilities of video indexing are highly relevant, but we want to make sure it’s battle proven. The Garage allows us to safely experiment with that approach and technology, on a real scale, in a meaningful manner.”
Microsoft News Center Staff