The art world may not be the first business sector that comes to mind when you think about applications of AI, but a new algorithm developed by Microsoft and MIT is proving to be quite the curator.
Microsoft Research Development Engineer Mark Hamilton, who is also a PhD student at MIT, helped develop the algorithm, which can find similarities in color, texture, theme and meaning between otherwise disparate works of art. The algorithm was recently highlighted by Smithsonian Magazine and several other publications. We chatted with Hamilton to learn more about the project and how it could be broadly applied to other areas.
Blog: Why art?
Mark: I love art, and we had a previous collaboration with the Metropolitan Museum of Art that let you explore an exhibit, understanding similarities between works of art and visualizing the space between various pieces. MosAIc was inspired by an exhibit that paired art from two artists who never met and showed that their works share very similar structure. We thought we could do that on a larger scale, and we are continuing to be surprised by some of the connections we are able to find.
Blog: The algorithm can find similar works of art within specific styles or media based on colors, shapes and content, as well as meaning and themes. How can an algorithm take meaning and themes into account?
Mark: Today’s vision algorithms behave a lot like we do. When we look at an image, we get a gut feel for what it contains, such as the objects, people and composition. We train our neural network to understand thousands of objects across millions of different real-world scenes. We then feed that algorithm artworks and capture its ideas, or gut feel, about these works. These neural network ideas have even been shown to be similar to the ideas that humans have about images. It’s these neural network ideas that allow us to compare the content of different works of art.
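To make the idea of comparing the "gut feel" of a network concrete, here is a minimal sketch (not the team's actual code): a vision network maps each artwork to a feature vector, and two works are considered similar when their vectors point in similar directions. The vectors below are made up for illustration; in practice they would come from a trained neural network.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors: 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical feature vectors a vision network might assign to three artworks.
portrait = np.array([0.9, 0.1, 0.3])
similar_portrait = np.array([0.8, 0.2, 0.25])
landscape = np.array([0.1, 0.9, 0.7])

# Works with similar content end up with similar vectors.
print(cosine_similarity(portrait, similar_portrait))  # close to 1
print(cosine_similarity(portrait, landscape))         # noticeably lower
```

Ranking a whole collection by this score against a query image is the basic building block of the retrieval system described next.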
Blog: What is new and innovative about the MosAIc algorithm?
Mark: One of the new contributions of the work is a new type of algorithm we call a conditional image retrieval system. If you think about something like reverse image search, you put in an image and find all the similar images from the web. What we have done is allow you to find not only the most similar items across the whole collection, but also any sub-collection such as the Egyptian artworks, the prints or even the works by an individual artist. This allows us to find matches across widely different artistic traditions, which is something that regular approaches cannot do efficiently. More technically, we created a data structure that generalizes K-Nearest Neighbor trees to allow them to specialize to particular sub-collections quickly and efficiently.
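The core idea of conditional image retrieval can be sketched with a brute-force stand-in for the specialized KNN-tree data structure Hamilton describes: restrict the search to a chosen sub-collection, then rank the remaining items by distance to the query. The collection, tags and vectors below are invented for illustration.

```python
import numpy as np

def conditional_retrieve(query, features, metadata, condition, k=1):
    """Return indices of the k nearest items whose metadata satisfies `condition`.

    A simple stand-in for the specialized KNN trees in the paper: filter to the
    sub-collection first, then do an ordinary nearest-neighbor ranking.
    """
    candidates = [i for i, m in enumerate(metadata) if condition(m)]
    dists = [np.linalg.norm(features[i] - query) for i in candidates]
    order = np.argsort(dists)[:k]
    return [candidates[i] for i in order]

# Tiny hypothetical collection: a feature vector and a culture tag per artwork.
features = np.array([[0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [0.2, 0.8]])
metadata = ["Dutch", "Egyptian", "Egyptian", "Dutch"]

query = np.array([1.0, 0.0])
# Unconditioned, the closest match is an Egyptian work (index 1);
# conditioned on Dutch works only, the best match is index 3.
print(conditional_retrieve(query, features, metadata, lambda m: m == "Dutch"))
```

Filtering first is correct but slow for large collections; the contribution of the MosAIc work is a tree structure that specializes to arbitrary sub-collections without rescanning everything.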
Blog: In what other ways could it be used?
Mark: At the core of this, you have a new kind of search technology that can be applied to any data. The data can be images like art, products or really anything you want. One example from retail would be a fashion-aware search; you could take your favorite pair of pants and use this approach to find the best matching blouse. In the realm of text and documents, let’s say you have an email that is talking about a given topic. Using our approach, you could pull up all of the memos or receipts with content similar to the email. To make these systems, one could just swap out our vision networks with equivalent networks for text, music or other data.
This approach also gives you the ability to add diversity to your search engines in a controlled and structured way. For example, you could imagine showing not only the top results for a restaurant search, but also top results from other sub-categories like vegetarian food, or Black or African American owned restaurants. This way you can supply results that are relevant and highlight many diverse types of content.
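One simple way to add diversity in a "controlled and structured way," as described above, is to cap how many results each sub-category may contribute. The helper and the restaurant data below are hypothetical, just to illustrate the idea.

```python
def diversified_top_results(results, key, per_group=1):
    """Keep at most `per_group` results from each sub-category.

    `results` is assumed to be sorted best-first; `key` maps a result
    to its sub-category (e.g. a cuisine or ownership tag).
    """
    counts = {}
    kept = []
    for r in results:
        group = key(r)
        if counts.get(group, 0) < per_group:
            kept.append(r)
            counts[group] = counts.get(group, 0) + 1
    return kept

# Hypothetical restaurant search results, already sorted by relevance.
results = [
    {"name": "A", "category": "pizza"},
    {"name": "B", "category": "pizza"},
    {"name": "C", "category": "vegetarian"},
    {"name": "D", "category": "vegetarian"},
]
# Instead of two pizza places at the top, surface the best of each category.
print(diversified_top_results(results, key=lambda r: r["category"]))
```

This keeps the most relevant result overall while guaranteeing that each sub-category is represented, which mirrors the restaurant example in the interview.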
Blog: How can people learn more about this project?
Caption: One of the MosAIc pairings. On the left, a British dress from 1840. On the right, Vaas van paars glas by Chris Lebeau, c. 1924 – c. 1925.