Divide and conquer: How Microsoft researchers used AI to master Ms. Pac-Man

Microsoft researchers have created an artificial intelligence-based system that learned how to get the maximum score on the addictive 1980s video game Ms. Pac-Man, using a divide-and-conquer method that could have broad implications for teaching AI agents to do complex tasks that augment human capabilities.

The team from Maluuba, a Canadian deep learning startup acquired by Microsoft earlier this year, used a branch of AI called reinforcement learning to play the Atari 2600 version of Ms. Pac-Man perfectly. Using that method, the team achieved the maximum possible score of 999,990.

Doina Precup, an associate professor of computer science at McGill University in Montreal, said that’s a significant achievement among AI researchers, who have been using various video games to test their systems but have found Ms. Pac-Man among the most difficult to crack.

But Precup said she was impressed not just with what the researchers achieved but with how they achieved it. To get the high score, the team divided the large problem of mastering Ms. Pac-Man into small pieces, which they then distributed among AI agents.

“This idea of having them work on different pieces to achieve a common goal is very interesting,” Precup said.

She said that’s similar to some theories of how the brain works, and it could have broad implications for teaching AIs to do complex tasks with limited information.

“That would be really, really exciting because it’s another step toward more general intelligence,” she said.

The method, which the Maluuba team calls Hybrid Reward Architecture, used more than 150 agents, each of which worked in parallel with the other agents to master Ms. Pac-Man. For example, some agents got rewarded for successfully finding one specific pellet, while others were tasked with staying out of the way of ghosts.
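As a rough illustration of that decomposition, the sketch below gives each agent its own narrow reward signal. The function names, the grid coordinates and the reward sizes are assumptions made up for this example; they are not taken from the Maluuba system.

```python
from typing import Tuple

Position = Tuple[int, int]  # (row, column) on a hypothetical grid


def pellet_reward(ms_pacman: Position, pellet: Position) -> float:
    """Reward seen only by the agent assigned to this one pellet."""
    return 10.0 if ms_pacman == pellet else 0.0


def ghost_reward(ms_pacman: Position, ghost: Position) -> float:
    """Penalty seen only by the agent assigned to this one ghost."""
    return -1000.0 if ms_pacman == ghost else 0.0


# One narrow reward signal per agent; no single agent sees the full game score.
pellets = [(0, 1), (0, 2), (3, 3)]  # illustrative pellet positions
ghosts = [(2, 2)]                   # illustrative ghost position
position = (0, 1)                   # where Ms. Pac-Man currently stands

pellet_signals = [pellet_reward(position, p) for p in pellets]
ghost_signals = [ghost_reward(position, g) for g in ghosts]
print(pellet_signals, ghost_signals)
```

Each agent only has to learn about its own pellet or ghost, which is a far smaller problem than learning the whole game at once.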

Then, the researchers created a top agent – sort of like a senior manager at a company – who took suggestions from all the agents and used them to decide where to move Ms. Pac-Man.


The top agent took into account how many agents advocated for going in a certain direction, but it also looked at the intensity with which they wanted to make that move. For example, if 100 agents wanted to go right because that was the best path to their pellet, but three wanted to go left because there was a deadly ghost to the right, it would give more weight to the ones who had noticed the ghost and go left.
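One simple way to picture that weighing is to add up each agent's score for every possible move and pick the move with the highest total. The sketch below does this for the 100-versus-three example; the `choose_move` helper and the specific scores (+1 for a pellet agent, ±500 for a ghost agent) are hypothetical values chosen for illustration, not figures from the researchers.

```python
from typing import Dict, List

ACTIONS = ["up", "down", "left", "right"]


def choose_move(preferences: List[Dict[str, float]]) -> str:
    """Sum every agent's score for each action and pick the highest total."""
    totals = {action: 0.0 for action in ACTIONS}
    for agent_scores in preferences:
        for action, score in agent_scores.items():
            totals[action] += score
    return max(totals, key=totals.get)


# 100 pellet agents mildly prefer going right (assumed score of +1 each).
pellet_agents = [{"right": 1.0} for _ in range(100)]
# 3 ghost agents strongly prefer left and strongly object to right
# (assumed scores of +500 and -500 each).
ghost_agents = [{"left": 500.0, "right": -500.0} for _ in range(3)]

move = choose_move(pellet_agents + ghost_agents)
print(move)
```

A plain head count would send Ms. Pac-Man right, 100 votes to three. Summing the scores instead lets a few agents with a strong objection outweigh many agents with a mild preference.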

Harm Van Seijen, a research manager with Maluuba who is the lead author of a new paper about the achievement, said the best results were achieved when each agent acted very egotistically – for example, focused only on the best way to get to its pellet – while the top agent decided how to use the information from each agent to make the best move for everyone.

“There’s this nice interplay between how they have to, on the one hand, cooperate based on the preferences of all the agents, but at the same time each agent cares only about one particular problem,” he said. “It benefits the whole.”

Why Ms. Pac-Man?
It may seem strange that it takes some of the most advanced AI research methods to beat something as seemingly simple as a 1980s Atari game. But Rahul Mehrotra, a program manager at Maluuba, said figuring out how to win these types of video games is actually quite complex, because of the huge variety of situations you can encounter while playing the game.

“A lot of companies working on AI use games to build intelligent algorithms because there’s a lot of human-like intelligence capabilities that you need to beat the games,” Mehrotra said.


Steve Golson, one of the co-creators of the arcade version of the game, said there was a reason Ms. Pac-Man had to be simple to grasp yet nearly impossible to conquer: It was originally designed for arcade play, and they wanted people to keep dropping those quarters in.

“You want (players to think), ‘Oh, oh, I almost got it! I’m going to try again,’” Golson said. “Ka-ching! Another quarter.”

Golson said he was blown away to learn only a few months ago that Ms. Pac-Man is widely used as a testing ground for AI research.

“It makes me smile,” he said.

But Golson, who now runs the consulting firm Trilobyte Systems, said it makes sense. They intentionally designed Ms. Pac-Man to have less predictability than regular Pac-Man, so it would be harder for players to beat. That’s made it an ideal setting for researchers trying to teach AI agents to react to random environments.

“It’s approachable,” he said, “and yet it has this amazing complexity to it because of the randomness in the gameplay.”

Reinforcement learning
That unpredictability is especially valuable for researchers who are working in the evolving field of reinforcement learning. In AI research, reinforcement learning is the counterpart to supervised learning, a more commonly used method of artificial intelligence in which systems get better at doing something as they are fed more examples of good behavior.

With reinforcement learning, an agent gets positive or negative responses for each action it tries, and learns through trial and error to maximize the positive responses, or rewards.
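A toy version of that trial-and-error loop might look like the following. This is a generic, minimal sketch of learning from rewards, not the method the Maluuba team used; the two actions, the reward values and the `learn` function with its parameters are all assumptions for illustration.

```python
import random

ACTIONS = ["toward_pellet", "toward_ghost"]
# Made-up rewards: a positive response for one action, a negative one for the other.
REWARDS = {"toward_pellet": 1.0, "toward_ghost": -1.0}


def learn(steps: int = 200, epsilon: float = 0.1,
          step_size: float = 0.5, seed: int = 0) -> dict:
    """Estimate how good each action is purely from the rewards it produces."""
    rng = random.Random(seed)
    values = {action: 0.0 for action in ACTIONS}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)              # occasionally try something at random
        else:
            action = max(values, key=values.get)      # otherwise use the best estimate so far
        reward = REWARDS[action]
        # Nudge the estimate toward the reward that was just observed.
        values[action] += step_size * (reward - values[action])
    return values


values = learn()
best = max(values, key=values.get)
print(best)
```

Nobody tells the agent which action is correct. It only sees the reward after each attempt and gradually shifts toward the action that pays off.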

An AI-based system that uses supervised learning would learn how to come up with a proper response in a conversation by being fed examples of good and bad responses. A reinforcement learning system, on the other hand, would be expected to learn appropriate responses from only high-level feedback, such as a person saying she enjoyed the conversation, which is a much more difficult task.

AI experts believe reinforcement learning could be used to create AI agents that can make more decisions on their own, allowing them to do more complex work and freeing up people for even more high-value work.

For example, Mehrotra said the method they developed to beat Ms. Pac-Man could be used to help a company’s sales organization make precise predictions about which potential customers to target at a particular time or on a particular day. The system could use multiple agents, each representing one client, with a top agent weighing factors such as which clients are up for contract renewal, which contracts are worth the most to the company and whether the potential customer is typically in the office that day or available at that time.

That, in turn, would free up the sales executive to focus all her time on trying to make sales, and it would increase the chances that she would land the sale because she’d only be targeting the most receptive clients.

Van Seijen said he also could see this kind of divide-and-conquer approach being used to make advances in other promising areas of AI research, such as natural language processing.

“It really enables us to make further progress in solving these really complex problems,” he said.

Allison Linn is a senior writer at Microsoft. Follow her on Twitter.