Uber Tops Other Research Groups in Pitfall!, Montezuma’s Revenge

It may seem minor in the broader scheme of AI advances, but the machine-learning algorithms developed by Uber’s AI team are far more important than they appear at first glance.

For the first time, machine learning has been able to advance through ’80s-era video games that offer few rewards or clues but hinge on memory, according to a story in Technology Review. The advances have enabled the AI to reach high scores in Montezuma’s Revenge and Pitfall! after two years of failing and scoring zero. The group’s blog explained how it worked:

“The team’s new family of reinforcement-learning algorithms, dubbed Go-Explore, remember where they have been before, and will return to a particular area or task later on to see if it might help provide better overall results. The researchers also found that adding a little bit of domain knowledge, by having human players highlight interesting or important areas, sped up the algorithms’ learning and progress by a remarkable amount. This is significant because there may be many real-world situations where you would want an algorithm and a person to work together to solve a hard task.”

Rather than being driven only by rewards, the AI is “motivation”-oriented, which lets reinforcement learning take place even when rewards are rare. That is harder than it sounds. Other AI researchers who have been attempting to crack the games are also finally making headway. OpenAI, a nonprofit in San Francisco, created an algorithm that is making progress in Montezuma’s Revenge. A research group at Stanford made “modest” progress on Pitfall! using an approach similar to Uber’s.

A story on VentureBeat.com about the Uber group’s report describes the advances as a “two-phase solution” involving exploration and “robustification.”

“In the exploration phase, Go-Explore builds an archive of different game states — cells — and the various trajectories, or scores, that lead to them. It chooses a cell, returns to that cell, explores the cell, and, for all cells it visits, swaps it in as the trajectory if a given new trajectory is better (i.e., the score is higher).”
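
To make the exploration phase described above more concrete, here is a minimal Python sketch of the archive loop: choose a cell, return to it, explore, and keep a new trajectory when it is better. It is an illustration under stated assumptions, not Uber’s implementation. The toy environment `ChainEnv`, the function `go_explore`, the uniform random choice of cells, the trick of returning to a cell by replaying stored actions, and the shorter-path tie-break are all hypothetical choices made for this example; the quoted description does not say how cells are chosen or revisited.

```python
import random


class ChainEnv:
    """Hypothetical toy game: walk along a line; only the far end pays a reward."""

    def __init__(self, length=30):
        self.length = length
        self.pos = 0
        self.score = 0

    def reset(self):
        self.pos = 0
        self.score = 0
        return self.pos

    def step(self, action):
        # action is -1 (left) or +1 (right); position is clamped to the line
        self.pos = max(0, min(self.length, self.pos + action))
        if self.pos == self.length:
            self.score = 1  # sparse reward: nothing is earned before the end
        return self.pos, self.score


def go_explore(env, iterations=2000, explore_steps=10, seed=0):
    rng = random.Random(seed)
    start = env.reset()
    # archive: cell -> (best score seen at that cell, actions that reach it)
    archive = {start: (0, [])}
    for _ in range(iterations):
        # 1. choose a cell from the archive (uniformly at random: an assumption)
        cell = rng.choice(list(archive))
        _, trajectory = archive[cell]
        # 2. return to it by replaying its actions (works because the toy env is deterministic)
        env.reset()
        for a in trajectory:
            env.step(a)
        # 3. explore from that cell with random actions
        actions = list(trajectory)
        for _ in range(explore_steps):
            a = rng.choice([-1, 1])
            new_cell, score = env.step(a)
            actions.append(a)
            best = archive.get(new_cell)
            # 4. keep the trajectory if the cell is new, the score is higher,
            #    or the score ties with a shorter path (tie-break is an assumption)
            if (best is None or score > best[0]
                    or (score == best[0] and len(actions) < len(best[1]))):
                archive[new_cell] = (score, list(actions))
    return archive
```

Calling `go_explore(ChainEnv(length=30))` returns the archive; looking up `archive[30]` gives the score and the action list that reaches the rewarded end of the line. The point of the design is that progress is never lost: every newly reached cell is stored with a way back to it, so later iterations can start exploring from the frontier instead of from the beginning.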

The original paper on Uber’s advances can be read on the company’s engineering blog.

read more at technologyreview.com