Low-resource learning in complex games
Dobre, Mihai Sorin
This project is concerned with learning to take decisions in complex domains, in games in particular. Previous work assumes that massive data resources are available for training, but aside from a few very popular games, this is generally not the case, and the state of the art in such circumstances is to rely extensively on hand-crafted heuristics. On the other hand, human players are able to quickly learn from only a handful of examples, exploiting specific characteristics of the learning problem to accelerate their learning process. Designing algorithms that function in a similar way is an open area of research and has many applications in today’s complex decision problems. One solution presented in this work is design learning algorithms that exploit the inherent structure of the game. Specifically, we take into account how the action space can be clustered into sets called types and exploit this characteristic to improve planning at decision time. Action types can also be leveraged to extract high-level strategies from a sparse corpus of human play, and this generates more realistic trajectories during planning, further improving performance. Another approach that proved successful is using an accurate model of the environment to reduce the complexity of the learning problem. Similar to how human players have an internal model of the world that allows them to focus on the relevant parts of the problem, we decouple learning to win from learning the rules of the game, thereby making supervised learning more data efficient. Finally, in order to handle partial observability that is usually encountered in complex games, we propose an extension to Monte Carlo Tree Search that plans in the Belief Markov Decision Process. We found that this algorithm doesn’t outperform the state of the art models on our chosen domain. Our error analysis indicates that the method struggles to handle the high uncertainty of the conditions required for the game to end. Furthermore, our relaxed belief model can cause rollouts in the belief space to be inaccurate, especially in complex games. We assess the proposed methods in an agent playing the highly complex board game Settlers of Catan. Building on previous research, our strongest agent combines planning at decision time with prior knowledge extracted from an available corpus of general human play; but unlike this prior work, our human corpus consists of only 60 games, as opposed to many thousands. Our agent defeats the current state of the art agent by a large margin, showing that the proposed modifications aid in exploiting general human play in highly complex games.