Utilising policy types for effective ad hoc coordination in multiagent systems
Date
26/11/2015
Author
Albrecht, Stefano Vittorino
Abstract
This thesis is concerned with the ad hoc coordination problem. Therein, the goal is
to design an autonomous agent which can achieve high flexibility and efficiency in a
multiagent system that admits no prior coordination between the designed agent and
the other agents. Flexibility describes the agent’s ability to solve its task with a variety
of other agents in the system; efficiency is the relation between the agent’s payoffs and
the time needed to solve the task; and no prior coordination means that the agent does not
a priori know how the other agents behave. This problem is relevant for a number of
practical applications, including human-machine interaction tasks, such as adaptive user
interfaces, robotic elderly care, and automated trading agents.
Motivated by this problem, the central idea studied in this thesis is to utilise a set of
policies, or types, to characterise the behaviour of other agents. Specifically, the idea is
to reduce the complexity of the interaction problem by assuming that the other agents
draw their latent type from some known or hypothesised space of types, and that the
assignment of types is governed by an unknown distribution. Based on the current
interaction history, we can form posterior beliefs about the relative likelihood of types.
These beliefs, combined with the future predictions of the types, can then be used in a
planning procedure to compute optimal responses. The aim of this thesis is to study the
potential and limitations of this idea in the context of ad hoc coordination.
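
The belief-forming step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes that each type can be queried for the probability it assigns to an observed action, and it uses a plain product-form Bayesian update. The names `posterior_over_types`, `always_a`, and `uniform` are hypothetical and chosen only for illustration, as is the choice to fall back to the prior when no type explains the observations.

```python
from typing import Callable, Dict, Sequence

# Assumption: a type is a function (history, action) -> probability that an
# agent of this type chooses `action` after observing `history`.
Policy = Callable[[Sequence[str], str], float]


def posterior_over_types(
    prior: Dict[str, float],
    types: Dict[str, Policy],
    observed_actions: Sequence[str],
) -> Dict[str, float]:
    """Product-form Bayesian posterior over types given observed actions."""
    weights = dict(prior)
    for t, action in enumerate(observed_actions):
        history = observed_actions[:t]  # actions seen before step t
        for name, policy in types.items():
            weights[name] *= policy(history, action)
    total = sum(weights.values())
    if total == 0.0:
        # Illustrative choice only: if no hypothesised type explains the
        # observations, return the prior unchanged.
        return dict(prior)
    return {name: w / total for name, w in weights.items()}


# Two hypothetical types over the action set {"a", "b"}.
def always_a(history: Sequence[str], action: str) -> float:
    return 1.0 if action == "a" else 0.0


def uniform(history: Sequence[str], action: str) -> float:
    return 0.5


types = {"always_a": always_a, "uniform": uniform}
prior = {"always_a": 0.5, "uniform": 0.5}
beliefs = posterior_over_types(prior, types, ["a", "a", "a"])
```

After three observations of action "a", the belief shifts towards `always_a` without excluding `uniform`. A planning procedure would then weight each type's predicted future actions by these beliefs when evaluating its own actions.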
We formulate the ad hoc coordination problem using a game-theoretic model called
the stochastic Bayesian game. Based on this model, we derive a canonical algorithmic
description of the idea outlined above, called Harsanyi-Bellman Ad Hoc Coordination
(HBA). The practical potential of HBA is demonstrated in two case studies, including a
human-machine experiment and a simulated logistics domain. We formulate basic ways
to incorporate evidence (i.e. observed actions) into posterior beliefs and analyse the
conditions under which the posterior beliefs converge to the true distribution of types.
Furthermore, we study the impact of prior beliefs over types (that is, before any actions
are observed) on the long-term performance of HBA, and show empirically that automatic
methods can compute prior beliefs with consistent performance effects. For
hypothesised (i.e. “guessed”) type spaces, we analyse the relations between hypothesised
and true type spaces under which HBA is still guaranteed to solve its task, despite
inaccuracies in hypothesised types. Finally, we show how HBA can perform an automatic
statistical analysis to decide whether to reject its behavioural hypothesis, i.e. the
combination of posterior beliefs and types.