Dynamic Generalisation of Continuous Action Spaces in Reinforcement Learning: A Neurally Inspired Approach
dc.contributor.advisor
Willshaw, David
en
dc.contributor.advisor
Hallam, John
en
dc.contributor.author
Smith, Andrew James
en
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.date.accessioned
2004-11-24T16:44:17Z
dc.date.available
2004-11-24T16:44:17Z
dc.date.issued
2002-07
dc.description
Institute for Adaptive and Neural Computation
en
dc.description
Award number: 98318242.
en
dc.description.abstract
This thesis is about the dynamic generalisation of continuous action spaces in
reinforcement learning problems.
The standard Reinforcement Learning (RL) account provides a principled and comprehensive
means of optimising a scalar reward signal in a Markov Decision Process.
However, the theory itself does not directly address the pressing issue of generalisation
which naturally arises as a consequence of large or continuous state and action
spaces. A current thrust of research is aimed at fusing the generalisation capabilities
of supervised (and unsupervised) learning techniques with the RL theory. An example
par excellence is Tesauro’s TD-Gammon.
Although much effort has gone into researching ways to represent and generalise over
the input space, much less attention has been paid to the action space. This thesis
first considers the motivation for learning real-valued actions, and then proposes a
set of key properties desirable in any candidate algorithm addressing generalisation
of both input and action spaces. These properties include: provision of adaptive and
online generalisation, adherence to the standard theory with a central focus on estimating
expected reward, provision for real-valued states and actions, and full support
for a real-valued discounted reward signal. Of particular interest are issues pertaining
to robustness in non-stationary environments, scalability, and efficiency for real-time
learning in applications such as robotics. Since exploring the action space is found
to be a potentially costly process, the system should also be flexible enough to
enable maximum reuse of learned actions.
A new approach is proposed which succeeds for the first time in addressing all of the
key issues identified. The algorithm, which is based on the ubiquitous self-organising
map, is analysed and compared with other techniques including those based on the
backpropagation algorithm. The investigation uncovers some important implications
of the differences between these two particular approaches with respect to RL. In particular,
the distributed representation of the multi-layer perceptron is judged to be
something of a double-edged sword, offering more sophisticated and more scalable
generalising power, but potentially causing problems in dynamic or non-equiprobable
environments, and tasks involving a highly varying input-output mapping.
The thesis concludes that the self-organising map can be used in conjunction with current
RL theory to provide real-time dynamic representation and generalisation of continuous
action spaces. The proposed model is shown to be reliable in non-stationary,
unpredictable and noisy environments and judged to be unique in addressing and satisfying
a number of desirable properties identified as important to a large class of RL
problems.
en
dc.format.extent
23504610 bytes
en
dc.format.extent
9838903 bytes
en
dc.format.mimetype
application/postscript
en
dc.format.mimetype
application/pdf
en
dc.identifier.uri
http://hdl.handle.net/1842/634
dc.language.iso
en
dc.publisher
University of Edinburgh. College of Science and Engineering. School of Informatics.
en
dc.subject.other
Reinforcement Learning
en
dc.subject.other
Continuous Action Spaces
en
dc.subject.other
Markov Decision Process
en
dc.title
Dynamic Generalisation of Continuous Action Spaces in Reinforcement Learning: A Neurally Inspired Approach
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en