Edinburgh Research Archive


Dynamic Generalisation of Continuous Action Spaces in Reinforcement Learning: A Neurally Inspired Approach

View/Open
2001-andys.ps (22.41Mb)
2001_andys.pdf (9.383Mb)
Date
07/2002
Author
Smith, Andrew James
Abstract
This thesis is about the dynamic generalisation of continuous action spaces in reinforcement learning problems. The standard Reinforcement Learning (RL) account provides a principled and comprehensive means of optimising a scalar reward signal in a Markov Decision Process. However, the theory itself does not directly address the imperative issue of generalisation, which naturally arises as a consequence of large or continuous state and action spaces. A current thrust of research is aimed at fusing the generalisation capabilities of supervised (and unsupervised) learning techniques with the RL theory. An example par excellence is Tesauro’s TD-Gammon. Although much effort has gone into researching ways to represent and generalise over the input space, much less attention has been paid to the action space. This thesis first considers the motivation for learning real-valued actions, and then proposes a set of key properties desirable in any candidate algorithm addressing generalisation of both input and action spaces. These properties include: provision of adaptive and online generalisation, adherence to the standard theory with a central focus on estimating expected reward, provision for real-valued states and actions, and full support for a real-valued discounted reward signal. Of particular interest are issues pertaining to robustness in non-stationary environments, scalability, and efficiency for real-time learning in applications such as robotics. Since exploring the action space is found to be a potentially costly process, the system should also be flexible enough to enable maximum reuse of learned actions. A new approach is proposed which succeeds for the first time in addressing all of the key issues identified. The algorithm, which is based on the ubiquitous self-organising map, is analysed and compared with other techniques, including those based on the backpropagation algorithm.
The investigation uncovers some important implications of the differences between these two particular approaches with respect to RL. In particular, the distributed representation of the multi-layer perceptron is judged to be something of a double-edged sword: it offers more sophisticated and more scalable generalising power, but can cause problems in dynamic or non-equiprobable environments and in tasks involving a highly varying input-output mapping. The thesis concludes that the self-organising map can be used in conjunction with current RL theory to provide real-time dynamic representation and generalisation of continuous action spaces. The proposed model is shown to be reliable in non-stationary, unpredictable and noisy environments, and is judged to be unique in addressing and satisfying a number of desirable properties identified as important to a large class of RL problems.
URI
http://hdl.handle.net/1842/634
Collections
  • Informatics thesis and dissertation collection
