
dc.contributor.advisor: Ramamoorthy, Subramanian
dc.contributor.advisor: Rovatsos, Michael
dc.contributor.author: Hawasly, Majd
dc.date.accessioned: 2015-02-16T15:27:50Z
dc.date.available: 2015-02-16T15:27:50Z
dc.date.issued: 2014-11-27
dc.identifier.uri: http://hdl.handle.net/1842/9931
dc.description.abstract: This thesis is concerned with policy space abstractions that concisely encode alternative ways of making decisions, and with the discovery, learning, adaptation and use of these abstractions. The work is motivated by the problem faced by autonomous agents that operate within a domain for long periods of time and hence have to learn to solve many different task instances that share some structural attributes. An example of such a domain is an autonomous robot in a dynamic domestic environment. Such environments raise the need for transfer of knowledge, so as to eliminate the need for long learning trials after deployment. Typically, these tasks would be modelled as sequential decision making problems, including path optimisation for navigation tasks, or Markov Decision Process models for more general tasks. Learning within such models often takes the form of online learning or reinforcement learning. However, handling issues such as knowledge transfer and multiple task instances requires notions of structure and hierarchy, and that raises several questions that form the topic of this thesis – (a) can an agent acquire such hierarchies in policies in an online, incremental manner, (b) can we devise mathematically rigorous ways to abstract policies based on qualitative attributes, and (c) when it is inconvenient to employ prolonged trial-and-error learning, can we devise alternative algorithmic methods for decision making in a lifelong setting?

The first contribution of this thesis is an algorithmic method for incrementally acquiring hierarchical policies. Working within the framework of options – temporally extended actions – in reinforcement learning, we present a method for discovering persistent subtasks that define useful options for a particular domain. Our algorithm builds on a probabilistic mixture model in state space to define a generalised and persistent form of ‘bottlenecks’, and suggests suitable policy fragments to make options. In order to continuously update this hierarchy, we devise an incremental process which runs in the background and takes care of proposing and forgetting options. We evaluate this framework in simulated worlds, including the RoboCup 2D simulation league domain.

The second contribution is in defining abstractions in terms of equivalence classes of trajectories. Utilising recently developed techniques from computational topology, in particular the concept of persistent homology, we show that a library of feasible trajectories can be retracted to representative paths that may be sufficient for reasoning about plans at the abstract level. We present a complete framework, starting from a novel construction of a simplicial complex that describes higher-order connectivity properties of a spatial domain, to methods for computing the homology of this complex at varying resolutions. The resulting abstractions are motion primitives that may be used as topological options, contributing a novel criterion for option discovery. This is validated by experiments in simulated 2D robot navigation, and in manipulation using a physical robot platform.

Finally, we develop techniques for solving a family of related, but different, problem instances through reuse of a finite policy library acquired over the agent’s lifetime. This represents an alternative approach when traditional methods such as hierarchical reinforcement learning are not computationally feasible. We abstract the policy space using a non-parametric model of the performance of policies on multiple task instances, so that decision making is posed as a Bayesian choice over what to reuse. This is one approach to transfer learning that is motivated by the needs of practical long-lived systems. We show the merits of such Bayesian policy reuse in simulated real-time interactive systems, including online personalisation and surveillance.
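To illustrate the core idea behind the first contribution – option discovery driven by a probabilistic mixture model in state space – the following is a minimal Python sketch, not the thesis's algorithm. It fits a Gaussian mixture to visited states with scikit-learn and flags states where a trajectory crosses between mixture components as candidate ‘bottleneck’ subgoals; the function name, the choice of a Gaussian mixture, and the crossing heuristic are all illustrative assumptions.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def candidate_bottlenecks(trajectories, n_components=4):
        # Fit a Gaussian mixture to all visited states, then flag states
        # where a trajectory crosses between mixture components as
        # candidate subgoals from which option policies could be built.
        states = np.vstack(trajectories)
        gmm = GaussianMixture(n_components=n_components, random_state=0).fit(states)
        subgoals = []
        for traj in trajectories:
            labels = gmm.predict(traj)
            for t in range(1, len(traj)):
                if labels[t] != labels[t - 1]:
                    subgoals.append(traj[t])
        return np.array(subgoals), gmm

In an incremental setting such as the one the thesis describes, a process like this would be re-run in the background as new trajectories arrive, proposing new subgoals and discarding ones that stop recurring.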
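The second contribution groups trajectories into topological equivalence classes via persistent homology over a simplicial complex; reproducing that machinery is beyond a short sketch. As a much simpler stand-in for the same idea, the hypothetical sketch below classifies planar paths around a single point obstacle by their winding angle: paths with shared endpoints and the same winding angle lie in the same homotopy class.

    import numpy as np

    def winding_angle(path, obstacle):
        # Total signed angle the path sweeps around the obstacle point.
        rel = np.asarray(path, dtype=float) - np.asarray(obstacle, dtype=float)
        ang = np.arctan2(rel[:, 1], rel[:, 0])
        step = np.diff(ang)
        # Unwrap jumps across the -pi/+pi branch cut.
        step = (step + np.pi) % (2 * np.pi) - np.pi
        return step.sum()

    def same_topological_class(path_a, path_b, obstacle):
        # Paths with shared endpoints differ by whole turns (multiples of
        # 2*pi) when they pass the obstacle on different sides, so a
        # tolerance of pi cleanly separates the classes.
        return abs(winding_angle(path_a, obstacle)
                   - winding_angle(path_b, obstacle)) < np.pi

One representative path per class then plays the role the abstract assigns to motion primitives: an abstract plan chooses a class, and a low-level controller realises it.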
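The final contribution, Bayesian policy reuse, is the most direct to sketch. Assuming a discrete set of task types, a fixed policy library, and Gaussian performance models estimated offline – all simplifying assumptions of this sketch, with illustrative names – a belief over the current task type is updated from observed returns, and the next policy is chosen greedily under that belief:

    import numpy as np

    class BayesianPolicyReuse:
        # Minimal sketch: belief over discrete task types, with a Gaussian
        # observation model of policy returns.

        def __init__(self, perf_means, perf_std):
            # perf_means[i, j]: mean return of policy j on task type i,
            # assumed estimated offline over the agent's lifetime.
            self.perf_means = np.asarray(perf_means, dtype=float)
            self.perf_std = float(perf_std)
            n_types = self.perf_means.shape[0]
            self.belief = np.full(n_types, 1.0 / n_types)

        def select_policy(self):
            # Greedy choice: the policy with the best expected return
            # under the current belief over task types.
            return int(np.argmax(self.belief @ self.perf_means))

        def update(self, policy, observed_return):
            # Bayes rule with a Gaussian likelihood for the observed return.
            z = (observed_return - self.perf_means[:, policy]) / self.perf_std
            self.belief *= np.exp(-0.5 * z ** 2)
            self.belief /= self.belief.sum()

A fuller treatment would trade off exploration against exploitation when deciding which policy to try next; the greedy rule above is only the simplest variant.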
dc.contributor.sponsor: Damascus University
dc.language.iso: en
dc.publisher: The University of Edinburgh
dc.relation.hasversion: M. Hawasly and S. Ramamoorthy, ‘Task Variability in Autonomous Robots: Offline Learning for Online Performance,’ International Workshop on Evolutionary and Reinforcement Learning for Autonomous Robot Systems (ERLARS), 2012.
dc.relation.hasversion: M. Hawasly and S. Ramamoorthy, ‘Lifelong learning of structure in the space of policies,’ AAAI Spring Symposium Series on Lifelong Machine Learning, 2013.
dc.relation.hasversion: M. Hawasly and S. Ramamoorthy, ‘Lifelong transfer learning with an option hierarchy,’ IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013.
dc.relation.hasversion: F.T. Pokorny, M. Hawasly and S. Ramamoorthy, ‘Multiscale topological trajectory classification with persistent homology,’ Robotics: Science and Systems (RSS), 2014.
dc.relation.hasversion: M.M.H. Mahmud, M. Hawasly, B. Rosman and S. Ramamoorthy, ‘Clustering Markov decision processes for continual transfer,’ arXiv preprint arXiv:1311.3959, 2013.
dc.subject: policy space abstractions
dc.subject: decision making
dc.subject: knowledge transfer
dc.subject: algorithmic method
dc.title: Policy space abstraction for a lifelong learning agent
dc.type: Thesis or Dissertation
dc.type.qualificationlevel: Doctoral
dc.type.qualificationname: PhD Doctor of Philosophy

