Learning in a State of Confusion: Employing active perception and reinforcement learning in partially observable worlds

Crook, Paul A

dc.contributor.advisor

Hayes, Gillian

en

dc.contributor.author

Crook, Paul A

en

dc.date.accessioned

2006-12-04T13:28:55Z

dc.date.available

2006-12-04T13:28:55Z

dc.date.issued

2007-06

dc.description

Institute of Perception, Action and Behaviour

en

dc.description.abstract

In applying reinforcement learning to agents acting in the real world we are often faced with tasks that are non-Markovian in nature. Much work has been done using state estimation algorithms to try to uncover Markovian models of tasks in order to allow the learning of optimal solutions using reinforcement learning. Unfortunately these algorithms, which attempt to simultaneously learn a Markov model of the world and how to act, have proved very brittle. Our focus differs. In considering embodied, embedded and situated agents we have a preference for simple learning algorithms which reliably learn satisficing policies. The learning algorithms we consider do not try to uncover the underlying Markovian states; instead they aim to learn successful deterministic reactive policies, such that agents' actions are based directly upon the observations provided by their sensors. Existing results have shown that such reactive policies can be arbitrarily worse than a policy that has access to the underlying Markov process, and that in some cases no satisficing reactive policy can exist. Our first contribution is to show that providing agents with alternative actions and viewpoints on the task through the addition of active perception can provide a practical solution in such circumstances. We demonstrate empirically that: (i) adding arbitrary active perception actions to agents which can only learn deterministic reactive policies can allow the learning of satisficing policies where none were originally possible; (ii) active perception actions allow the learning of better satisficing policies than those that existed previously; and (iii) our approach converges more reliably to satisficing solutions than existing state estimation algorithms such as U-Tree and the Lion Algorithm. Our other contributions focus on issues which affect the reliability with which deterministic reactive satisficing policies can be learnt in non-Markovian environments. We show that greedy action selection may be a necessary condition for the existence of stable deterministic reactive policies on partially observable Markov decision processes (POMDPs). We also set out the concept of Consistent Exploration. This is the idea of estimating state-action values by acting as though the policy has been changed to incorporate the action being explored. We demonstrate that this concept can be used to develop better algorithms for learning reactive policies for POMDPs by presenting a new reinforcement learning algorithm, the Consistent Exploration Q(λ) algorithm (CEQ(λ)). We demonstrate on a significant number of problems that CEQ(λ) is more reliable at learning satisficing solutions than SARSA(λ), the algorithm currently regarded as the best for learning deterministic reactive policies.

en

dc.format.extent

3850523 bytes

en

dc.format.mimetype

application/pdf

en

dc.identifier.uri

http://hdl.handle.net/1842/1471

dc.language.iso

en

dc.publisher

University of Edinburgh. College of Science and Engineering. School of Informatics.

en

dc.relation.hasversion

Paul A. Crook and Gillian Hayes. Active perception in navigation of partially observable grid worlds. In Sixth European Workshop on Reinforcement Learning (EWRL-6), 2003.

en

dc.relation.hasversion

Paul A. Crook and Gillian Hayes. Could active perception aid navigation of partially observable grid worlds? In Proceedings of the Fourteenth European Conference on Machine Learning (ECML 2003), volume 2837 of Lecture Notes in Artificial Intelligence, pages 72-83. Springer-Verlag.

en

dc.relation.hasversion

Paul A. Crook and Gillian Hayes. Learning in a state of confusion: Perceptual aliasing in grid world navigation. In Towards Intelligent Mobile Robots 2003 (TIMR 2003), 4th British Conference on (Mobile) Robotics, UWE, Bristol.

en

dc.subject.other

Markov model

en

dc.subject.other

Active Perception

en

dc.subject.other

Reinforcement Learning

en

dc.subject.other

Partially Observable Markov Decision Processes

en

dc.title

Learning in a State of Confusion: Employing active perception and reinforcement learning in partially observable worlds

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: Crook_thesis.pdf
Size: 3.67 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection