Composing diverse policies for long-horizon tasks

Angelov, Daniel Angelov

dc.contributor.advisor

Ramamoorthy, Subramanian

dc.contributor.advisor

Subr, Kartic

dc.contributor.author

Angelov, Daniel Angelov

dc.contributor.sponsor

Engineering and Physical Sciences Research Council (EPSRC)

en

dc.date.accessioned

2021-11-22T12:01:24Z

dc.date.available

2021-11-22T12:01:24Z

dc.date.issued

2021-11-30

dc.description.abstract

Humans utilise a large diversity of control and reasoning methods to solve different manipulation and motion planning tasks. This diversity should be reflected in the strategies used by robots in the same domains. In current practice involving sequential decision making over long horizons, even when the formulation is a hierarchical one, it is common for all elements of this hierarchy to adopt the same representation. For instance, the overall policy might be a switching model over Markov Decision Processes (MDPs) or local feedback control laws. Such uniformity may not suit the variety of naturally observed behaviours. For example, when picking up a book from a crowded shelf, we naturally switch between goal-directed reaching, tactile regrasping, sliding the book until it is comfortably off an edge, and then goal-directed pick and place once again. A single representational form rarely captures this diversity adequately, even in such a seemingly simple task. When the robot must learn or adapt policies from experience, this poses significant challenges. The mismatch between representational choices and the diversity of task types can result in a significant (sometimes exponential) increase in complexity with respect to time, observation and state-space dimensionality, and other attributes. These and other factors can make learning such tasks in a “tabula rasa” setting extremely difficult. However, if we are willing to adopt a multi-representational framing of the problem, and allow some of its constituent modules to be learned in different ways (some from expert demonstration, some by trial and error, and perhaps some being controllers designed from first principles in model-based formulations), then the problem becomes much more tractable. The core hypothesis we explore is that it is possible to devise such learning methods, and that they significantly outperform conventional alternatives on robotic manipulation tasks of interest.
In the first part of this thesis, we present a framework for sequentially composing diverse policies to facilitate the solution of long-horizon tasks. We rely on demonstrations as a quick, not necessarily expert or optimal, way to convey the desired outcome. We model the similarity to demonstrated states with a Goal Scoring Estimator model. In a real-robot experiment, we show the benefits of diverse policies that rely on their own strong inductive biases to efficiently solve different aspects of the task, sequenced by the Goal Scoring Estimator model. Next, we demonstrate how to elicit policy structure through causal analysis, and task structure through more efficient demonstrations involving interventions. This allows us to alter the manner of execution of a particular policy to match a desired learned user specification. Building a surrogate model of the demonstrator gives us the ability to reason causally about different aspects of the policy and about which parts of that policy are salient. We can observe how intervening in the world by placing additional symbols impacts the validity of the original plan. Finally, observing that ‘static’ imitation learning datasets can be limiting when the aim is to create more robust policies, we present the Learning from Inverse Intervention framework. This allows the robot to learn a policy while simultaneously interacting with the demonstrator. In this interaction, the robot intervenes when there is little information gain and pushes the demonstrator to explore more informative areas, even as the demonstration is being performed in real time. This interaction has the added benefit of drawing out information about the importance of different regions of the task. We verify this salience by visually inspecting samples from a generative model and by crafting plans that test these hypothesised regions.
These methods give us the ability to use demonstrations of a task to build policies for salient targets, to alter their manner of execution, to inspect them to understand their causal structure, and to sequence them to solve novel tasks.

en

dc.identifier.uri

https://hdl.handle.net/1842/38302

dc.identifier.uri

http://dx.doi.org/10.7488/era/1568

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

D. Angelov, Y. Hristov, S. Ramamoorthy. From demonstrations to task-space specifications. Using causal analysis to extract rule parameterization from demonstrations. Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), Vol. 34(45), 2020

en

dc.relation.hasversion

D. Angelov, Y. Hristov, M. Burke, S. Ramamoorthy. Composing Diverse Policies for Temporally Extended Tasks. IEEE Robotics and Automation Letters (RA-L), Vol. 5(2), 2020

en

dc.relation.hasversion

D. Angelov, Y. Hristov, S. Ramamoorthy. DynoPlan: Combining Motion Planning and Deep Neural Network based Controllers for Safe HRL. In Proc. Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2019

en

dc.relation.hasversion

D. Angelov, Y. Hristov, S. Ramamoorthy. Using Causal Analysis to Learn Specifications from Task Demonstrations. In Proc. International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2019

en

dc.relation.hasversion

D. Angelov, S. Ramamoorthy. Learning from Demonstration of Trajectory Preferences through Causal Modeling and Inference. Robotics: Science and Systems Workshop on Causal Imitation in Robotics (R:SS CIR), 2018

en

dc.relation.hasversion

D. Angelov, S. Ramamoorthy. LfII: Learning from Inverse Intervention during Demonstrations, 2021

en

dc.relation.hasversion

Y. Hristov, D. Angelov, M. Burke, A. Lascarides, S. Ramamoorthy. Disentangled Relational Representations for Explaining and Learning from Demonstration. In Proc. Conference on Robot Learning (CoRL), 2019

en

dc.subject

representational choices

en

dc.subject

task type diversity

en

dc.subject

learning method

en

dc.subject

robotic manipulation

en

dc.subject

long-horizon tasks

en

dc.subject

Goal Scoring Estimator model

en

dc.subject

task structure

en

dc.title

Composing diverse policies for long-horizon tasks

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: Angelov2021.pdf
Size: 9.04 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection