Edinburgh Research Archive

Composing diverse policies for long-horizon tasks

dc.contributor.advisor
Ramamoorthy, Subramanian
dc.contributor.advisor
Subr, Kartic
dc.contributor.author
Angelov, Daniel
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.date.accessioned
2021-11-22T12:01:24Z
dc.date.available
2021-11-22T12:01:24Z
dc.date.issued
2021-11-30
dc.description.abstract
Humans utilise a large diversity of control and reasoning methods to solve different robot manipulation and motion planning tasks. This diversity should be reflected in the strategies used by robots in the same domains. In current practice involving sequential decision making over long horizons, even when the formulation is a hierarchical one, it is common for all elements of this hierarchy to adopt the same representation. For instance, the overall policy might be a switching model over Markov Decision Processes (MDPs) or local feedback control laws. This may not be well suited to a variety of naturally observed behaviours. For instance, when picking up a book from a crowded shelf, we naturally switch between goal-directed reaching, tactile regrasping, sliding the book until it is comfortably off an edge and then once again goal-directed pick and place. It is rare that a single representational form adequately captures this diversity, even in such a seemingly simple task. When the robot must learn or adapt policies from experience, this poses significant challenges. The mismatch between the representational choices and the diversity of task types can result in a significant (sometimes exponential) increase in complexity with respect to time, observation and state-space dimensionality, and other attributes. These and other factors can make the learning of such tasks in a “tabula rasa” setting extremely difficult. However, if we are willing to adopt a multi-representational framing of the problem, and allow for some of these constituent modules to be learned in different ways (some from expert demonstration, some by trial and error, and perhaps some being controllers designed from first principles in model-based formulations), then the problem becomes much more tractable. The core hypothesis we explore is that it is possible to devise such learning methods, and that they significantly outperform conventional alternatives on robotic manipulation tasks of interest.
In the first part of this thesis, we present a framework for sequentially composing diverse policies, facilitating the solution of long-horizon tasks. We rely on demonstrations to provide a quick, though not necessarily expert or optimal, way to convey the desired outcome. We model the similarity to demonstrated states in a Goal Scoring Estimator model. We show in a real robot experiment the benefits of diverse policies relying on their own strong inductive biases to efficiently solve different aspects of the task, through sequencing by the Goal Scoring Estimator model. Next, we demonstrate how we can elicit policy structure through causal analysis and task structure through more efficient demonstrations involving interventions. This allows us to alter the manner of execution of a particular policy to match a desired learned user specification. Building a surrogate model of the demonstrator gives us the ability to causally reason about different aspects of the policy and which parts of that policy are salient. We can observe how intervening in the world by placing additional symbols impacts the validity of the original plan. Finally, observing that ‘static’ imitation learning datasets can be limiting if we are aiming to create more robust policies, we present the Learning from Inverse Intervention framework. This allows the robot to simultaneously learn a policy while interacting with the demonstrator. In this interaction, the robot intervenes when there is little information gain and pushes the demonstrator to explore more informative areas even as the demonstration is being performed in real time. This interaction brings the added benefit of drawing out information about the importance of different regions of the task. We verify the salience by visually inspecting samples from a generative model and by crafting plans that test these hypothetical areas.
These methods give us the ability to use demonstrations of a task to build policies for salient targets, to alter their manner of execution, to inspect them in order to understand their causal structure, and to sequence them to solve novel tasks.
en
dc.identifier.uri
https://hdl.handle.net/1842/38302
dc.identifier.uri
http://dx.doi.org/10.7488/era/1568
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
D. Angelov, Y. Hristov, S. Ramamoorthy. From demonstrations to task-space specifications. Using causal analysis to extract rule parameterization from demonstrations. Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), Vol. 34(45), 2020
en
dc.relation.hasversion
D. Angelov, Y. Hristov, M. Burke, S. Ramamoorthy. Composing Diverse Policies for Temporally Extended Tasks. IEEE Robotics and Automation Letters (RA-L), Vol. 5(2), 2020
en
dc.relation.hasversion
D. Angelov, Y. Hristov, S. Ramamoorthy. DynoPlan: Combining Motion Planning and Deep Neural Network based Controllers for Safe HRL. In Proc. Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2019
en
dc.relation.hasversion
D. Angelov, Y. Hristov, S. Ramamoorthy. Using Causal Analysis to Learn Specifications from Task Demonstrations. In Proc. International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2019.
en
dc.relation.hasversion
D. Angelov, S. Ramamoorthy. Learning from Demonstration of Trajectory Preferences through Causal Modeling and Inference. Robotics: Science and Systems Workshop on Causal Imitation in Robotics (R:SS CIR), 2018
en
dc.relation.hasversion
D. Angelov, S. Ramamoorthy. LfII: Learning from Inverse Intervention during Demonstrations, 2021
en
dc.relation.hasversion
Yordan Hristov, Daniel Angelov, Michael Burke, Alex Lascarides and Subramanian Ramamoorthy. ‘Disentangled Relational Representations for Explaining and Learning from Demonstration’. In: Conference on Robot Learning (CoRL). 2019
en
dc.subject
representational choices
en
dc.subject
task type diversity
en
dc.subject
learning method
en
dc.subject
robotic manipulation
en
dc.subject
long-horizon tasks
en
dc.subject
Goal Scoring Estimator model
en
dc.subject
task structure
en
dc.title
Composing diverse policies for long-horizon tasks
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Angelov2021.pdf
Size:
9.04 MB
Format:
Adobe Portable Document Format