dc.contributor.advisor | Ramamoorthy, Subramanian | |
dc.contributor.advisor | Subr, Kartic | |
dc.contributor.author | Angelov, Daniel Angelov | |
dc.date.accessioned | 2021-11-22T12:01:24Z | |
dc.date.available | 2021-11-22T12:01:24Z | |
dc.date.issued | 2021-11-30 | |
dc.identifier.uri | https://hdl.handle.net/1842/38302 | |
dc.identifier.uri | http://dx.doi.org/10.7488/era/1568 | |
dc.description.abstract | Humans utilise a large diversity of control and reasoning methods to solve
different robot manipulation and motion planning tasks. This diversity should be
reflected in the strategies used by robots in the same domains. In current practice
involving sequential decision making over long horizons, even when the formulation
is a hierarchical one, it is common for all elements of this hierarchy to adopt the
same representation. For instance, the overall policy might be a switching model
over Markov Decision Processes (MDPs) or local feedback control laws. This may
not be well suited to a variety of naturally observed behaviours. For instance, when
picking up a book from a crowded shelf, we naturally switch between goal-directed
reaching, tactile regrasping, sliding the book until it is comfortably off an edge and
then once again goal-directed pick and place. It is rare that a single representational
form adequately captures this diversity, even in such a seemingly simple task.
When the robot must learn or adapt policies from experience, this poses significant
challenges. The mismatch between representational choices and the diversity of
task types can result in a significant (sometimes exponential) increase in complexity
with respect to time, observation and state-space dimensionality and other attributes.
These and other factors can make the learning of such tasks in a “tabula rasa” setting
extremely difficult. However, if we were willing to adopt a multi-representational
framing of the problem, and allow for some of these constituent modules to be
learned in different ways (some from expert demonstration, some by trial and error,
and perhaps some being controllers designed from first principles in model-based
formulations) then the problem becomes much more tractable. The core hypothesis we
explore is that it is possible to devise such learning methods, and that they significantly
outperform conventional alternatives on robotic manipulation tasks of interest.
In the first part of this thesis, we present a framework for sequentially composing
diverse policies that facilitates the solution of long-horizon tasks. We rely on
demonstrations to provide a quick, though not necessarily expert or optimal, way to convey the
desired outcome. We model similarity to demonstrated states with a Goal Scoring
Estimator model. In a real-robot experiment, we show the benefits of diverse policies,
each relying on its own strong inductive biases to efficiently solve a different aspect of the
task, sequenced by the Goal Scoring Estimator model.
Next, we demonstrate how we can elicit policy structure through causal analysis
and task structure through more efficient demonstrations involving interventions. This
allows us to alter the manner of execution of a particular policy to match a learned
user specification. Building a surrogate model of the demonstrator lets us reason
causally about different aspects of the policy and identify which parts
of that policy are salient. We can also observe how intervening in the world, by placing
additional symbols, affects the validity of the original plan.
Finally, observing that ‘static’ imitation learning datasets can be limiting when the aim
is to create more robust policies, we present the Learning from Inverse Intervention
framework. This allows the robot to learn a policy while simultaneously interacting
with the demonstrator. In this interaction, the robot intervenes when there is little
information gain and pushes the demonstrator to explore more informative areas
even as the demonstration is being performed in real time. This interaction brings the
added benefit of drawing out information about the importance of different regions
of the task. We verify this salience by visually inspecting samples from a generative
model and by crafting plans that test these hypothetical areas.
These methods give us the ability to use demonstrations of a task to build policies
for salient targets, to alter their manner of execution, to inspect them to understand their
causal structure, and to sequence them to solve novel tasks. | en |
dc.contributor.sponsor | Engineering and Physical Sciences Research Council (EPSRC) | en |
dc.language.iso | en | en |
dc.publisher | The University of Edinburgh | en |
dc.relation.hasversion | D. Angelov, Y. Hristov, S. Ramamoorthy. From demonstrations to task-space specifications. Using causal analysis to extract rule parameterization from demonstrations. Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), Vol. 34(45), 2020 | en |
dc.relation.hasversion | D. Angelov, Y. Hristov, M. Burke, S. Ramamoorthy. Composing Diverse Policies for Temporally Extended Tasks. IEEE Robotics and Automation Letters (RA-L), Vol. 5(2), 2020 | en |
dc.relation.hasversion | D. Angelov, Y. Hristov, S. Ramamoorthy. DynoPlan: Combining Motion Planning and Deep Neural Network based Controllers for Safe HRL. In Proc. Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2019 | en |
dc.relation.hasversion | D. Angelov, Y. Hristov, S. Ramamoorthy. Using Causal Analysis to Learn Specifications from Task Demonstrations. In Proc. International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2019 | en |
dc.relation.hasversion | D. Angelov, S. Ramamoorthy. Learning from Demonstration of Trajectory Preferences through Causal Modeling and Inference. Robotics: Science and Systems Workshop on Causal Imitation in Robotics (R:SS CIR), 2018 | en |
dc.relation.hasversion | D. Angelov, S. Ramamoorthy. LfII: Learning from Inverse Intervention during Demonstrations, 2021 | en |
dc.relation.hasversion | Y. Hristov, D. Angelov, M. Burke, A. Lascarides, S. Ramamoorthy. Disentangled Relational Representations for Explaining and Learning from Demonstration. In Proc. Conference on Robot Learning (CoRL), 2019 | en |
dc.subject | representational choices | en |
dc.subject | task type diversity | en |
dc.subject | learning method | en |
dc.subject | robotic manipulation | en |
dc.subject | long-horizon tasks | en |
dc.subject | Goal Scoring Estimator model | en |
dc.subject | task structure | en |
dc.title | Composing diverse policies for long-horizon tasks | en |
dc.type | Thesis or Dissertation | en |
dc.type.qualificationlevel | Doctoral | en |
dc.type.qualificationname | PhD Doctor of Philosophy | en |