
dc.contributor.advisor: Ramamoorthy, Subramanian
dc.contributor.advisor: Subr, Kartic
dc.contributor.author: Angelov, Daniel Angelov
dc.date.accessioned: 2021-11-22T12:01:24Z
dc.date.available: 2021-11-22T12:01:24Z
dc.date.issued: 2021-11-30
dc.identifier.uri: https://hdl.handle.net/1842/38302
dc.identifier.uri: http://dx.doi.org/10.7488/era/1568
dc.description.abstract [en]: Humans utilise a large diversity of control and reasoning methods to solve different manipulation and motion planning tasks. This diversity should be reflected in the strategies used by robots in the same domains. In current practice involving sequential decision making over long horizons, even when the formulation is hierarchical, it is common for all elements of the hierarchy to adopt the same representation. For instance, the overall policy might be a switching model over Markov Decision Processes (MDPs) or over local feedback control laws. This may not be well suited to a variety of naturally observed behaviours. When picking up a book from a crowded shelf, for example, we naturally switch between goal-directed reaching, tactile regrasping, sliding the book until it is comfortably off an edge, and then, once again, goal-directed pick and place. A single representational form rarely captures this diversity adequately, even in such a seemingly simple task. When the robot must learn or adapt policies from experience, this poses significant challenges: the mismatch between the representational choices and the diversity of task types can cause a significant (sometimes exponential) increase in complexity with respect to time, observation and state-space dimensionality, and other attributes. These and other factors can make learning such tasks in a “tabula rasa” setting extremely difficult. However, if we adopt a multi-representational framing of the problem and allow its constituent modules to be learned in different ways (some from expert demonstration, some by trial and error, and perhaps some designed from first principles as controllers in model-based formulations), the problem becomes much more tractable.
The core hypothesis we explore is that it is possible to devise such learning methods, and that they significantly outperform conventional alternatives on robotic manipulation tasks of interest. In the first part of this thesis, we present a framework for sequentially composing diverse policies to solve long-horizon tasks. We rely on demonstrations as a quick, not necessarily expert or optimal, way to convey the desired outcome, and we model the similarity to demonstrated states with a Goal Scoring Estimator model. In a real-robot experiment, we show the benefits of using the Goal Scoring Estimator model to sequence diverse policies, each relying on its own strong inductive biases to efficiently solve a different aspect of the task. Next, we demonstrate how to elicit policy structure through causal analysis, and task structure through more efficient demonstrations involving interventions. This allows us to alter the manner of execution of a particular policy to match a desired, learned user specification. Building a surrogate model of the demonstrator lets us reason causally about different aspects of the policy and identify which of its parts are salient. We can also observe how intervening in the world, by placing additional symbols, affects the validity of the original plan. Finally, observing that ‘static’ imitation learning datasets can be limiting when the aim is to create more robust policies, we present the Learning from Inverse Intervention (LfII) framework. It allows the robot to learn a policy while interacting with the demonstrator. During this interaction, the robot intervenes when the demonstration yields little information gain and pushes the demonstrator to explore more informative areas, even as the demonstration is being performed in real time. This interaction has the added benefit of drawing out information about the importance of different regions of the task.
We verify this salience by visually inspecting samples from a generative model and by crafting plans that test these hypothesised regions. Together, these methods let us use demonstrations of a task to build policies for salient targets, alter and inspect their manner of execution to understand its causal structure, and sequence them to solve novel tasks.
dc.contributor.sponsor [en]: Engineering and Physical Sciences Research Council (EPSRC)
dc.language.iso [en]: en
dc.publisher [en]: The University of Edinburgh
dc.relation.hasversion [en]: D. Angelov, Y. Hristov, S. Ramamoorthy. From demonstrations to task-space specifications. Using causal analysis to extract rule parameterization from demonstrations. Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), Vol. 34(45), 2020
dc.relation.hasversion [en]: D. Angelov, Y. Hristov, M. Burke, S. Ramamoorthy. Composing Diverse Policies for Temporally Extended Tasks. IEEE Robotics and Automation Letters (RA-L), Vol. 5(2), 2020
dc.relation.hasversion [en]: D. Angelov, Y. Hristov, S. Ramamoorthy. DynoPlan: Combining Motion Planning and Deep Neural Network based Controllers for Safe HRL. In Proc. Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2019
dc.relation.hasversion [en]: D. Angelov, Y. Hristov, S. Ramamoorthy. Using Causal Analysis to Learn Specifications from Task Demonstrations. In Proc. International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2019
dc.relation.hasversion [en]: D. Angelov, S. Ramamoorthy. Learning from Demonstration of Trajectory Preferences through Causal Modeling and Inference. Robotics: Science and Systems Workshop on Causal Imitation in Robotics (R:SS CIR), 2018
dc.relation.hasversion [en]: D. Angelov, S. Ramamoorthy. LfII: Learning from Inverse Intervention during Demonstrations, 2021
dc.relation.hasversion [en]: Y. Hristov, D. Angelov, M. Burke, A. Lascarides, S. Ramamoorthy. Disentangled Relational Representations for Explaining and Learning from Demonstration. In Proc. Conference on Robot Learning (CoRL), 2019
dc.subject [en]: representational choices
dc.subject [en]: task type diversity
dc.subject [en]: learning method
dc.subject [en]: robotic manipulation
dc.subject [en]: long-horizon tasks
dc.subject [en]: Goal Scoring Estimator model
dc.subject [en]: task structure
dc.title [en]: Composing diverse policies for long-horizon tasks
dc.type [en]: Thesis or Dissertation
dc.type.qualificationlevel [en]: Doctoral
dc.type.qualificationname [en]: PhD Doctor of Philosophy
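
The abstract describes sequencing diverse policies by scoring how closely states match demonstrated states. The Python below is a minimal, hypothetical sketch of that idea only; it is not the thesis's Goal Scoring Estimator. It assumes that each policy can be queried through a forward model returning the state predicted after running it, and it substitutes a hand-written kernel score (the progress label of demonstrated states, weighted by a Gaussian similarity) for a learned estimator. The names KernelGoalScorer, sequence_policies, bandwidth and goal_score are illustrative, not taken from the thesis.

```python
import numpy as np


class KernelGoalScorer:
    """Hypothetical stand-in for a learned goal scoring estimator.

    Each demonstrated state carries a progress label (0 at the start of its
    demonstration, 1 at the end). A query state s scores
        max_i progress_i * exp(-||s - d_i||^2 / (2 * bandwidth^2)),
    so it scores highly only if it is close to a state that occurs late
    in some demonstration.
    """

    def __init__(self, demonstrations, bandwidth=0.5):
        states, progress = [], []
        for demo in demonstrations:
            demo = np.asarray(demo, dtype=float)
            states.append(demo)
            progress.append(np.linspace(0.0, 1.0, len(demo)))
        self.states = np.concatenate(states)
        self.progress = np.concatenate(progress)
        self.bandwidth = bandwidth

    def score(self, state):
        sq_dists = np.sum((self.states - np.asarray(state, dtype=float)) ** 2, axis=1)
        similarity = np.exp(-sq_dists / (2.0 * self.bandwidth ** 2))
        return float(np.max(self.progress * similarity))


def sequence_policies(state, policies, scorer, goal_score=0.99, max_steps=20):
    """Greedy one-step sequencing over a library of diverse policies.

    `policies` maps a name to a function returning the state predicted after
    running that policy (an assumed forward model). At each step, the policy
    whose predicted state scores highest is selected.
    """
    plan = []
    state = np.asarray(state, dtype=float)
    for _ in range(max_steps):
        current = scorer.score(state)
        if current >= goal_score:
            break
        best_name, next_state = max(
            ((n, np.asarray(p(state), dtype=float)) for n, p in policies.items()),
            key=lambda item: scorer.score(item[1]),
        )
        if scorer.score(next_state) <= current:
            break  # no policy improves the score, so stop instead of looping
        plan.append(best_name)
        state = next_state
    return plan, state


if __name__ == "__main__":
    # One toy demonstration: move right twice, then up twice.
    demo = [[0, 0], [1, 0], [2, 0], [2, 1], [2, 2]]
    scorer = KernelGoalScorer([demo])
    policies = {
        "move_right": lambda s: s + np.array([1.0, 0.0]),
        "move_up": lambda s: s + np.array([0.0, 1.0]),
    }
    plan, final = sequence_policies([0.0, 0.0], policies, scorer)
    print(plan)  # ['move_right', 'move_right', 'move_up', 'move_up']
```

In this toy run, neither primitive alone reaches the demonstrated goal; the score decides when to hand over from one to the other. The greedy one-step lookahead is the simplest possible design and stops as soon as no policy raises the score, so it can stall in cases where a longer search over policy sequences would succeed.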