Edinburgh Research Archive

Embodied agents in real-world robots: integrated control and machine learning for intelligent behaviours

View/Open
YuanK_2022.pdf (72.92Mb)
Date
09/06/2022
Author
Yuan, Kai
Abstract
The central contribution of this thesis is a reliable framework and set of algorithms that make robots move as versatilely and reliably as biological systems. To this end, this work proposes a hierarchical control framework that combines classical control on the lower levels, which drives the robot’s actuators and provides stability and balance control, with machine learning on the higher levels for decision making and planning. Using machine learning for decision making and planning enables the robot to behave more intelligently, dynamically, and animal-like, while the lower-level control algorithms maintain the robustness and stability properties of classical control. In particular, this thesis presents contributions in six areas of robotic motion control: (1) a biologically motivated implicit hierarchical generative model for autonomous robot operations, (2) multi-expert learning and skill synthesis, (3) an improved formulation for Model-Predictive Control (MPC) of legged robots, (4) rapid robot policy development and deployment through fast sample collection for Deep Reinforcement Learning (DRL), (5) reverse-engineering AI policies into equivalent, safe, transparent, and certifiable controllers, and (6) automatic parameter tuning. First, we draw inspiration from human motor control and insights from neuroscience and propose an implicit hierarchical generative model for robot control that mimics the deep temporal architecture of human motor control; we show that it can be used to learn, fully autonomously, a complex task of object retrieval, delivery, and navigation. Using DRL policies trained in previous work, we propose a Multi-Expert Learning Architecture (MELA) that uses multiple experts to train and synthesise new skills. We show that MELA can be used to learn animal-like, dynamic, and adaptive behaviours on the quadruped robot Jueying.
We extend MELA into a more general Multi-Expert Synthesis (MES) framework, for which we propose general guidelines. To this end, we propose an automatic state selection procedure, and we identify and solve common issues in multi-expert systems. Through MES, we achieve fall-resilient locomotion on the quadruped ANYmal and dual-arm cooperation for grasping otherwise ungraspable objects on a bi-manual robot setup using Franka Panda. We propose a numerically stable, Linear-Inverted Pendulum Model (LIPM) based formulation for MPC and show its robustness properties on the task of legged locomotion for the humanoid Valkyrie. Furthermore, we show that the proposed formulation is more robust to external disturbances than other LIPM formulations for MPC. In the domain of DRL, we propose a fast sample collection procedure that uses parallelisation on consumer-grade computers and enables the training and deployment of policies, such as trotting on the quadruped ANYmal and balancing on the humanoid Valkyrie, within an hour. Next, we use the Artificial Intelligence (AI) policy trained through DRL as the basis for a reverse-engineering framework. For the task of humanoid push recovery on Valkyrie, we show how an opaque AI policy can be used to obtain a transparent, certifiable controller with the same properties as the AI policy. We show that ankle, hip, toe, and stepping strategies emerge from the reverse-engineered controller in a unified manner without the need for explicit switching. Lastly, we propose an algorithm called Alternating Bayesian Optimisation (ABO) to automatically tune the high-dimensional parameters for whole-body control of Valkyrie. In contrast to classical Bayesian Optimisation, which scales only up to 10 dimensions, we show that ABO can tune up to 36 parameters for whole-body control and up to 60 dimensions on the global optimisation benchmark COCO.
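For readers unfamiliar with the Linear-Inverted Pendulum Model named in the abstract, the following is a minimal sketch of the textbook LIPM dynamics, in which the centre-of-mass (CoM) acceleration is (g / z_c) * (x - p) for CoM position x, support point p, and constant CoM height z_c. It is not the thesis's MPC formulation, which the abstract does not specify. The function name `lipm_step`, its parameter names, and the assumption that the support point stays constant over each time step are illustrative choices only.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def lipm_step(x, v, p, z_c, dt):
    """Advance the 1-D LIPM state by dt seconds.

    x   : centre-of-mass position (m)
    v   : centre-of-mass velocity (m/s)
    p   : support point, assumed constant over the step (m)
    z_c : constant centre-of-mass height (m)

    Uses the closed-form solution of x'' = (G / z_c) * (x - p).
    """
    omega = math.sqrt(G / z_c)  # natural frequency of the pendulum
    c = math.cosh(omega * dt)
    s = math.sinh(omega * dt)
    x_next = p + (x - p) * c + (v / omega) * s
    v_next = (x - p) * omega * s + v * c
    return x_next, v_next


if __name__ == "__main__":
    # Hypothetical values: CoM starts 5 cm ahead of the support point, at rest.
    x, v = 0.05, 0.0
    for _ in range(5):
        x, v = lipm_step(x, v, p=0.0, z_c=1.0, dt=0.05)
        print(f"x = {x:.4f} m, v = {v:.4f} m/s")
```

With the support point held fixed, the CoM diverges from it exponentially unless the initial state lies on the stable manifold (v = -omega * (x - p)). A predictive controller built on this model would therefore have to choose the support point over a horizon to keep the motion bounded.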
URI
https://hdl.handle.net/1842/39065

http://dx.doi.org/10.7488/era/2316
Collections
  • Informatics thesis and dissertation collection
