
dc.contributor.advisor: Hospedales, Timothy
dc.contributor.advisor: Storkey, Amos
dc.contributor.advisor: Vijayakumar, Sethu
dc.contributor.author: Zhao, Chenyang
dc.date.accessioned: 2021-03-11T14:58:56Z
dc.date.available: 2021-03-11T14:58:56Z
dc.date.issued: 2020-11-30
dc.identifier.uri: https://hdl.handle.net/1842/37519
dc.identifier.uri: http://dx.doi.org/10.7488/era/803
dc.description.abstract: A long-standing vision of robotics research is to build autonomous systems that can adapt to unforeseen environmental perturbations and learn a set of tasks progressively. Reinforcement learning (RL) has shown great success in a variety of robot control tasks because of recent advances in hardware and learning techniques. To further this long-term goal, generalisation in RL arises as a demanding research topic, as it allows learning agents to extract knowledge from past experience and transfer it to new situations. This covers generalisation against sampling noise to avoid overfitting, generalisation against environmental changes to avoid domain shift, and generalisation over different but related tasks to achieve lifelong knowledge transfer. This thesis investigates these challenges in the context of RL, with a main focus on cross-domain and cross-task generalisation. We first address the problem of generalisation across domains. With a focus on continuous control tasks, we characterise the sources of uncertainty that may cause generalisation challenges in deep RL, and provide a new benchmark and thorough empirical evaluation of generalisation challenges for state-of-the-art deep RL methods. In particular, we show that, if generalisation is the goal, then the common practice of evaluating algorithms based on their training performance leads to the wrong conclusions about algorithm choice. Moreover, we evaluate several techniques for improving generalisation and draw conclusions about the most robust techniques to date. From this evaluation, we see that learning from multiple domains improves generalisation performance across domains; however, aggregating gradient information from different domains may make learning unstable.
In the second contribution, we propose updating the policy at every iteration to minimise the sum of distances to the new policies learned in each domain, measured by the Kullback-Leibler (KL) divergence between output (action) distributions. We show that our method improves both the asymptotic training reward and the robustness of the tested policy against domain shifts in a variety of control tasks. We finally investigate generalisation across different classes of control tasks. In particular, we introduce a class of neural network controllers that can realise four distinct tasks: reaching, object throwing, casting, and ball-in-cup. By factorising the weights of the neural network, transferable latent skills are extracted which accelerate learning in cross-task transfer. With a suitable curriculum, this allows us to learn challenging dexterous control tasks like ball-in-cup from scratch with only reinforcement learning.
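The abstract's second contribution, updating a central policy to minimise the summed KL divergence to per-domain policies, can be illustrated with a toy sketch. This is not the thesis algorithm; it is a minimal illustration using hypothetical categorical action distributions for a single state, exploiting the closed-form fact that the normalised geometric mean minimises the sum of reverse KL divergences to a set of categorical distributions:

```python
import numpy as np

def kl(p, q):
    """KL divergence KL(p || q) between two categorical distributions."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical action distributions learned in three different domains
# for the same state (illustrative values, not from the thesis).
domain_policies = [
    np.array([0.7, 0.2, 0.1]),
    np.array([0.5, 0.3, 0.2]),
    np.array([0.6, 0.1, 0.3]),
]

# The categorical q minimising sum_i KL(q || p_i) is the normalised
# geometric mean of the per-domain distributions.
geo = np.exp(np.mean([np.log(p) for p in domain_policies], axis=0))
central = geo / geo.sum()

# Total divergence of the aggregated policy to the per-domain policies.
total = sum(kl(central, p) for p in domain_policies)
```

The design intuition is that a single central policy aggregated in distribution space stays close to every domain-specific policy at once, avoiding the instability that can arise from directly averaging gradients across domains.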
dc.language.iso: en
dc.publisher: The University of Edinburgh
dc.relation.hasversion: Zhao, C., Hospedales, T. M., Stulp, F., and Sigaud, O. (2017). Tensor based knowledge transfer across skill categories for robot control. In IJCAI, pages 3462–3468.
dc.relation.hasversion: Zhao, C., Sigaud, O., Stulp, F., and Hospedales, T. M. (2019). Investigating generalisation in continuous deep reinforcement learning. arXiv preprint arXiv:1902.07015.
dc.subject: learning algorithms
dc.subject: learning generalisation
dc.subject: robotics research
dc.subject: reinforcement learning
dc.subject: generalisation of RL
dc.subject: cross-domain generalisation
dc.subject: cross-task generalisation
dc.title: Generalisation in deep reinforcement learning with multiple tasks and domains
dc.type: Thesis or Dissertation
dc.type.qualificationlevel: Doctoral
dc.type.qualificationname: PhD Doctor of Philosophy

