Edinburgh Research Archive

Generalisation in deep reinforcement learning with multiple tasks and domains

Abstract

A long-standing vision of robotics research is to build autonomous systems that can adapt to unforeseen environmental perturbations and learn a set of tasks progressively. Reinforcement learning (RL) has shown great success in a variety of robot control tasks because of recent advances in hardware and learning techniques. To further this long-term goal, generalisation in RL has emerged as a pressing research topic, because it allows learning agents to extract knowledge from past experience and transfer it to new situations. This covers generalisation against sampling noise to avoid overfitting, generalisation against environmental changes to avoid domain shift, and generalisation over different but related tasks to achieve lifelong knowledge transfer. This thesis investigates these challenges in the context of RL, with a main focus on cross-domain and cross-task generalisation. We first address the problem of generalisation across domains. With a focus on continuous control tasks, we characterise the sources of uncertainty that may cause generalisation challenges in Deep RL, and provide a new benchmark and a thorough empirical evaluation of generalisation challenges for state-of-the-art Deep RL methods. In particular, we show that, if generalisation is the goal, then the common practice of evaluating algorithms based on their training performance leads to wrong conclusions about algorithm choice. Moreover, we evaluate several techniques for improving generalisation and draw conclusions about the most robust techniques to date. The evaluation shows that learning from multiple domains improves generalisation performance across domains. However, aggregating gradient information from different domains may make learning unstable. In the second work, we propose to update the policy at every iteration so as to minimise the sum of its distances to the new policies learned in each domain, where distance is measured by the Kullback-Leibler (KL) divergence between output (action) distributions.
We show that our method improves both the asymptotic training reward and the robustness of the test-time policy against domain shifts in a variety of control tasks. We finally investigate generalisation across different classes of control tasks. In particular, we introduce a class of neural network controllers that can realise four distinct tasks: reaching, object throwing, casting, and ball-in-cup. By factorising the weights of the neural network, transferable latent skills are extracted, which accelerate learning in cross-task transfer. With a suitable curriculum, this allows us to learn challenging dexterous control tasks like ball-in-cup from scratch with only reinforcement learning.
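
The KL-based policy update mentioned in the abstract can be illustrated with a small sketch. It is not the thesis implementation: it assumes diagonal Gaussian action distributions at a single state, assumes the divergence is taken in the direction KL(domain policy || shared policy) (the abstract does not state the direction), and works directly on distribution parameters rather than network weights. The function names `gaussian_kl`, `sum_kl_to_domain_policies`, and `moment_matched_policy` are hypothetical. Under these assumptions, the shared distribution that minimises the summed KL has a closed form, obtained by matching the mean and variance of the mixture of domain policies.

```python
import numpy as np


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between diagonal Gaussians, summed over action dimensions."""
    var_p, var_q = std_p ** 2, std_q ** 2
    per_dim = (
        np.log(std_q / std_p)
        + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q)
        - 0.5
    )
    return per_dim.sum(axis=-1)


def sum_kl_to_domain_policies(mu, std, domain_mus, domain_stds):
    """Sum over domains of KL(pi_domain || pi) at one state (assumed direction)."""
    return sum(
        gaussian_kl(m, s, mu, std) for m, s in zip(domain_mus, domain_stds)
    )


def moment_matched_policy(domain_mus, domain_stds):
    """Closed-form minimiser of the summed KL above for diagonal Gaussians."""
    domain_mus = np.asarray(domain_mus, dtype=float)
    domain_stds = np.asarray(domain_stds, dtype=float)
    mu = domain_mus.mean(axis=0)
    # Variance of the equally weighted mixture of the domain policies.
    var = (domain_stds ** 2 + (domain_mus - mu) ** 2).mean(axis=0)
    return mu, np.sqrt(var)


# Three hypothetical per-domain policies over a 2-D action space.
domain_mus = np.array([[0.0, 1.0], [1.0, -1.0], [2.0, 0.0]])
domain_stds = np.array([[0.5, 1.0], [1.0, 0.5], [0.8, 0.7]])

mu, std = moment_matched_policy(domain_mus, domain_stds)
objective = sum_kl_to_domain_policies(mu, std, domain_mus, domain_stds)
```

In a deep RL setting the shared policy would be a neural network, and the same objective would typically be minimised by gradient descent over states sampled from each domain rather than in closed form.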
