Stochastic optimal control with learned dynamics models
The motor control of anthropomorphic robotic systems is a challenging computational task, mainly because of the high degree of redundancy such systems exhibit. Optimality principles provide a general strategy to resolve such redundancies in a task-driven fashion. In particular, closed-loop optimisation, i.e., optimal feedback control (OFC), has served as a successful motor control model, as it unifies important concepts such as costs, noise, sensory feedback and internal models into a coherent mathematical framework. Realising OFC on realistic anthropomorphic systems, however, is non-trivial. Firstly, such systems typically have high dimensionality and nonlinear dynamics, in which case the optimisation problem becomes computationally intractable. Approximate methods, such as the iterative linear quadratic Gaussian (ILQG) method, have been proposed to avoid this; however, the transfer of solutions from idealised simulations to real hardware systems has proved to be challenging. Secondly, OFC relies on an accurate description of the system dynamics, which for many realistic control systems may be unknown, difficult to estimate, or subject to frequent systematic changes. Thirdly, many (especially biologically inspired) systems suffer from significant state- or control-dependent sources of noise, which are difficult to model in a generally valid fashion.

This thesis addresses these issues with the aim of realising efficient OFC for anthropomorphic manipulators. First, we investigate the implementation of OFC laws on anthropomorphic hardware. Using ILQG we optimally control a high-dimensional anthropomorphic manipulator without having to specify an explicit inverse kinematics, inverse dynamics or feedback control law. We achieve this by introducing a novel cost function that accounts for the physical constraints of the robot and a dynamics formulation that resolves discontinuities in the dynamics.
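The ILQG scheme referred to above can be illustrated by its deterministic core, iterative LQR: repeatedly linearise the dynamics around a nominal trajectory, solve the resulting local LQR problem in a backward pass, and roll the improved time-varying feedback law forward. The sketch below is illustrative only — a 1D double integrator with quadratic costs stands in for the manipulator, all names and constants are our own rather than the thesis' implementation, and the regularisation and line search a practical solver needs are omitted:

```python
import numpy as np

# Illustrative iLQR loop on a linear toy system (state = [position, velocity]).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # discrete-time dynamics x' = Ax + Bu
B = np.array([[0.0], [dt]])
Q = np.diag([1.0, 0.1])                  # running state cost (on x - target)
R = np.array([[0.01]])                   # running control cost
Qf = np.diag([100.0, 10.0])              # terminal state cost
T = 50                                   # horizon (5 s at dt = 0.1)
target = np.array([1.0, 0.0])            # reach position 1, come to rest

def rollout(x0, K, k, x_ref, u_ref):
    """Forward pass: apply the time-varying feedback law around the
    previous nominal trajectory."""
    xs, us = [x0], []
    for t in range(T):
        u = u_ref[t] + k[t] + K[t] @ (xs[-1] - x_ref[t])
        us.append(u)
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs), np.array(us)

def backward_pass(xs, us):
    """LQR backward recursion on the (here already linear) dynamics,
    yielding feedback gains K and open-loop corrections k."""
    V, v = Qf, Qf @ (xs[-1] - target)    # value-function Hessian / gradient
    K, k = [None] * T, [None] * T
    for t in reversed(range(T)):
        Qx = Q @ (xs[t] - target) + A.T @ v
        Qu = R @ us[t] + B.T @ v
        Qxx = Q + A.T @ V @ A
        Quu = R + B.T @ V @ B
        Qux = B.T @ V @ A
        K[t] = -np.linalg.solve(Quu, Qux)
        k[t] = -np.linalg.solve(Quu, Qu)
        V = Qxx + K[t].T @ Quu @ K[t] + K[t].T @ Qux + Qux.T @ K[t]
        v = Qx + K[t].T @ Quu @ k[t] + K[t].T @ Qu + Qux.T @ k[t]
    return K, k

x0 = np.zeros(2)
xs = np.zeros((T + 1, 2))
us = np.zeros((T, 1))
for _ in range(5):                       # iterate linearise / solve / rollout
    K, k = backward_pass(xs, us)
    xs, us = rollout(x0, K, k, xs, us)
```

For a genuinely nonlinear plant, `A` and `B` would be re-derived at each nominal state by differentiating the dynamics, which is exactly where the learned models discussed later can substitute for analytic ones.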
The experimental hardware results reveal the benefits of OFC over traditional (open-loop) optimal controllers in terms of energy efficiency and compliance, properties that are crucial for the control of modern anthropomorphic manipulators.

We then propose a new framework of OFC with learned dynamics (OFC-LD) that, unlike classic approaches, does not rely on analytic dynamics functions but instead updates the internal dynamics model continuously from sensorimotor plant feedback. We demonstrate how this approach can compensate for unknown dynamics and for complex dynamic perturbations in an online fashion. A specific advantage of a learned dynamics model is that it contains the stochastic information (i.e., noise) from the plant data, which corresponds to the uncertainty in the system. Consequently, one can exploit this information within OFC-LD to produce control laws that minimise the uncertainty in the system. In the domain of antagonistically actuated systems this approach leads to improved motor performance, achieved by co-contracting antagonistic actuators in order to reduce the negative effects of the noise. Most importantly, the shape and source of the noise are unknown a priori and are learned solely from plant data. The model is successfully tested on an antagonistic series elastic actuator (SEA) that we have built for this purpose. The proposed OFC-LD model is not only applicable to robotic systems but also proves to be very useful in the modelling of biological motor control phenomena, and we show how our model can be used to predict a wide range of human impedance control patterns during both stationary and adaptation tasks.
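The core idea of exploiting learned stochastic information can be sketched in miniature: fit a mean dynamics model to plant data, fit a second model to the squared residuals to recover the signal-dependent noise variance, and let that variance enter the expected cost so the controller becomes noise-averse. This is a conceptual stand-in only — the thesis learns dynamics with a far richer scheme, whereas here a synthetic scalar "plant" and ordinary least squares are our own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def plant(x, u):
    # True (unknown) dynamics: noise std grows with |u|, i.e. the noise is
    # signal-dependent, as in biologically inspired actuation.
    return 0.9 * x + 0.5 * u + rng.normal(0.0, 0.05 + 0.2 * abs(u))

# Collect plant data, as OFC-LD does from sensorimotor feedback.
X = rng.uniform(-1, 1, 500)
U = rng.uniform(-1, 1, 500)
Y = np.array([plant(x, u) for x, u in zip(X, U)])

# Mean model: y ~ w0*x + w1*u.
Phi = np.stack([X, U], axis=1)
w, *_ = np.linalg.lstsq(Phi, Y, rcond=None)

# Noise model: regress squared residuals on [1, u^2] to recover the
# control-dependent variance -- the "stochastic information" in the data.
res2 = (Y - Phi @ w) ** 2
Psi = np.stack([np.ones_like(U), U ** 2], axis=1)
v, *_ = np.linalg.lstsq(Psi, res2, rcond=None)

def expected_cost(x, u, target=1.0, wu=0.01, use_var=True):
    """E[(y - target)^2] = (mean - target)^2 + var, plus effort cost.
    The variance term penalises high-uncertainty actions."""
    mean = w[0] * x + w[1] * u
    var = max(v[0] + v[1] * u ** 2, 0.0) if use_var else 0.0
    return (mean - target) ** 2 + var + wu * u ** 2

# The uncertainty-aware cost picks a more conservative control than the
# purely deterministic one.
grid = np.linspace(-3, 3, 601)
u_stoch = grid[np.argmin([expected_cost(0.0, u) for u in grid])]
u_det = grid[np.argmin([expected_cost(0.0, u, use_var=False) for u in grid])]
```

In the antagonistic setting the same principle operates over two actuators, where the uncertainty-aware optimum corresponds to co-contraction; the scalar case above merely shows how a variance learned from data reshapes the control law without the noise model ever being specified a priori.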