Iterative methods for solving stochastic optimal control problems
Optimal control problems are inherently hard to solve because the optimization must be performed simultaneously with updating the underlying system; most of the time they therefore have to be solved numerically. In this thesis we consider two iterative methods for solving stochastic optimal control problems: Howard's policy improvement algorithm and the method of successive approximations (MSA).

Starting from an initial guess, Howard's policy improvement algorithm separates the step of updating the trajectory of the dynamical system from the optimization, and its iterates should converge to the optimal control. In the discrete space-time setting this is often the case, and even rates of convergence are known. In the continuous space-time setting of controlled diffusions the algorithm consists of solving a linear PDE followed by a maximization problem. This has been shown to converge; in some situations, however, no global rate is known. The first main contribution is to establish a global rate of convergence for the policy improvement algorithm and for a variant, called here the gradient iteration algorithm. The second main contribution is a proof of stability of the algorithms under perturbations of both the accuracy of the linear PDE solution and the accuracy of the maximization step. The proof technique is new in this context, as it uses the theory of backward stochastic differential equations.

The classical MSA is an iterative method for solving stochastic control problems and is derived from Pontryagin's optimality principle. It is known that the MSA may fail to converge. Using estimates for the backward stochastic differential equation, we propose a modification of the MSA algorithm. This modified MSA is shown to converge for general stochastic control problems with control in both the drift and diffusion coefficients. Under some additional assumptions a rate of convergence is established.
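In the discrete space-time setting mentioned above, Howard's algorithm reduces to classical policy iteration for a finite Markov decision process: a policy-evaluation step that solves a linear system (the discrete analogue of the linear PDE), alternating with a pointwise maximization. The following is a minimal illustrative sketch for a hypothetical finite MDP; the names `P` (transition tensor), `r` (reward array), and discount `gamma` are assumptions of this sketch, not objects from the thesis:

```python
import numpy as np

def policy_iteration(P, r, gamma=0.9, max_iter=100):
    """Howard's policy improvement on a finite MDP (illustrative analogue).

    P: shape (A, S, S), transition matrix for each action.
    r: shape (A, S), reward for each action and state.
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)  # initial guess
    for _ in range(max_iter):
        # Policy evaluation: solve the linear system
        # (I - gamma * P_policy) v = r_policy  -- analogue of the linear PDE
        P_pi = P[policy, np.arange(S), :]
        r_pi = r[policy, np.arange(S)]
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Policy improvement: pointwise maximization over actions
        q = r + gamma * P @ v            # shape (A, S)
        new_policy = np.argmax(q, axis=0)
        if np.array_equal(new_policy, policy):
            break                        # fixed point reached
        policy = new_policy
    return policy, v
```

In this finite setting the iteration terminates in finitely many steps at a policy whose value satisfies the Bellman optimality equation, which is the discrete counterpart of the convergence results discussed above.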
The results are valid without restrictions on the time horizon of the control problem, in contrast to iterative methods based on the theory of forward-backward stochastic differential equations. In addition, we study the MSA for solving stochastic control problems with entropy regularization, where the action space is the space of measures. We modify the classical MSA by adding the relative entropy of two consecutive controls produced by the algorithm. We establish convergence of the algorithm and show how it can be applied to relaxed stochastic control problems.
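The structure of the MSA can be illustrated on a much simpler, hypothetical deterministic discrete-time analogue: a forward pass integrating the state, a backward pass for the Pontryagin adjoint, and a pointwise Hamiltonian minimization (cost-minimization convention) to update the control. The quadratic penalty `rho * (v - u[k]) ** 2` below plays the role, in this simplified setting, of the relative-entropy term penalizing the change between consecutive controls; the finite action grid `U` and all function names are assumptions of this sketch, not the thesis's construction:

```python
import numpy as np

def msa(f, running_cost, terminal_cost, x0, U, N, dt, rho=0.0, iters=50):
    """Sketch of the (modified) method of successive approximations on a
    deterministic, scalar, discrete-time analogue: no diffusion, finite
    action grid U, derivatives taken by central finite differences."""
    u = np.full(N, U[0])                 # initial control guess
    x = np.empty(N + 1)
    eps = 1e-6
    for _ in range(iters):
        # Forward pass: integrate the state under the current control
        x[0] = x0
        for k in range(N):
            x[k + 1] = x[k] + dt * f(x[k], u[k])
        # Backward pass: discrete adjoint (Pontryagin costate)
        p = np.empty(N + 1)
        p[N] = (terminal_cost(x[N] + eps) - terminal_cost(x[N] - eps)) / (2 * eps)
        for k in reversed(range(N)):
            fx = (f(x[k] + eps, u[k]) - f(x[k] - eps, u[k])) / (2 * eps)
            lx = (running_cost(x[k] + eps, u[k])
                  - running_cost(x[k] - eps, u[k])) / (2 * eps)
            p[k] = p[k + 1] + dt * (p[k + 1] * fx + lx)
        # Control update: minimize the augmented Hamiltonian on the grid;
        # rho > 0 penalizes the distance to the previous control, mimicking
        # the relative-entropy modification
        for k in range(N):
            H = [running_cost(x[k], v) + p[k + 1] * f(x[k], v)
                 + rho * (v - u[k]) ** 2 for v in U]
            u[k] = U[int(np.argmin(H))]
    return u, x
```

With `rho = 0` this is the classical MSA, which may oscillate; a positive `rho` damps the control updates, in the spirit of the modification described above.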