Iterative methods for solving stochastic optimal control problems
Date: 04/04/2022
Author: Kerimkulov, Bekzhan
Abstract
Optimal control problems are inherently hard to solve as the optimization must be
performed simultaneously with updating the underlying system. Therefore, most
of the time, they have to be solved numerically. In this thesis, we consider two
iterative methods for solving stochastic optimal control problems: Howard’s policy improvement algorithm and the method of successive approximations (MSA).
Starting from an initial guess, Howard's policy improvement algorithm separates the step of updating the trajectory of the dynamical system from the optimization step, and iterating the two should converge to the optimal control. In the discrete space-time setting this is often the case, and even rates of convergence are known.
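In the discrete setting, the two alternating steps can be sketched on a finite Markov decision process: policy evaluation amounts to solving a linear system (the discrete analogue of the linear PDE), and policy improvement is a pointwise maximization. The following is a minimal illustration, not the thesis's setting; the transition tensor P, reward r, and discount factor are hypothetical inputs.

```python
import numpy as np

def policy_iteration(P, r, gamma, max_iter=100):
    """Howard's policy improvement on a finite MDP (illustrative sketch).

    P: transition probabilities, shape (n_actions, n_states, n_states)
    r: rewards, shape (n_actions, n_states)
    gamma: discount factor in (0, 1)
    """
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)  # initial guess
    for _ in range(max_iter):
        # Policy evaluation: solve the linear system (I - gamma P_pi) v = r_pi,
        # the discrete analogue of solving a linear PDE for the fixed control.
        P_pi = P[policy, np.arange(n_states)]
        r_pi = r[policy, np.arange(n_states)]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: pointwise maximization of the one-step value.
        q = r + gamma * P @ v              # shape (n_actions, n_states)
        new_policy = np.argmax(q, axis=0)
        if np.array_equal(new_policy, policy):
            break                          # no improvement possible: optimal
        policy = new_policy
    return policy, v
```

Each iteration produces a policy at least as good as the last, and since there are finitely many policies the loop terminates at an optimal one; in the continuous setting studied in the thesis, neither step can be carried out exactly, which is what motivates the stability analysis below.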
In the continuous space-time setting of controlled diffusions, the algorithm consists of solving a linear PDE followed by a maximization problem. This has been shown to converge in some situations; however, no global rate is known. The first main contribution is to establish a global rate of convergence for the policy improvement algorithm and for a variant, called here the gradient iteration algorithm.
The second main contribution is the proof of stability of the algorithms under
perturbations to both the accuracy of the linear PDE solution and the accuracy
of the maximization step. The proof technique is new in this context as it uses
the theory of backward stochastic differential equations.
The classical MSA is an iterative method for solving stochastic control problems and is derived from Pontryagin's optimality principle. It is known that the MSA may fail to converge. Using estimates for the backward stochastic differential equation, we propose a modification of the MSA. This modified MSA is shown to converge for general stochastic control problems with control in both the drift and diffusion coefficients. Under some additional assumptions, a rate of convergence is established. The results are valid without restrictions on the time horizon of the control problem, in contrast to iterative methods based on the theory of forward-backward stochastic differential equations. In addition, we study the MSA for solving stochastic control problems with entropy regularization, where the action space is a space of measures. We modify the classical MSA by penalizing the relative entropy between two consecutive controls produced by the algorithm. We establish convergence of this algorithm and show how it can be applied to relaxed stochastic control problems.
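The structure of the MSA — a forward pass for the state, a backward pass for the adjoint, and a pointwise maximization of the Hamiltonian — can be sketched on a deterministic, discrete-time toy problem. Everything below is a hypothetical illustration: the linear dynamics, quadratic cost, and parameters are made up, and the thesis's relative-entropy penalty on consecutive controls is replaced here by a quadratic proximal penalty, which plays the analogous damping role in this deterministic setting.

```python
import numpy as np

def modified_msa(T=1.0, n=50, rho=1.0, n_iter=200, x0=1.0):
    """Deterministic, discrete-time sketch of a modified MSA.

    Toy problem: minimise J(u) = sum_t h * (x_t**2 + u_t**2) / 2
    subject to x_{t+1} = x_t + h * u_t, with Hamiltonian
    H(x, u, p) = p * u - (x**2 + u**2) / 2.
    """
    h = T / n

    def forward(u):
        # Forward Euler pass: state trajectory under control u.
        x = np.empty(n + 1)
        x[0] = x0
        for t in range(n):
            x[t + 1] = x[t] + h * u[t]
        return x

    u = np.zeros(n)  # initial guess for the control
    for _ in range(n_iter):
        x = forward(u)
        # Backward pass: adjoint p_t = p_{t+1} + h * dH/dx, terminal p_T = 0.
        p = np.empty(n + 1)
        p[-1] = 0.0
        for t in reversed(range(n)):
            p[t] = p[t + 1] - h * x[t]
        # Modified update: pointwise argmax of the Hamiltonian plus a
        # proximal penalty keeping the new control close to the old one:
        #   argmax_u [p_{t+1} * u - u**2 / 2 - rho * (u - u_old)**2]
        u = (p[1:] + 2.0 * rho * u) / (1.0 + 2.0 * rho)

    x = forward(u)
    cost = h * np.sum(x[:-1] ** 2 + u ** 2) / 2.0
    return u, x, cost
```

Without the penalty (rho = 0) the update is the classical MSA, which may oscillate or diverge on harder problems; the penalty damps the update so that each iteration changes the control only moderately, mirroring how the entropy penalty controls consecutive measure-valued controls in the relaxed setting.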