Sketch and project: randomized iterative methods for linear systems and inverting matrices
Gower, Robert Mansel
Probabilistic ideas and tools have recently begun to permeate into several fields where they had traditionally not played a major role, including fields such as numerical linear algebra and optimization. One of the key ways in which these ideas influence these fields is via the development and analysis of randomized algorithms for solving standard and new problems of these fields. Such methods are typically easier to analyze, and often lead to faster and/or more scalable and versatile methods in practice. This thesis explores the design and analysis of new randomized iterative methods for solving linear systems and inverting matrices. The methods are based on a novel sketch-and-project framework. By sketching we mean, to start with a difficult problem and then randomly generate a simple problem that contains all the solutions of the original problem. After sketching the problem, we calculate the next iterate by projecting our current iterate onto the solution space of the sketched problem. The starting point for this thesis is the development of an archetype randomized method for solving linear systems. Our method has six different but equivalent interpretations: sketch-and-project, constrain-and-approximate, random intersect, random linear solve, random update and random fixed point. By varying its two parameters – a positive definite matrix (defining geometry), and a random matrix (sampled in an i.i.d. fashion in each iteration) – we recover a comprehensive array of well known algorithms as special cases, including the randomized Kaczmarz method, randomized Newton method, randomized coordinate descent method and random Gaussian pursuit. We also naturally obtain variants of all these methods using blocks and importance sampling. However, our method allows for a much wider selection of these two parameters, which leads to a number of new specific methods. We prove exponential convergence of the expected norm of the error in a single theorem, from which existing complexity results for known variants can be obtained. However, we also give an exact formula for the evolution of the expected iterates, which allows us to give lower bounds on the convergence rate. We then extend our problem to that of finding the projection of given vector onto the solution space of a linear system. For this we develop a new randomized iterative algorithm: stochastic dual ascent (SDA). The method is dual in nature, and iteratively solves the dual of the projection problem. The dual problem is a non-strongly concave quadratic maximization problem without constraints. In each iteration of SDA, a dual variable is updated by a carefully chosen point in a subspace spanned by the columns of a random matrix drawn independently from a fixed distribution. The distribution plays the role of a parameter of the method. Our complexity results hold for a wide family of distributions of random matrices, which opens the possibility to fine-tune the stochasticity of the method to particular applications. We prove that primal iterates associated with the dual process converge to the projection exponentially fast in expectation, and give a formula and an insightful lower bound for the convergence rate. We also prove that the same rate applies to dual function values, primal function values and the duality gap. Unlike traditional iterative methods, SDA converges under virtually no additional assumptions on the system (e.g., rank, diagonal dominance) beyond consistency. In fact, our lower bound improves as the rank of the system matrix drops. By mapping our dual algorithm to a primal process, we uncover that the SDA method is the dual method with respect to the sketch-and-project method from the previous chapter. Thus our new more general convergence results for SDA carry over to the sketch-and-project method and all its specializations (randomized Kaczmarz, randomized coordinate descent...etc). When our method specializes to a known algorithm, we either recover the best known rates, or improve upon them. Finally, we show that the framework can be applied to the distributed average consensus problem to obtain an array of new algorithms. The randomized gossip algorithm arises as a special case. In the final chapter, we extend our method for solving linear system to inverting matrices, and develop a family of methods with specialized variants that maintain symmetry or positive definiteness of the iterates. All the methods in the family converge globally and exponentially, with explicit rates. In special cases, we obtain stochastic block variants of several quasi-Newton updates, including bad Broyden (BB), good Broyden (GB), Powell-symmetric-Broyden (PSB), Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS). Ours are the first stochastic versions of these updates shown to converge to an inverse of a fixed matrix. Through a dual viewpoint we uncover a fundamental link between quasi-Newton updates and approximate inverse preconditioning. Further, we develop an adaptive variant of the randomized block BFGS (AdaRBFGS), where we modify the distribution underlying the stochasticity of the method throughout the iterative process to achieve faster convergence. By inverting several matrices from varied applications, we demonstrate that AdaRBFGS is highly competitive when compared to the well established Newton-Schulz and approximate preconditioning methods. In particular, on large-scale problems our method outperforms the standard methods by orders of magnitude. The development of efficient methods for estimating the inverse of very large matrices is a much needed tool for preconditioning and variable metric methods in the big data era.