Randomized iterative methods for linear systems: momentum, inexactness and gossip
View/Open
Date
28/11/2019Author
Loizou, Nicolas
Metadata
Abstract
In the era of big data, one of the key challenges is the development of novel optimization
algorithms that can accommodate vast amounts of data while at the same time satisfying
constraints and limitations of the problem under study. The need to solve optimization problems
is ubiquitous in essentially all quantitative areas of human endeavour, including industry and
science. In the last decade there has been a surge in the demand from practitioners, in fields
such as machine learning, computer vision, artificial intelligence, signal processing and data
science, for new methods able to cope with these new large scale problems.
In this thesis we are focusing on the design, complexity analysis and efficient implementations
of such algorithms. In particular, we are interested in the development of randomized first
order iterative methods for solving large scale linear systems, stochastic quadratic optimization
problems and the distributed average consensus problem.
In Chapter 2, we study several classes of stochastic optimization algorithms enriched with
heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic
Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time
momentum variants of several of these methods are studied. We choose to perform our analysis
in a setting in which all of the above methods are equivalent: convex quadratic problems. We
prove global nonasymptotic linear convergence rates for all methods and various measures of
success, including primal function values, primal iterates, and dual function values. We also
show that the primal iterates converge at an accelerated linear rate in a somewhat weaker sense.
This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic
gradient descent method with momentum). Under somewhat weaker conditions, we establish
a sublinear convergence rate for Cesaro averages of primal iterates. Moreover, we propose a
novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing
the momentum step. We prove linear convergence of several stochastic methods with stochastic
momentum, and show that in some sparse data regimes and for sufficiently small momentum
parameters, these methods enjoy better overall complexity than methods with deterministic
momentum. Finally, we perform extensive numerical testing on artificial and real datasets.
In Chapter 3, we present a convergence rate analysis of inexact variants of stochastic gradient
descent, stochastic Newton, stochastic proximal point and stochastic subspace ascent.
A common feature of these methods is that in their update rule a certain subproblem needs
to be solved exactly. We relax this requirement by allowing for the subproblem to be solved
inexactly. In particular, we propose and analyze inexact randomized iterative methods for
solving three closely related problems: a convex stochastic quadratic optimization problem, a
best approximation problem and its dual { a concave quadratic maximization problem. We
provide iteration complexity results under several assumptions on the inexactness error. Inexact
variants of many popular and some more exotic methods, including randomized block
Kaczmarz, randomized Gaussian Kaczmarz and randomized block coordinate descent, can be
cast as special cases. Finally, we present numerical experiments which demonstrate the benefits
of allowing inexactness.
When the data describing a given optimization problem is big enough, it becomes impossible
to store it on a single machine. In such situations, it is usually preferable to distribute the data
among the nodes of a cluster or a supercomputer. In one such setting the nodes cooperate
to minimize the sum (or average) of private functions (convex or nonconvex) stored at the
nodes. Among the most popular protocols for solving this problem in a decentralized fashion
(communication is allowed only between neighbours) are randomized gossip algorithms.
In Chapter 4 we propose a new approach for the design and analysis of randomized gossip
algorithms which can be used to solve the distributed average consensus problem, a fundamental
problem in distributed computing, where each node of a network initially holds a number or
vector, and the aim is to calculate the average of these objects by communicating only with
its neighbours (connected nodes). The new approach consists in establishing new connections to
recent literature on randomized iterative methods for solving largescale linear systems. Our
general framework recovers a comprehensive array of wellknown gossip protocols as special
cases and allow for the development of block and arbitrary sampling variants of all of these
methods. In addition, we present novel and provably accelerated randomized gossip protocols
where in each step all nodes of the network update their values using their own information but
only a subset of them exchange messages. The accelerated protocols are the first randomized
gossip algorithms that converge to consensus with a provably accelerated linear rate. The
theoretical results are validated via computational testing on typical wireless sensor network
topologies.
Finally, in Chapter 5, we move towards a different direction and present the first randomized
gossip algorithms for solving the average consensus problem while at the same time protecting
the private values stored at the nodes as these may be sensitive. In particular, we develop
and analyze three privacy preserving variants of the randomized pairwise gossip algorithm
("randomly pick an edge of the network and then replace the values stored at vertices of this
edge by their average") first proposed by Boyd et al. [16] for solving the average consensus
problem. The randomized methods we propose are all dual in nature. That is, they are designed
to solve the dual of the best approximation optimization formulation of the average consensus
problem. We call our three privacy preservation techniques "Binary Oracle", "ε Gap Oracle"
and "Controlled Noise Insertion". We give iteration complexity bounds for the proposed privacy
preserving randomized gossip protocols and perform extensive numerical experiments.
Collections
Related items
Showing items related by title, author, creator and subject.

Stochastic reactiondiffusion models in biology
Smith, Stephen (The University of Edinburgh, 20181129)Every cell contains several millions of diffusing and reacting biological molecules. The interactions between these molecules ultimately manifest themselves in all aspects of life, from the smallest bacterium to the ... 
Stochastic PDEs with extremal properties
Gerencsér, Máté (The University of Edinburgh, 20160629)We consider linear and semilinear stochastic partial differential equations that in some sense can be viewed as being at the "endpoints" of the classical variational theory by Krylov and Rozovskii [25]. In terms of ... 
Stochastic partial differential and integrodifferential equations
Dareiotis, Anastasios Constantinos (The University of Edinburgh, 20151126)In this work we present some new results concerning stochastic partial differential and integrodifferential equations (SPDEs and SPIDEs) that appear in nonlinear filtering. We prove existence and uniqueness of solutions ...