Randomized coordinate descent methods for big data optimization
dc.contributor.advisor
Richtarik, Peter
en
dc.contributor.advisor
Gondzio, Jacek
en
dc.contributor.author
Takac, Martin
en
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.contributor.sponsor
Centre for Numerical Algorithms and Intelligent Software
en
dc.contributor.sponsor
Scottish Funding Council
en
dc.date.accessioned
2014-11-18T15:19:26Z
dc.date.available
2014-11-18T15:19:26Z
dc.date.issued
2014-07-01
dc.description.abstract
This thesis consists of 5 chapters. We develop new serial (Chapter 2), parallel (Chapter
3), distributed (Chapter 4) and primal-dual (Chapter 5) stochastic (randomized) coordinate
descent methods, analyze their complexity, and conduct numerical experiments on synthetic and real datasets of huge size (GBs/TBs of data, millions/billions of variables).
In Chapter 2 we develop a randomized coordinate descent method for minimizing the sum
of a smooth and a simple nonsmooth separable convex function and prove that it obtains
an ε-accurate solution with probability at least 1 - p in at most O((n/ε) log(1/p)) iterations,
where n is the number of blocks. This extends recent results of Nesterov [43], which cover
the smooth case, to composite minimization, while at the same time improving the complexity
by a factor of 4 and removing ε from the logarithmic term. More importantly, in contrast with the aforementioned work, in which the results are achieved by applying the method to a regularized version of the objective function with an unknown scaling factor, we show that this is not necessary, thus obtaining the first true iteration complexity bounds. For strongly
convex functions the method converges linearly. In the smooth case we also allow for arbitrary
probability vectors and non-Euclidean norms. Our analysis is also much simpler.
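To make the coordinatewise step concrete, below is a minimal sketch of a serial randomized proximal coordinate descent of this kind, specialized to the LASSO instance f(x) = ½‖Ax − b‖² with Ψ(x) = λ‖x‖₁. The uniform sampling, the incremental residual update, and all function names are illustrative assumptions, not the thesis pseudocode.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*|.|: coordinatewise shrinkage."""
    return np.sign(v) * max(abs(v) - t, 0.0)

def rcd_lasso(A, b, lam, iters=10_000, seed=0):
    """Serial randomized coordinate descent for
    min_x 0.5*||Ax - b||^2 + lam*||x||_1 (Chapter 2 flavour).

    Each iteration picks one coordinate uniformly at random and takes
    an exact proximal step with coordinatewise Lipschitz constant
    L_i = ||A[:, i]||^2. Assumes nonzero columns of A.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    L = (A ** 2).sum(axis=0)       # coordinatewise Lipschitz constants
    x = np.zeros(n)
    r = A @ x - b                  # residual Ax - b, kept up to date
    for _ in range(iters):
        i = rng.integers(n)
        g = A[:, i] @ r            # partial derivative grad_i f(x)
        x_new = soft_threshold(x[i] - g / L[i], lam / L[i])
        if x_new != x[i]:
            r += (x_new - x[i]) * A[:, i]   # cheap O(m) residual update
            x[i] = x_new
    return x
```

The exact coordinate prox step is what makes each iteration cheap: only a single column of A is touched per iteration.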
In Chapter 3 we show that the randomized coordinate descent method developed in Chapter
2 can be accelerated by parallelization. The speedup over the serial method, measured in the number of iterations needed to approximately solve the problem with high probability, is equal to the product of the number of processors and a natural, easily computable measure of separability of the smooth component of the objective function. In the
worst case, when no degree of separability is present, there is no speedup; in the best case, when
the problem is separable, the speedup is equal to the number of processors. Our analysis also
works in the regime where the number of coordinates updated at each iteration is random, which allows for modeling situations with a variable (busy or unreliable) number of processors.
We demonstrate numerically that the algorithm is able to solve huge-scale ℓ1-regularized least
squares problems with a billion variables.
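As a sketch of the parallel scheme (simulated serially here), the code below updates a random subset of τ coordinates per iteration and damps each step by a factor β = 1 + (τ−1)(ω−1)/(n−1), where ω is the degree of partial separability of the smooth part; this β matches the τ-nice-sampling flavour of the analysis, but the code itself is a simplified illustration under that assumption.

```python
import numpy as np

def pcd_lasso(A, b, lam, tau=8, iters=2_000, seed=0):
    """Simplified simulation of parallel coordinate descent for
    min_x 0.5*||Ax - b||^2 + lam*||x||_1 (Chapter 3 flavour).

    Each iteration draws tau coordinates uniformly without replacement
    and updates them all from the same residual, i.e. as one parallel
    step. Steps are damped by beta = 1 + (tau-1)*(omega-1)/(n-1),
    where omega is the degree of partial separability of the smooth
    part (max nonzeros per row of A). Assumes nonzero columns of A.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    L = (A ** 2).sum(axis=0)                 # coordinatewise Lipschitz constants
    omega = int((A != 0).sum(axis=1).max())  # degree of partial separability
    beta = 1.0 + (tau - 1) * (omega - 1) / max(n - 1, 1)
    x = np.zeros(n)
    r = A @ x - b
    for _ in range(iters):
        S = rng.choice(n, size=tau, replace=False)
        g = A[:, S].T @ r                    # partial derivatives grad_S f(x)
        v = x[S] - g / (beta * L[S])
        x_new = np.sign(v) * np.maximum(np.abs(v) - lam / (beta * L[S]), 0.0)
        r += A[:, S] @ (x_new - x[S])        # one collective residual update
        x[S] = x_new
    return x
```

Note that with τ = 1 the damping vanishes (β = 1) and the method reduces to the serial sketch above.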
In Chapter 4 we extend coordinate descent to a distributed environment. We initially
partition the coordinates (features or examples, based on the problem formulation) and assign
each partition to a different node of a cluster. At every iteration, each node picks a random
subset of the coordinates from those it owns, independently of the other nodes, and in
parallel computes and applies updates to the selected coordinates based on a simple closed-form
formula. We give bounds on the number of iterations sufficient to approximately solve the
problem with high probability, and show how it depends on the data and on the partitioning.
We perform numerical experiments with a LASSO instance described by a 3TB matrix.
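A single-machine simulation of this partitioned sampling could look as follows; the fixed random partitioning and the per-node subset sampling follow the description above, while the damping constant `beta` is a hypothetical placeholder for the data- and partitioning-dependent constant from the analysis.

```python
import numpy as np

def distributed_cd_lasso(A, b, lam, c=4, tau=8, iters=1_000, seed=0):
    """Single-machine simulation of the distributed scheme of Chapter 4
    for min_x 0.5*||Ax - b||^2 + lam*||x||_1.

    Coordinates are split into c fixed partitions ("nodes") up front.
    In each iteration every node independently samples tau coordinates
    from its own partition, and all selected coordinates receive the
    same closed-form shrinkage update. The damping constant beta below
    is a hypothetical placeholder; in the analysis it depends on the
    data and on the partitioning. Assumes nonzero columns of A.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    L = (A ** 2).sum(axis=0)
    parts = np.array_split(rng.permutation(n), c)   # fixed random partitioning
    beta = 2.0                                      # placeholder damping factor
    x = np.zeros(n)
    r = A @ x - b
    for _ in range(iters):
        # each "node" samples tau of its own coordinates, independently
        S = np.concatenate([rng.choice(p, size=min(tau, len(p)), replace=False)
                            for p in parts])
        g = A[:, S].T @ r
        v = x[S] - g / (beta * L[S])
        x_new = np.sign(v) * np.maximum(np.abs(v) - lam / (beta * L[S]), 0.0)
        r += A[:, S] @ (x_new - x[S])
        x[S] = x_new
    return x
```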
Finally, in Chapter 5, we address the issue of using mini-batches in stochastic optimization
of Support Vector Machines (SVMs). We show that the same quantity, the spectral norm of
the data, controls the parallelization speedup obtained for both primal stochastic subgradient
descent (SGD) and stochastic dual coordinate ascent (SDCA) methods, and use it to derive novel
variants of mini-batched (parallel) SDCA. Our guarantees for both methods are expressed in
terms of the original nonsmooth primal problem based on the hinge-loss.
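On the primal side, a minimal mini-batch stochastic subgradient sketch for the hinge-loss SVM (written here in the Pegasos style with step size 1/(λt), which is an assumption of this sketch; the thesis pairs such a primal method with mini-batch SDCA) might read:

```python
import numpy as np

def minibatch_pegasos(X, y, lam=1e-3, batch=64, iters=5_000, seed=0):
    """Mini-batch primal stochastic subgradient descent for the SVM
    min_w lam/2 * ||w||^2 + (1/m) * sum_i max(0, 1 - y_i * <x_i, w>).

    Each step averages the hinge-loss subgradient over a random
    mini-batch and uses the Pegasos step size 1/(lam * t). The thesis
    shows that the spectral norm of the data governs how large the
    batch can safely be; this sketch simply takes the batch size as
    given. Labels y are assumed to be +1/-1.
    """
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    for t in range(1, iters + 1):
        B = rng.integers(m, size=batch)       # mini-batch, with replacement
        margins = y[B] * (X[B] @ w)
        active = margins < 1.0                # examples with nonzero hinge loss
        grad = lam * w - (y[B][active] @ X[B][active]) / batch
        w -= grad / (lam * t)                 # Pegasos step size 1/(lam*t)
    return w
```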
Our results in Chapters 2 and 3 are cast for blocks (groups of coordinates) instead of
coordinates, and hence the methods are better described as block coordinate descent methods.
While the results in Chapters 4 and 5 are not formulated for blocks, they can be extended to
this setting.
en
dc.identifier.uri
http://hdl.handle.net/1842/9670
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Peter Richtarik and Martin Takac. Efficiency of randomized coordinate descent methods on minimization problems with a composite objective function. 4th Workshop on Signal Processing with Adaptive Sparse Structured Representations, 2011.
en
dc.relation.hasversion
Peter Richtarik and Martin Takac. Efficient serial and parallel coordinate descent methods for huge-scale truss topology design. In Diethard Klatte, Hans-Jakob Luthi, and Karl Schmedders, editors, Operations Research Proceedings 2011, pages 27-32. Springer Berlin Heidelberg, 2012.
en
dc.relation.hasversion
Peter Richtarik and Martin Takac. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, Series A, pages 1-38, 2012.
en
dc.relation.hasversion
Peter Richtarik and Martin Takac. Parallel coordinate descent methods for big data optimization. Submitted to Mathematical Programming, Series A, arXiv:1212.0873, 2012.
en
dc.relation.hasversion
Peter Richtarik and Martin Takac. Distributed coordinate descent method for learning with big data. arXiv:1310.2059, 2013.
en
dc.relation.hasversion
Peter Richtarik and Martin Takac. On optimal probabilities in stochastic coordinate descent methods. arXiv:1310.3438, 2013.
en
dc.relation.hasversion
Martin Takac, Avleen Singh Bijral, Peter Richtarik, and Nathan Srebro. Mini-batch primal and dual methods for SVMs. International Conference on Machine Learning (ICML), 2013.
en
dc.subject
coordinate descent methods
en
dc.subject
optimization
en
dc.subject
big data
en
dc.title
Randomized coordinate descent methods for big data optimization
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
- Name: Takac2014.pdf
- Size: 1.9 MB
- Format: Adobe Portable Document Format