Stochastic, distributed and federated optimization for machine learning
We study optimization algorithms for the finite sum problems frequently arising in machine learning applications. First, we propose novel variants of stochastic gradient descent with a variance reduction property that enables linear convergence for strongly convex objectives. Second, we study distributed setting, in which the data describing the optimization problem does not fit into a single computing node. In this case, traditional methods are inefficient, as the communication costs inherent in distributed optimization become the bottleneck. We propose a communication-efficient framework which iteratively forms local subproblems that can be solved with arbitrary local optimization algorithms. Finally, we introduce the concept of Federated Optimization/Learning, where we try to solve the machine learning problems without having data stored in any centralized manner. The main motivation comes from industry when handling user-generated data. The current prevalent practice is that companies collect vast amounts of user data and store them in datacenters. An alternative we propose is not to collect the data in first place, and instead occasionally use the computational power of users' devices to solve the very same optimization problems, while alleviating privacy concerns at the same time. In such setting, minimization of communication rounds is the primary goal, and we demonstrate that solving the optimization problems in such circumstances is conceptually tractable.