Edinburgh Research Archive

Integrating local information for inference and optimization in machine learning

dc.contributor.advisor
Storkey, Amos
en
dc.contributor.advisor
Ramamoorthy, Subramanian
en
dc.contributor.author
Zhu, Zhanxing
en
dc.contributor.sponsor
other
en
dc.date.accessioned
2017-03-15T10:59:01Z
dc.date.available
2017-03-15T10:59:01Z
dc.date.issued
2016-06-27
dc.description.abstract
In practice, machine learners often care about two key issues: how to obtain a more accurate answer with limited data, and how to handle large-scale data (often referred to as “Big Data” in industry) for efficient inference and optimization. One solution to the first issue might be aggregating learned predictions from diverse local models. For the second issue, integrating information from subsets of the large-scale data is a proven way of reducing computation. In this thesis, we develop novel frameworks and schemes to handle several scenarios arising in each of these two salient issues. For aggregating diverse models – in particular, aggregating probabilistic predictions from different models – we introduce a spectrum of compositional methods, Rényi divergence aggregators, which are maximum entropy distributions subject to biases from individual models, with the Rényi divergence parameter dependent on the bias. Experiments on various simulated and real-world datasets verify the findings. We also show theoretical connections between Rényi divergence aggregators and machine learning markets with isoelastic utilities. The second issue involves inference and optimization with large-scale data. We consider two important scenarios: optimizing the large-scale Convex-Concave Saddle Point problem with a Separable structure, referred to as Sep-CCSP, and large-scale Bayesian posterior sampling. Two settings of the Sep-CCSP problem are considered: with strongly convex functions and with non-strongly convex functions. We develop efficient stochastic coordinate descent methods for both cases, allowing fast parallel processing of large-scale data. Both theoretically and empirically, the developed methods are shown to perform comparably to, and more often better than, state-of-the-art methods.
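The aggregation idea can be illustrated with a minimal power-mean pooling sketch: a simple family that interpolates between the linear opinion pool (arithmetic mean) and the log opinion pool (normalised geometric mean) of several models' probabilistic predictions. The thesis's Rényi divergence aggregators are more general than this; `power_mean_pool` and its `alpha` parameter are hypothetical names used here only for illustration.

```python
import numpy as np

def power_mean_pool(probs, alpha):
    """Aggregate probability vectors from several models via a power mean.

    probs: (n_models, n_classes) array of normalised distributions.
    alpha = 1 gives the linear opinion pool (arithmetic mean);
    alpha -> 0 approaches the log opinion pool (geometric mean).
    """
    probs = np.asarray(probs, dtype=float)
    if abs(alpha) < 1e-12:
        pooled = np.exp(np.mean(np.log(probs), axis=0))  # geometric mean
    else:
        pooled = np.mean(probs ** alpha, axis=0) ** (1.0 / alpha)
    return pooled / pooled.sum()  # renormalise to a distribution

# Two models' predictive distributions over three classes.
p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.4, 0.4, 0.2])

lin = power_mean_pool([p1, p2], alpha=1.0)  # arithmetic pooling
log = power_mean_pool([p1, p2], alpha=0.0)  # geometric pooling
```

Varying the pooling parameter trades off how strongly any single confident model can dominate the aggregate, which is the kind of behaviour the Rényi divergence parameter controls in the thesis.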
To handle the scalability issue in Bayesian posterior sampling, we employ the stochastic approximation technique, i.e., touching only a small mini-batch of data items to approximate the full likelihood or its gradient. To deal with the subsampling error introduced by stochastic approximation, we propose a covariance-controlled adaptive Langevin thermostat that can effectively dissipate parameter-dependent noise while maintaining a desired target distribution. This method achieves a substantial speedup over popular alternative schemes for large-scale machine learning applications.
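The underlying stochastic approximation idea – estimating the full-data gradient of the log posterior from a mini-batch, rescaled by N/|batch| to keep the estimate unbiased – can be sketched with plain stochastic gradient Langevin dynamics on a toy Gaussian model. This is not the thesis's covariance-controlled adaptive Langevin thermostat: the sketch injects only fixed isotropic noise and does not correct for the parameter-dependent mini-batch noise, which is exactly what the thermostat adds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: x_i ~ N(theta_true, 1); with a flat prior, the posterior
# over theta is Gaussian, centred near the sample mean.
theta_true = 2.0
data = rng.normal(theta_true, 1.0, size=10_000)

def minibatch_grad_log_post(theta, batch):
    # Unbiased estimate of the full-data log-posterior gradient:
    # scale the mini-batch gradient by N / |batch|.
    n, b = len(data), len(batch)
    return (n / b) * np.sum(batch - theta)

samples = []
theta, step = 0.0, 1e-5
for t in range(5_000):
    batch = rng.choice(data, size=100, replace=False)
    noise = rng.normal(0.0, np.sqrt(step))  # injected Langevin noise
    theta = theta + 0.5 * step * minibatch_grad_log_post(theta, batch) + noise
    samples.append(theta)

# Discard burn-in, then average to estimate the posterior mean.
posterior_mean_est = np.mean(samples[1_000:])
```

With a small fixed step size the chain samples approximately from the posterior; the mini-batch gradient noise it ignores is the subsampling error that the covariance-controlled thermostat is designed to dissipate.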
en
dc.identifier.uri
http://hdl.handle.net/1842/20980
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Zhu, Z. and Storkey, A. J. (2015). Adaptive Stochastic Primal-Dual Coordinate Descent for Separable Saddle Point Problems. In Machine Learning and Knowledge Discovery in Databases (ECML/PKDD), pages 645–658.
en
dc.relation.hasversion
Zhu, Z. and Storkey, A. J. (2016). Stochastic Parallel Block Coordinate Descent for Large-scale Saddle Point Problems. In 30th AAAI Conference on Artificial Intelligence (AAAI 2016).
en
dc.subject
machine learning
en
dc.subject
large-scale optimization
en
dc.subject
large-scale Bayesian sampling
en
dc.title
Integrating local information for inference and optimization in machine learning
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Zhu2016.pdf
Size:
2.25 MB
Format:
Adobe Portable Document Format