Integrating local information for inference and optimization in machine learning
dc.contributor.advisor
Storkey, Amos
en
dc.contributor.advisor
Ramamoorthy, Subramanian
en
dc.contributor.author
Zhu, Zhanxing
en
dc.contributor.sponsor
other
en
dc.date.accessioned
2017-03-15T10:59:01Z
dc.date.available
2017-03-15T10:59:01Z
dc.date.issued
2016-06-27
dc.description.abstract
In practice, machine learning practitioners often care about two key issues: how to obtain a
more accurate answer with limited data, and how to handle large-scale data
(often referred to as “Big Data” in industry) for efficient inference and optimization.
One solution to the first issue is to aggregate learned predictions from diverse
local models. For the second issue, integrating information from subsets of the
large-scale data is a proven way to reduce computation. In this thesis,
we develop novel frameworks and schemes that address several scenarios
arising within each of these two issues.
For aggregating diverse models, in particular aggregating probabilistic predictions
from different models, we introduce a spectrum of compositional methods,
Rényi divergence aggregators, which are maximum entropy distributions subject to
biases from individual models, with the Rényi divergence parameter dependent on the
bias. Experiments on various simulated and real-world datasets verify the findings.
We also establish theoretical connections between Rényi divergence
aggregators and machine learning markets with isoelastic utilities.
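As a rough illustration of the aggregation idea (not the thesis's exact formulation), the sketch below pools categorical predictions from several models with a weighted power mean whose exponent stands in for the divergence parameter; the function name, the parameterisation, and the toy data are assumptions made for this example.

```python
import numpy as np

def power_mean_aggregate(probs, weights, alpha):
    """Aggregate per-model categorical predictions with a weighted power mean.

    probs   : (n_models, n_classes) array, each row a probability vector
    weights : (n_models,) non-negative weights summing to 1
    alpha   : pooling exponent; alpha = 1 recovers linear pooling (a mixture),
              alpha -> 0 recovers log pooling (a product of experts), giving a
              spectrum of compositional aggregators between the two extremes.
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.isclose(alpha, 0.0):
        # geometric (log) pooling in the limit alpha -> 0
        agg = np.exp(weights @ np.log(probs + 1e-12))
    else:
        agg = (weights @ probs**alpha) ** (1.0 / alpha)
    return agg / agg.sum()  # renormalise to a proper distribution

# toy usage: two models disagreeing over three classes
p = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]]
w = [0.6, 0.4]
for a in (1.0, 0.5, 0.0):
    print(a, power_mean_aggregate(p, w, a))
```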
The second issue involves inference and optimization with large-scale data. We
consider two important scenarios: one is optimizing the large-scale Convex-Concave Saddle
Point problem with a Separable structure, referred to as Sep-CCSP; the other is large-scale
Bayesian posterior sampling.
Two settings of the Sep-CCSP problem are considered: one with strongly
convex functions and one with non-strongly convex functions. We develop efficient stochastic
coordinate descent methods for both cases, which allow fast parallel processing
of large-scale data. Both theoretically and empirically, the developed methods
are shown to perform comparably to, and more often better than, state-of-the-art
methods.
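The toy sketch below conveys the flavour of stochastic primal-dual coordinate updates on a separable saddle point problem, using ridge regression written in saddle-point form; the fixed step sizes and the specific update rule are illustrative assumptions, not the adaptive or parallel block algorithms developed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy ridge regression: min_x (1/2n)||A x - b||^2 + (lam/2)||x||^2, written as
# the separable saddle point
#   min_x max_y (1/n) sum_i [ y_i <a_i, x> - y_i^2/2 - b_i y_i ] + (lam/2)||x||^2
n, d, lam = 200, 10, 0.1
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

x = np.zeros(d)
y = np.zeros(n)
x_bar = x.copy()
u = (A.T @ y) / n                    # running average of dual-weighted features
sigma, tau, theta = 0.5, 0.05, 0.95  # illustrative step sizes; in practice tuned from the problem constants

for t in range(5000):
    i = rng.integers(n)
    # dual coordinate step (closed form for the squared-loss conjugate y^2/2 + b_i y)
    y_new = (sigma * (A[i] @ x_bar - b[i]) + y[i]) / (sigma + 1.0)
    # primal step against the updated coupling term, prox of the ridge regulariser
    v = u + (y_new - y[i]) * A[i]
    x_next = (x - tau * v) / (1.0 + tau * lam)
    # bookkeeping: running average and extrapolation
    u += (y_new - y[i]) * A[i] / n
    x_bar = x_next + theta * (x_next - x)
    x, y[i] = x_next, y_new

print("distance to ridge solution:",
      np.linalg.norm(x - np.linalg.solve(A.T @ A / n + lam * np.eye(d), A.T @ b / n)))
```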
To handle the scalability issue in Bayesian posterior sampling, the stochastic approximation
technique is employed, i.e., only a small mini-batch of data items is touched
to approximate the full likelihood or its gradient. To deal with the subsampling error
introduced by this stochastic approximation, we propose a covariance-controlled adaptive
Langevin thermostat that can effectively dissipate parameter-dependent noise while
maintaining the desired target distribution. This method achieves a substantial speedup
over popular alternative schemes for large-scale machine learning applications.
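As a hedged illustration of thermostat-style stochastic-gradient samplers, the sketch below runs an adaptive (Nosé-Hoover type) Langevin thermostat with mini-batch gradients on a toy Gaussian posterior; the explicit covariance-control step of the proposed method is omitted, and the step size and noise amplitude are illustrative choices, not the thesis's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy posterior: unknown mean theta of Gaussian data with unit variance,
# prior theta ~ N(0, 10); the exact posterior is Gaussian, so we can check the sampler.
N, batch = 1000, 32
data = rng.normal(1.5, 1.0, size=N)
prior_var = 10.0
post_var = 1.0 / (1.0 / prior_var + N)   # closed-form posterior variance
post_mean = post_var * data.sum()        # closed-form posterior mean

def stoch_grad_U(theta):
    """Stochastic gradient of the negative log posterior from a mini-batch."""
    idx = rng.integers(N, size=batch)
    return theta / prior_var + (N / batch) * (theta - data[idx]).sum()

h, A = 1e-3, 1.0            # step size and injected-noise amplitude (illustrative)
theta, p, xi = 0.0, 0.0, A  # parameter, momentum, adaptive friction
samples = []
for t in range(50000):
    p += -h * xi * p - h * stoch_grad_U(theta) + np.sqrt(2 * A * h) * rng.standard_normal()
    theta += h * p
    xi += h * (p * p - 1.0)  # thermostat: drive the kinetic energy to its target
    if t > 10000:
        samples.append(theta)

print("sampled mean/var:", np.mean(samples), np.var(samples))
print("exact   mean/var:", post_mean, post_var)
```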
en
dc.identifier.uri
http://hdl.handle.net/1842/20980
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Zhu, Z. and Storkey, A. J. (2015). Adaptive Stochastic Primal-Dual Coordinate Descent for Separable Saddle Point Problems. In Machine Learning and Knowledge Discovery in Databases (ECML/PKDD), pages 645-658.
en
dc.relation.hasversion
Zhu, Z. and Storkey, A. J. (2016). Stochastic Parallel Block Coordinate Descent for Large-scale Saddle Point Problems. In 30th AAAI Conference on Artificial Intelligence (AAAI 2016).
en
dc.subject
machine learning
en
dc.subject
large-scale optimization
en
dc.subject
large-scale Bayesian sampling
en
dc.title
Integrating local information for inference and optimization in machine learning
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name:
- Zhu2016.pdf
- Size:
- 2.25 MB
- Format:
- Adobe Portable Document Format