Edinburgh Research Archive

Exploiting diversity for efficient machine learning

dc.contributor.advisor
Sutton, Charles
en
dc.contributor.advisor
Storkey, Amos
en
dc.contributor.author
Geras, Krzysztof Jerzy
en
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.date.accessioned
2018-03-16T12:08:49Z
dc.date.available
2018-03-16T12:08:49Z
dc.date.issued
2018-07-02
dc.description.abstract
A common practice for solving machine learning problems is currently to consider each problem in isolation, starting from scratch every time a new learning problem is encountered or a new model is proposed. This is perfectly feasible when the problems are sufficiently easy or, if a problem is hard, when a large amount of resources, in terms of both training data and computation, is available. Although this naive approach has been the main focus of research in machine learning for a few decades and has had considerable success, it becomes infeasible if the problem is too hard in proportion to the available resources. When using a complex model in this naive approach, it is necessary to collect large data sets (if that is possible at all) to avoid overfitting, and hence also to use large computational resources, first during training to process the large data set and then at test time to execute the complex model. An alternative to treating each learning problem independently is to leverage related data sets and the computation encapsulated in previously trained models. By doing so we can decrease the amount of data necessary to reach a satisfactory level of performance and, consequently, improve the achievable accuracy and decrease training time. Our attack on this problem is to exploit diversity: in the structure of the data set, in the features learnt and in the inductive biases of different neural network architectures. In the setting of learning from multiple sources we introduce multiple-source cross-validation, which gives an unbiased estimator of the test error when the data set is composed of data coming from multiple sources and the data at test time come from a new, unseen source. We also propose new estimators of the variance of standard k-fold cross-validation and of multiple-source cross-validation, which have lower bias than previously known ones.
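The fold structure of multiple-source cross-validation can be illustrated with a minimal sketch: each fold holds out one entire source, trains on the remaining sources, and averages the per-source errors. This is not code from the thesis; the names `multiple_source_cv`, `train_fn` and `loss_fn` are illustrative.

```python
from statistics import mean

def multiple_source_cv(data_by_source, train_fn, loss_fn):
    """Hold out each source in turn: train on the remaining sources,
    evaluate on the held-out one, and average the per-source errors.
    This mimics testing on data from a new, unseen source."""
    errors = []
    sources = sorted(data_by_source)
    for held_out in sources:
        train = [xy for s in sources if s != held_out
                 for xy in data_by_source[s]]
        test = data_by_source[held_out]
        model = train_fn(train)
        errors.append(mean(loss_fn(model, xy) for xy in test))
    return mean(errors)
```

As a toy usage, `train_fn` could fit a constant mean predictor and `loss_fn` could be the squared error; the key design choice, compared to standard k-fold cross-validation, is that folds coincide with sources rather than being drawn at random.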
To improve unsupervised learning we introduce scheduled denoising autoencoders, which learn a more diverse set of features than the standard denoising autoencoder. This is due to their training procedure, which starts with a high level of noise, under which the network learns coarse features, and then gradually lowers the noise, allowing the network to learn more local features. A connection between this training procedure and curriculum learning is also drawn. We develop the idea of learning a diverse representation further by explicitly incorporating the goal of obtaining a diverse representation into the training objective. The proposed model, the composite denoising autoencoder, learns multiple subsets of features focused on modelling variations in the data set at different levels of granularity. Finally, we introduce the idea of model blending, a variant of model compression in which the two models, the teacher and the student, are both strong models but differ in their inductive biases. As an example, we train convolutional networks using the guidance of bidirectional long short-term memory (LSTM) networks. This allows the convolutional network to be trained to be more accurate than the LSTM network at no extra cost at test time.
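The noise schedule underlying scheduled denoising autoencoders can be sketched as follows. This is a hypothetical illustration, not the thesis implementation: the function names, the linear annealing and the start/end levels are assumptions, and masking noise (zeroing input dimensions) stands in for whichever corruption process is used.

```python
import numpy as np

def noise_schedule(epoch, total_epochs, start=0.7, end=0.1):
    """Linearly anneal the corruption level from a high initial value
    (encouraging coarse, global features) down to a low final value
    (encouraging finer, more local features)."""
    frac = epoch / max(total_epochs - 1, 1)
    return start + frac * (end - start)

def corrupt(x, level, rng):
    """Masking noise: zero each input dimension with probability `level`."""
    mask = rng.random(x.shape) >= level
    return x * mask
```

During training, the denoising autoencoder would reconstruct the clean `x` from `corrupt(x, noise_schedule(epoch, total_epochs), rng)`, so early epochs see heavily corrupted inputs and later epochs see nearly clean ones.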
en
dc.identifier.uri
http://hdl.handle.net/1842/28839
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Gregor Urban, Krzysztof J. Geras, Samira Ebrahimi Kahou, Ozlem Aslan, Shengjie Wang, Rich Caruana, Abdel-rahman Mohamed, Matthai Philipose, and Matthew Richardson. Do deep convolutional nets really need to be deep (or even convolutional)? In ICLR (workshop track), 2016.
en
dc.relation.hasversion
Krzysztof J. Geras and Charles Sutton. Multiple-source cross-validation. In International Conference on Machine Learning, 2013.
en
dc.relation.hasversion
Krzysztof J. Geras and Charles Sutton. Scheduled denoising autoencoders. In International Conference on Learning Representations, 2015.
en
dc.relation.hasversion
Krzysztof J. Geras, Abdel-rahman Mohamed, Rich Caruana, Gregor Urban, Shengjie Wang, Ozlem Aslan, Matthai Philipose, Matthew Richardson, and Charles Sutton. Blending LSTMs into CNNs. In International Conference on Learning Representations (workshop track), 2016.
en
dc.relation.hasversion
Krzysztof J. Geras and Charles Sutton. Composite denoising autoencoders. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2016.
en
dc.subject
machine learning
en
dc.subject
deep learning
en
dc.subject
cross-validation
en
dc.subject
autoencoders
en
dc.subject
convolutional neural networks
en
dc.subject
model compression
en
dc.title
Exploiting diversity for efficient machine learning
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Geras2018.pdf
Size:
2.85 MB
Format:
Adobe Portable Document Format
