Edinburgh Research Archive

Exploiting diversity for efficient machine learning

dc.contributor.advisor
Sutton, Charles
en
dc.contributor.advisor
Storkey, Amos
en
dc.contributor.author
Geras, Krzysztof Jerzy
en
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.date.accessioned
2018-03-16T12:08:49Z
dc.date.available
2018-03-16T12:08:49Z
dc.date.issued
2018-07-02
dc.description.abstract
A common practice for solving machine learning problems is currently to consider each problem in isolation, starting from scratch every time a new learning problem is encountered or a new model is proposed. This is perfectly feasible when the problems are sufficiently easy or, if a problem is hard, when a large amount of resources, in terms of both training data and computation, is available. Although this naive approach has been the main focus of research in machine learning for a few decades and has had considerable success, it becomes infeasible if the problem is too hard in proportion to the available resources. When using a complex model in this naive approach, it is necessary to collect large data sets (if that is possible at all) to avoid overfitting, and hence also to use large computational resources, first during training to process the large data set and then at test time to execute the complex model. An alternative to treating each learning problem independently is to leverage related data sets and the computation encapsulated in previously trained models. By doing so we can decrease the amount of data necessary to reach a satisfactory level of performance and, consequently, improve the achievable accuracy and decrease training time. Our attack on this problem is to exploit diversity: in the structure of the data set, in the features learnt and in the inductive biases of different neural network architectures. In the setting of learning from multiple sources we introduce multiple-source cross-validation, which gives an unbiased estimator of the test error when the data set is composed of data coming from multiple sources and the data at test time come from a new, unseen source. We also propose new estimators of the variance of standard k-fold cross-validation and of multiple-source cross-validation, which have lower bias than previously known ones.
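The fold structure of multiple-source cross-validation can be illustrated with a minimal sketch: each fold holds out one entire source, trains on the remaining sources, and averages the per-source errors. This is not code from the thesis; the names `multiple_source_cv`, `train_fn` and `loss_fn` are illustrative.

```python
from statistics import mean

def multiple_source_cv(data_by_source, train_fn, loss_fn):
    """Hold out each source in turn: train on the remaining sources,
    evaluate on the held-out one, and average the per-source errors.
    This mimics testing on data from a new, unseen source."""
    errors = []
    sources = sorted(data_by_source)
    for held_out in sources:
        train = [xy for s in sources if s != held_out
                 for xy in data_by_source[s]]
        test = data_by_source[held_out]
        model = train_fn(train)
        errors.append(mean(loss_fn(model, xy) for xy in test))
    return mean(errors)
```

As a toy usage, `train_fn` could fit a constant mean predictor and `loss_fn` could be the squared error; the key design choice, compared to standard k-fold cross-validation, is that folds coincide with sources rather than being drawn at random.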
To improve unsupervised learning we introduce scheduled denoising autoencoders, which learn a more diverse set of features than the standard denoising autoencoder. This is due to their training procedure, which starts with a high level of noise, under which the network learns coarse features, and then gradually lowers the noise, allowing the network to learn more local features. A connection between this training procedure and curriculum learning is also drawn. We develop the idea of learning a diverse representation further by explicitly incorporating the goal of obtaining a diverse representation into the training objective. The proposed model, the composite denoising autoencoder, learns multiple subsets of features focused on modelling variations in the data set at different levels of granularity. Finally, we introduce the idea of model blending, a variant of model compression in which the two models, the teacher and the student, are both strong models but differ in their inductive biases. As an example, we train convolutional networks using the guidance of bidirectional long short-term memory (LSTM) networks. This allows the convolutional network to be trained to be more accurate than the LSTM network at no extra cost at test time.
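The noise schedule underlying scheduled denoising autoencoders can be sketched as follows. This is a hypothetical illustration, not the thesis implementation: the function names, the linear annealing and the start/end levels are assumptions, and masking noise (zeroing input dimensions) stands in for whichever corruption process is used.

```python
import numpy as np

def noise_schedule(epoch, total_epochs, start=0.7, end=0.1):
    """Linearly anneal the corruption level from a high initial value
    (encouraging coarse, global features) down to a low final value
    (encouraging finer, more local features)."""
    frac = epoch / max(total_epochs - 1, 1)
    return start + frac * (end - start)

def corrupt(x, level, rng):
    """Masking noise: zero each input dimension with probability `level`."""
    mask = rng.random(x.shape) >= level
    return x * mask
```

During training, the denoising autoencoder would reconstruct the clean `x` from `corrupt(x, noise_schedule(epoch, total_epochs), rng)`, so early epochs see heavily corrupted inputs and later epochs see nearly clean ones.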
en
dc.identifier.uri
http://hdl.handle.net/1842/28839
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Gregor Urban, Krzysztof J. Geras, Samira Ebrahimi Kahou, Ozlem Aslan, Shengjie Wang, Rich Caruana, Abdel-rahman Mohamed, Matthai Philipose, and Matthew Richardson. Do deep convolutional nets really need to be deep (or even convolutional)? In ICLR (workshop track), 2016.
en
dc.relation.hasversion
Krzysztof J. Geras and Charles Sutton. Multiple-source cross-validation. In International Conference on Machine Learning, 2013.
en
dc.relation.hasversion
Krzysztof J. Geras and Charles Sutton. Scheduled denoising autoencoders. In International Conference on Learning Representations, 2015.
en
dc.relation.hasversion
Krzysztof J. Geras, Abdel-rahman Mohamed, Rich Caruana, Gregor Urban, Shengjie Wang, Ozlem Aslan, Matthai Philipose, Matthew Richardson, and Charles Sutton. Blending LSTMs into CNNs. In International Conference on Learning Representations (workshop track), 2016.
en
dc.relation.hasversion
Krzysztof J. Geras and Charles Sutton. Composite denoising autoencoders. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2016.
en
dc.subject
machine learning
en
dc.subject
deep learning
en
dc.subject
cross-validation
en
dc.subject
autoencoders
en
dc.subject
convolutional neural networks
en
dc.subject
model compression
en
dc.title
Exploiting diversity for efficient machine learning
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Geras2018.pdf
Size:
2.85 MB
Format:
Adobe Portable Document Format
