Exploiting diversity for efficient machine learning
Abstract
A common practice for solving machine learning problems is currently to consider
each problem in isolation, starting from scratch every time a new learning problem
is encountered or a new model is proposed. This is a perfectly feasible solution
when the problems are sufficiently easy or, if a problem is hard, when a large
amount of resources, both in terms of training data and computation, is
available. Although this naive approach has been the main focus of research in
machine learning for a few decades and has had a lot of success, it becomes
infeasible if the problem is too hard in proportion to the available resources.
When using a complex model in this naive approach, it is necessary to collect
large data sets (if possible at all) to avoid overfitting, and hence it is also
necessary to use large computational resources to handle the increased amount of
data, first during training to process a large data set and then also at test
time to execute a complex model.
An alternative to this strategy of treating each learning problem independently
is to leverage related data sets and the computation encapsulated in previously
trained models. By doing so, we can decrease the amount of data necessary to
reach a satisfactory level of performance and, consequently, improve the
achievable accuracy and decrease training time. Our attack on this problem is to
exploit diversity: in the structure of the data set, in the features learnt and
in the inductive biases of different neural network architectures.
In the setting of learning from multiple sources we introduce multiple-source
cross-validation, which gives an unbiased estimator of the test error when the
data set is composed of data coming from multiple sources and the data at test
time come from a new, unseen source. We also propose new estimators of the
variance of standard k-fold cross-validation and of multiple-source
cross-validation, which have lower bias than previously known ones.
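For illustration, the hold-one-source-out splitting behind multiple-source cross-validation can be sketched in Python with scikit-learn's LeaveOneGroupOut. The data, classifier and source labels below are synthetic placeholders, and the variance estimators mentioned above are not covered.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut

# Toy data: 300 examples drawn from 5 sources (all values are placeholders).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = rng.integers(0, 2, size=300)
sources = rng.integers(0, 5, size=300)  # which source each example came from

# Each fold holds out every example from one source, mimicking test data
# that come from a new, unseen source.
errors = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=sources):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))

print(f"multiple-source CV error estimate: {np.mean(errors):.3f}")
```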
To improve unsupervised learning we introduce scheduled denoising autoencoders,
which learn a more diverse set of features than the standard denoising
autoencoder. This is thanks to their training procedure, which starts with a
high level of noise, when the network learns coarse features, and then gradually
lowers the noise, allowing the network to also learn more local features. We
also draw a connection between this training procedure and curriculum learning.
We develop the idea of learning a diverse representation further by explicitly
incorporating the goal of obtaining a diverse representation into the training
objective. The proposed model, the composite denoising autoencoder, learns
multiple groups of features, each focused on modelling variations in the data
set at a different level of granularity.
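To make the scheduled training procedure concrete, here is a minimal PyTorch sketch of a denoising autoencoder whose masking-noise level is annealed from high to low over training. The architecture, the linear schedule and all hyperparameter values are illustrative assumptions, not the exact configuration from the thesis.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small single-layer autoencoder; the sizes are illustrative.
n_visible, n_hidden = 784, 256
model = nn.Sequential(
    nn.Linear(n_visible, n_hidden), nn.Sigmoid(),   # encoder
    nn.Linear(n_hidden, n_visible), nn.Sigmoid(),   # decoder
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

def masking_noise(x, p):
    # Corrupt the input by zeroing each unit independently with probability p.
    return x * (torch.rand_like(x) > p).float()

X = torch.rand(512, n_visible)  # stand-in for real training data in [0, 1]
n_epochs, p_start, p_end = 30, 0.7, 0.1

for epoch in range(n_epochs):
    # Anneal the corruption level: coarse features first, finer ones later.
    p = p_start + (p_end - p_start) * epoch / (n_epochs - 1)
    for i in range(0, len(X), 64):
        x = X[i:i + 64]
        loss = loss_fn(model(masking_noise(x, p)), x)  # reconstruct clean x
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Roughly speaking, the composite variant can be read off the same sketch: instead of one annealed noise level, several fixed noise levels are used at once, each training its own group of hidden units, so that different groups specialise at different granularities.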
Finally, we introduce the idea of model blending, a variant of model compression
in which the two models, the teacher and the student, are both strong models but
differ in their inductive biases. As an example, we train convolutional networks
using the guidance of bidirectional long short-term memory (LSTM) networks. This
allows us to train a convolutional neural network that is more accurate than the
LSTM network, at no extra cost at test time.
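Model blending follows the model-compression recipe of training the student on the teacher's outputs. Below is a minimal PyTorch sketch of a generic soft-target loss of the kind used for this; the temperature, the hard/soft weighting and the KL-divergence form are standard distillation choices assumed here, not necessarily the exact objective used in the thesis.

```python
import torch
import torch.nn.functional as F

def blending_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    """Mix the usual hard-label loss with a term pulling the student's
    softened predictions towards the teacher's (the LSTM in our example)."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    return alpha * hard + (1 - alpha) * soft

# Toy usage: a batch of 8 examples over 10 classes.
student = torch.randn(8, 10, requires_grad=True)   # student outputs
teacher = torch.randn(8, 10)                       # frozen teacher outputs
labels = torch.randint(0, 10, (8,))
blending_loss(student, teacher, labels).backward()
```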