Meta-learning to optimise: loss functions and update rules
dc.contributor.advisor
Hospedales, Timothy
dc.contributor.advisor
Bilen, Hakan
dc.contributor.author
Gao, Boyan
dc.date.accessioned
2023-02-07T16:32:47Z
dc.date.available
2023-02-07T16:32:47Z
dc.date.issued
2023-02-07
dc.description.abstract
Meta-learning, aka “learning to learn”, aims to extract invariant meta-knowledge from a
group of tasks in order to improve the generalisation of the base models in the novel
tasks. The learned meta-knowledge takes various forms, such as neural architecture,
network initialization, loss function and optimisers. In this thesis, we study learning to
optimise through meta-learning with two main components, loss function learning and
optimiser learning. At a high level, those two components play important roles where
optimisers provide update rules to modify the model parameters through the gradient
information generated from the loss function. We work on the meta-model’s re-usability
across tasks. In the ideal case, the learned meta-model should provide a “plug-and-play”
drop-in which can be used without further modification or computational expense with
any new dataset or even new model architecture. We apply these ideas to address three
challenges in machine learning, namely improving the convergence rate of optimisers,
learning with noisy labels, and learning models that are robust to domain shift.
We first study how to meta-learn loss functions. Unlike most prior work parameterising
a loss function in a black-box fashion with neural networks, we meta-learn a Taylor
polynomial loss and apply it to improve the robustness of the base model to label
noise in the training data. The good performance of deep neural networks relies on
gold-standard labelled data. However, in practice, wrongly labelled data is common due
to human error and imperfect automatic annotation processes. We draw inspiration
from hand-designed losses that modify the training dynamic to reduce the impact of
noisy labels. Going beyond existing hand-designed robust losses, we develop a bi-level
optimisation meta-learner, Automated Robust Loss (ARL), that discovers novel robust
losses which outperform the best prior hand-designed ones.
A second contribution, ITL, extends the loss function learning idea to the problem of
Domain Generalisation (DG). DG is the challenging scenario of deploying a model
trained on one data distribution to a novel data distribution. Compared to ARL where
the target loss function is optimised by a genetic-based algorithm, ITL benefits from
gradient-based optimisation of loss parameters. By leveraging the mathematical guarantee
from the Implicit Function Theorem, the hypergradient required to update the loss
can be efficiently computed without differentiating through the whole base model training
trajectory. This reduces the computational cost dramatically in the meta-learning
stage and accelerates the loss function learning process by providing a more accurate
hypergradient. Applying our learned loss to the DG problem, we are able to learn base
models that exhibit increased robustness to domain shift compared to the state-of-the-art.
Importantly, the modular plug-and-play nature of our learned loss means that it
is simple to use, requiring just a few lines of code change to standard Empirical Risk
Minimisation (ERM) learners.
Finally, we study accelerating the optimisation process itself by designing a meta-learning
algorithm, termed MetaMD, that searches for efficient optimisers. We
tackle this problem by meta-learning Mirror Descent-based optimisers through learning
the strongly convex function parameterizing a Bregman divergence. While standard
meta-learners require a validation set to define a meta-objective for learning, MetaMD
instead optimises the convergence rate bound. The resulting learned optimiser uniquely
has mathematically guaranteed convergence and generalisation properties.
en
dc.identifier.uri
https://hdl.handle.net/1842/39821
dc.identifier.uri
http://dx.doi.org/10.7488/era/3069
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.subject
Meta-learning
en
dc.subject
Loss Functions
en
dc.subject
Update Rules
en
dc.subject
learning to learn
en
dc.subject
invariant meta-knowledge
en
dc.subject
learned meta-knowledge
en
dc.subject
machine learning
en
dc.subject
meta-learn loss functions
en
dc.subject
parameterising a loss function
en
dc.subject
Taylor polynomial loss
en
dc.subject
Automated Robust Loss
en
dc.subject
ARL
en
dc.subject
Domain Generalisation
en
dc.subject
Implicit Function Theorem
en
dc.subject
Empirical Risk Minimisation
en
dc.subject
ERM
en
dc.subject
MetaMD
en
dc.subject
Mirror Descent-based optimisers
en
dc.subject
Bregman divergence
en
dc.title
Meta-learning to optimise: loss functions and update rules
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name:
- GaoB_2023.pdf
- Size:
- 5.7 MB
- Format:
- Adobe Portable Document Format

