Meta-learning to optimise: loss functions and update rules

Gao, Boyan

dc.contributor.advisor

Hospedales, Timothy

dc.contributor.advisor

Bilen, Hakan

dc.contributor.author

Gao, Boyan

dc.date.accessioned

2023-02-07T16:32:47Z

dc.date.available

2023-02-07T16:32:47Z

dc.date.issued

2023-02-07

dc.description.abstract

Meta-learning, also known as “learning to learn”, aims to extract invariant meta-knowledge from a group of tasks in order to improve the generalisation of base models on novel tasks. The learned meta-knowledge takes various forms, such as neural architectures, network initialisations, loss functions and optimisers. In this thesis, we study learning to optimise through meta-learning, with two main components: loss function learning and optimiser learning. At a high level, these two components play complementary roles: optimisers provide update rules that modify the model parameters using the gradient information generated from the loss function. We focus on the meta-model’s re-usability across tasks. In the ideal case, the learned meta-model should provide a “plug-and-play” drop-in which can be used without further modification or computational expense with any new dataset or even new model architecture. We apply these ideas to address three challenges in machine learning, namely improving the convergence rate of optimisers, learning with noisy labels, and learning models that are robust to domain shift. We first study how to meta-learn loss functions. Unlike most prior work, which parameterises a loss function in a black-box fashion with neural networks, we meta-learn a Taylor polynomial loss and apply it to improve the robustness of the base model to label noise in the training data. The good performance of deep neural networks relies on gold-standard labelled data. However, in practice, wrongly labelled data is common due to human error and imperfect automatic annotation processes. We draw inspiration from hand-designed losses that modify the training dynamics to reduce the impact of noisy labels. Going beyond existing hand-designed robust losses, we develop a bi-level optimisation meta-learner, Automated Robust Loss (ARL), that discovers novel robust losses that outperform the best prior hand-designed robust losses.
A second contribution, ITL, extends the loss function learning idea to the problem of Domain Generalisation (DG). DG is the challenging scenario of deploying a model trained on one data distribution to a novel data distribution. Compared to ARL, where the target loss function is optimised by a genetic-based algorithm, ITL benefits from gradient-based optimisation of the loss parameters. By leveraging the mathematical guarantee from the Implicit Function Theorem, the hypergradient required to update the loss can be efficiently computed without differentiating through the whole base model training trajectory. This reduces the computational cost dramatically in the meta-learning stage and accelerates the loss function learning process by providing a more accurate hypergradient. Applying our learned loss to the DG problem, we are able to learn base models that exhibit increased robustness to domain shift compared to the state of the art. Importantly, the modular plug-and-play nature of our learned loss means that it is simple to use, requiring just a few lines of code change to standard Empirical Risk Minimisation (ERM) learners. We finally study accelerating the optimisation process itself by designing a meta-learning algorithm, termed MetaMD, that searches for efficient optimisers. We tackle this problem by meta-learning Mirror Descent-based optimisers through learning the strongly convex function parameterising a Bregman divergence. While standard meta-learners require a validation set to define a meta-objective for learning, MetaMD instead optimises the convergence rate bound. The resulting learned optimiser uniquely has mathematically guaranteed convergence and generalisation properties.
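
As an illustration of the Mirror Descent idea mentioned in the abstract, the following is a minimal sketch and not the thesis's MetaMD algorithm. It assumes a fixed (not learned) quadratic mirror map phi(x) = 0.5 * x^T A x with A symmetric positive definite, for which the unconstrained mirror descent step reduces to x <- x - lr * A^{-1} grad f(x). The function name `mirror_descent_quadratic`, the test problem `Q`, `b`, and the step sizes are all hypothetical choices made for this example.

```python
import numpy as np

def mirror_descent_quadratic(grad_fn, x0, A, lr, steps):
    """Unconstrained mirror descent with mirror map phi(x) = 0.5 * x^T A x.

    A is assumed symmetric positive definite, so phi is strongly convex and
    its Bregman divergence is D(x, y) = 0.5 * (x - y)^T A (x - y).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        # Mirror step: grad phi(x_new) = grad phi(x) - lr * grad f(x),
        # i.e. A x_new = A x - lr * g, solved here for x_new.
        x = x - lr * np.linalg.solve(A, grad_fn(x))
    return x

# Hypothetical ill-conditioned objective f(x) = 0.5 * x^T Q x - b^T x.
Q = np.array([[10.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
grad_f = lambda x: Q @ x - b
x_star = np.linalg.solve(Q, b)  # exact minimiser

# A = I recovers plain gradient descent.
x_euclid = mirror_descent_quadratic(grad_f, np.zeros(2), np.eye(2), lr=0.1, steps=20)
# A = Q matches the mirror map to the curvature of the objective.
x_matched = mirror_descent_quadratic(grad_f, np.zeros(2), Q, lr=1.0, steps=1)

print("gradient descent error:", np.linalg.norm(x_euclid - x_star))
print("matched mirror map error:", np.linalg.norm(x_matched - x_star))
```

In this toy setting the choice of strongly convex function determines how quickly the iterates converge, which is the quantity the abstract says MetaMD targets through a convergence rate bound; here A is picked by hand rather than meta-learned.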

en

dc.identifier.uri

https://hdl.handle.net/1842/39821

dc.identifier.uri

http://dx.doi.org/10.7488/era/3069

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.subject

Meta-learning

en

dc.subject

Loss Functions

en

dc.subject

Update Rules

en

dc.subject

learning to learn

en

dc.subject

invariant meta-knowledge

en

dc.subject

learned meta-knowledge

en

dc.subject

machine learning

en

dc.subject

meta-learn loss functions

en

dc.subject

parameterising a loss function

en

dc.subject

Taylor polynomial loss

en

dc.subject

Automated Robust Loss

en

dc.subject

ARL

en

dc.subject

Domain Generalisation

en

dc.subject

Implicit Function Theorem

en

dc.subject

Empirical Risk Minimisation

en

dc.subject

ERM

en

dc.subject

MetaMD

en

dc.subject

Mirror Descent-based optimisers

en

dc.subject

Bregman divergence

en

dc.title

Meta-learning to optimise: loss functions and update rules

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: GaoB_2023.pdf
Size: 5.7 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection