Parameter-efficient transfer learning for pre-trained transformers

Cooper Stickland, Asa

dc.contributor.advisor

Murray, Iain

dc.contributor.advisor

Titov, Ivan

dc.contributor.advisor

Hospedales, Timothy

dc.contributor.author

Cooper Stickland, Asa

dc.date.accessioned

2024-10-02T09:30:57Z

dc.date.available

2024-10-02T09:30:57Z

dc.date.issued

2024-10-02

dc.description.abstract

In this thesis I tackle a problem in machine learning and natural language processing (NLP) that I refer to as ‘parameter-efficient transfer learning’. This involves taking ‘general-purpose’, large-scale models trained on huge amounts of data and specializing them to a particular task without changing the underlying model very much. A recent paradigm in machine learning has been to do large-scale ‘pre-training’ of a model on unsupervised data before specializing it to a particular task. Typically this means ‘full fine-tuning’: updating every parameter of the pre-trained model on the new task. In this thesis we consider an alternative to full fine-tuning in which we update only a subset of the pre-trained model's parameters (or a small number of additional parameters), hence the term ‘parameter-efficient’ transfer learning. This can save computation and storage space, unlock new capabilities, and in some situations outperform fine-tuning every parameter. In the first section we consider parameter-efficient transfer learning on English classification tasks. Our first contribution is an approach to fine-tuning pre-trained models on multiple tasks simultaneously. Typical approaches underperform task-specific models because of a lack of capacity and interference between tasks. Our approach to ‘multi-task’ learning introduces small task-specific modules for each task, which let us match the performance of task-specific fine-tuning with only a fraction of the parameters. This initial exploration used relatively small models compared to the current state of the art, and did not cover a popular approach: freezing the pre-trained model parameters and training only the small modules.
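The storage saving from freezing one shared pre-trained model and training only small per-task modules can be illustrated with simple arithmetic. The sketch below assumes generic residual bottleneck adapters (a down-projection from the model width d to a small width r and an up-projection back to d, each with a bias, two per transformer layer); this is not necessarily the module design used in the thesis, and the base model size, layer count, bottleneck width and number of tasks are hypothetical values of roughly BERT-base scale.

```python
def adapter_params(d_model: int, bottleneck: int, n_layers: int,
                   adapters_per_layer: int = 2) -> int:
    """Count parameters in bottleneck adapters (assumed configuration):
    a d->r down-projection and an r->d up-projection, each with a bias,
    inserted adapters_per_layer times in each transformer layer."""
    down = d_model * bottleneck + bottleneck  # weights + bias
    up = bottleneck * d_model + d_model       # weights + bias
    return (down + up) * adapters_per_layer * n_layers


def storage_for_tasks(base_params: int, per_task_params: int, n_tasks: int) -> int:
    """Parameters stored when a single frozen base model is shared by all tasks."""
    return base_params + per_task_params * n_tasks


# Hypothetical, roughly BERT-base-sized configuration.
BASE = 110_000_000
per_task = adapter_params(d_model=768, bottleneck=64, n_layers=12)
n_tasks = 8

full_ft = BASE * n_tasks                           # one full model copy per task
shared = storage_for_tasks(BASE, per_task, n_tasks)

print(f"adapter params per task: {per_task:,} ({per_task / BASE:.2%} of base)")
print(f"full fine-tuning, {n_tasks} tasks: {full_ft:,}")
print(f"frozen base + adapters, {n_tasks} tasks: {shared:,}")
```

With these assumed numbers each task adds about 2% of the base model's parameters, so serving eight tasks costs roughly 1.17 times the storage of one model rather than eight times.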
In the second half of this section we address these limitations, contributing a survey of parameter-efficient approaches, showing which parameter-efficient architectures work best as model scale increases, and detailing trade-offs between performance, memory efficiency and other factors. In the second section we consider applying parameter-efficient transfer learning approaches to machine translation (MT), which involves modeling sequences rather than class labels and is multilingual rather than English-only, so approaches designed for English classification can underperform. We explore adapting systems that have only been trained on an unsupervised objective (involving multilingual text, but not machine translation) to the MT task; we were the first to apply parameter-efficient techniques to this problem. We explore which parts of the transformer sequence-to-sequence architecture are important to adapt, and what percentage of the original model we need to update to match fine-tuning every parameter. In further chapters we contribute a new approach in which we train independent ‘adapters’ (a particular parameter-efficient architecture) for the source language, the target language and the ‘domain’ (e.g. legal text), allowing us to compose them in combinations not seen during training. Finally, we contribute an extensive series of experiments on what matters for the performance of parameter-efficient methods in machine translation.
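The idea of composing independently trained language and domain adapters can be sketched as follows. This is a minimal illustration assuming residual bottleneck adapters that are simply stacked, i.e. applied one after another to a hidden state; the stacking order, the toy 2-dimensional weights, and the names `make_adapter`, `compose`, `language` and `domain` are hypothetical and do not reproduce the exact architecture or placement used in the thesis. Because each adapter is a self-contained module on top of a frozen model, a language adapter and a domain adapter can be combined even if that pairing never appeared together during training.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def make_adapter(down: Matrix, up: Matrix) -> Callable[[Vector], Vector]:
    """Residual bottleneck adapter (assumed form): h + up(relu(down(h)))."""
    def adapter(h: Vector) -> Vector:
        z = [max(0.0, x) for x in matvec(down, h)]  # project d -> r, then ReLU
        delta = matvec(up, z)                      # project r -> d
        return [a + b for a, b in zip(h, delta)]   # residual connection
    return adapter


def compose(*adapters: Callable[[Vector], Vector]) -> Callable[[Vector], Vector]:
    """Stack adapters: apply each one, in order, to the hidden state."""
    def stacked(h: Vector) -> Vector:
        for a in adapters:
            h = a(h)
        return h
    return stacked


# Hypothetical toy adapters with d = 2, r = 1 (weights chosen by hand).
language = {"fr": make_adapter([[1.0, 0.0]], [[0.5], [0.0]])}
domain = {"legal": make_adapter([[0.0, 1.0]], [[0.0], [2.0]])}

# Combine a language adapter with a domain adapter at inference time.
out = compose(language["fr"], domain["legal"])([1.0, 1.0])

# With a zero up-projection an adapter is the identity map.
identity_at_init = make_adapter([[1.0, 1.0]], [[0.0], [0.0]])
```

Initializing the up-projection to zeros, as in `identity_at_init`, makes an adapter start as the identity, so inserting it leaves the frozen model's output unchanged before training; this is a common choice, assumed here rather than taken from the thesis.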

en

dc.identifier.uri

https://hdl.handle.net/1842/42243

dc.identifier.uri

http://dx.doi.org/10.7488/era/4963

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

A. Cooper Stickland, A. Bérard, and V. Nikoulina. Multilingual domain adaptation for NMT: Decoupling language and domain information with adapters. In Proceedings of the Sixth Conference on Machine Translation, pages 578–598, Online, Nov. 2021. Association for Computational Linguistics. URL https://aclanthology.org/2021.wmt-1.64

en

dc.relation.hasversion

A. C. Stickland and I. Murray. BERT and PALs: Projected attention layers for efficient adaptation in multi-task learning. In International Conference on Machine Learning, pages 5986–5995, 2019. URL http://proceedings.mlr.press/v97/stickland19a.html

en

dc.relation.hasversion

A. C. Stickland, A. Bérard, and V. Nikoulina. Multilingual domain adaptation for NMT: Decoupling language and domain information with adapters. Sixth Conference on Machine Translation (WMT2021), 2021. URL https://www.statmt.org/wmt21/pdf/2021.wmt-1.64.pdf

en

dc.relation.hasversion

A. C. Stickland, X. Li, and M. Ghazvininejad. Recipes for adapting pre-trained monolingual and multilingual models to machine translation. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 3440–3453, Online, Apr. 2021. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/2021.eacl-main.301

en

dc.relation.hasversion

A. Üstün and A. Cooper Stickland. When does parameter-efficient transfer learning work for machine translation? In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 7919–7933, Abu Dhabi, United Arab Emirates, Dec. 2022. Association for Computational Linguistics. URL https://aclanthology.org/2022.emnlp-main.540

en

dc.subject

Machine learning

en

dc.subject

Natural Language Processing

en

dc.title

Parameter-efficient transfer learning for pre-trained transformers

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: Cooper SticklandA_2024.pdf
Size: 5.73 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection