Memorisation meets compositionality in natural language processing
dc.contributor.advisor
Titov, Ivan
dc.contributor.advisor
Lucas, Christopher
dc.contributor.author
Dankers, Verna
dc.date.accessioned
2026-01-21T14:29:32Z
dc.date.issued
2025-12-02
dc.description.abstract
In deep learning, the perspective on memorisation of training examples is undergoing a paradigm shift. Previously linked to overfitting and poor generalisation, memorisation is now seen both as beneficial when it enhances deep neural networks' generalisation capabilities and as concerning when it comes to specific examples that should not be memorised. This shift raises questions about when memorisation is beneficial, what models memorise and should memorise, and how memorisation is implemented internally. Although these questions might be relevant for deep learning problems in general, I consider them to be particularly relevant for language learning and the field of natural language processing (NLP). After all, language itself requires both syntax-driven, generalisable meaning compositions and memorisation capabilities, thanks to its dichotomous nature of being both compositional -- in terms of freely generated language -- and non-compositional -- due to the pervasiveness of fixed formulaic sequences.
This dissertation is divided into two parts, each studying memorisation in transformer models from a different angle. Within each part, I focus on the data first and then elaborate on model-internal mechanisms for memorisation.
The first part examines memorisation broadly, identifying which examples require more memorisation, whether memorisation aids generalisation and where memorisation occurs in multi-layered models. Firstly, using the task of translation, various source-target language pairs and graded memorisation metrics, examples are placed on a 'memorisation map' to explore features predictive of high memorisation and their impact on model performance. Secondly, using classification tasks, memorisation localisation is examined at the layer level.
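To make the notion of a graded memorisation metric concrete, the sketch below computes a counterfactual-style score for a single training example: performance when the example is in the training set minus performance when it is held out, averaged over training runs. This is a minimal illustrative sketch under assumed inputs (per-run performance lists); it is not the exact metric or pipeline used in the thesis.

```python
# Illustrative, counterfactual-style memorisation score for one training example.
# The inputs (per-run performance lists) and the metric itself are assumptions
# made for this sketch; the thesis's graded metrics may be defined differently.
from statistics import mean

def memorisation_score(perf_with, perf_without):
    """Mean performance on an example for models trained WITH it,
    minus mean performance for models trained WITHOUT it.
    A large positive gap suggests the example is memorised rather than
    recoverable by generalisation from the rest of the training data."""
    return mean(perf_with) - mean(perf_without)

# Toy usage: an example that is only translated well when it was seen in training.
print(memorisation_score([0.90, 0.85, 0.80], [0.20, 0.25, 0.15]))  # ~0.65
```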
In the second part, I approach memorisation through the lens of natural language's compositionality, focusing on idioms as prime examples of non-compositional phrases requiring memorisation in neural networks. Using translation tasks, I analyse how models acquire idiom translations over the course of training while also monitoring models' compositional abilities.
I then examine pretrained translation models for various source-target language pairs, separating idiom translations into paraphrases and word-for-word translations, and analysing the role of the transformer's attention and of changes to the hidden states in translating idioms non-compositionally.
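As an illustration of the paraphrase versus word-for-word distinction, the sketch below labels a model's idiom translation by its word overlap with a literal gloss of the idiom. The gloss, tokenisation and threshold are assumptions made for this example; the thesis's actual categorisation procedure is not reproduced here.

```python
# Hypothetical labelling of idiom translations as word-for-word vs. paraphrase,
# based on overlap with a literal word-by-word gloss; the gloss, threshold and
# whitespace tokenisation are illustrative assumptions, not the thesis's method.
def label_idiom_translation(translation, literal_gloss, threshold=0.6):
    """Return 'word-for-word' if enough glossed words surface in the
    translation, otherwise 'paraphrase'."""
    tokens = set(translation.lower().split())
    overlap = sum(word.lower() in tokens for word in literal_gloss)
    return "word-for-word" if overlap / len(literal_gloss) >= threshold else "paraphrase"

# Dutch idiom "de koe bij de horens vatten" (literally: take the cow by the horns):
gloss = ["take", "cow", "horns"]
print(label_idiom_translation("take the cow by the horns", gloss))       # word-for-word
print(label_idiom_translation("tackle the problem head-on", gloss))      # paraphrase
```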
By combining insights from data analysis and internal mechanisms, this dissertation examines the link between memorisation and generalisation. First, I show that memorisation is not a mysterious phenomenon, but is predictable from examples' features. Second, I establish that model-internal mechanisms for memorisation emerge in a dispersed manner: memorisation is implemented over a range of layers, and generalisation and memorisation capabilities are intertwined. Finally, I demonstrate that memorising certain training examples can aid generalisation, but also that models still face challenges with both compositional generalisation and non-compositional memorisation.
dc.identifier.uri
https://era.ed.ac.uk/handle/1842/44330
dc.identifier.uri
https://doi.org/10.7488/era/6850
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Verna Dankers, Ivan Titov, and Dieuwke Hupkes. 2023. Memorisation cartography: Mapping out the memorisation-generalisation continuum in neural machine translation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 8323–8343
en
dc.relation.hasversion
Verna Dankers and Ivan Titov. 2024. Generalisation first, memorisation second? Memorisation localisation for natural language classification tasks. In Findings of the Association for Computational Linguistics: ACL 2024, pages 14348–14366
en
dc.relation.hasversion
Verna Dankers, Elia Bruni, and Dieuwke Hupkes. 2022. The paradox of the compositionality of natural language: A neural machine translation case study. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4154–4175
en
dc.relation.hasversion
Verna Dankers, Christopher Lucas, and Ivan Titov. 2022. Can transformer be too compositional? Analysing idiom processing in neural machine translation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3608–3626
en
dc.relation.hasversion
Verna Dankers, Anna Langedijk, Kate McCurdy, Adina Williams, and Dieuwke Hupkes. 2021. Generalising to German plural noun classes, from the perspective of a recurrent neural network. In Proceedings of the 25th Conference on Computational Natural Language Learning, pages 94–108
en
dc.relation.hasversion
Verna Dankers and Ivan Titov. 2022. Recursive neural networks with bottlenecks diagnose (non-)compositionality. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 4361–4378
en
dc.relation.hasversion
Verna Dankers and Christopher Lucas. 2023. Non-compositionality in sentiment: New data and analyses. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 5150–5162
en
dc.relation.hasversion
Verna Dankers and Vikas Raunak. 2025. Memorization inheritance in sequence-level knowledge distillation for neural machine translation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 760–774
en
dc.relation.hasversion
Dieuwke Hupkes, Verna Dankers, Mathijs Mul, and Elia Bruni. 2020. Compositionality decomposed: How do neural networks generalise? Journal of Artificial Intelligence Research, 67:757–795
en
dc.relation.hasversion
Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella Sinclair, et al. 2023. A taxonomy and review of generalization research in NLP. Nature Machine Intelligence, 5(10):1161–1174.
en
dc.relation.hasversion
Kris Korrel, Dieuwke Hupkes, Verna Dankers, and Elia Bruni. 2019. Transcoding compositionally: Using attention to find more generalizable solutions. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 1–11
en
dc.subject
compositionality
en
dc.subject
non-compositional
en
dc.subject
model memories
en
dc.subject
memories of idioms
en
dc.subject
training materials
en
dc.subject
large language models
en
dc.subject
idioms
en
dc.subject
word-for-word translation
en
dc.subject
natural language processing
en
dc.title
Memorisation meets compositionality in natural language processing
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: Dankers2025.pdf
- Size: 8.63 MB
- Format: Adobe Portable Document Format