Memorisation meets compositionality in natural language processing
dc.contributor.advisor
Titov, Ivan
dc.contributor.advisor
Lucas, Christopher
dc.contributor.author
Dankers, Verna
dc.date.accessioned
2026-01-21T14:29:32Z
dc.date.issued
2025-12-02
dc.description.abstract
In deep learning, the perspective on memorisation of training examples is undergoing a paradigm shift. Previously linked to overfitting and poor generalisation, memorisation is now seen both as beneficial when it enhances deep neural networks' generalisation capabilities and as concerning when it comes to specific examples that should not be memorised. This shift raises questions about when memorisation is beneficial, what models memorise and should memorise, and how memorisation is implemented internally. Although these questions might be relevant for deep learning problems in general, I consider them to be particularly relevant for language learning and the field of natural language processing (NLP). After all, language itself requires both syntax-driven, generalisable meaning compositions and memorisation capabilities, thanks to its dichotomous nature of being both compositional -- in terms of freely generated language -- and non-compositional -- due to the pervasiveness of fixed formulaic sequences.
This dissertation is divided into two parts, each studying memorisation in transformer models from a different angle. Within each part, I focus on the data first and then elaborate on model-internal mechanisms for memorisation.
The first part examines memorisation broadly, identifying which examples require more memorisation, whether memorisation aids generalisation and where memorisation occurs in multi-layered models. Firstly, using the task of translation, various source-target language pairs and graded memorisation metrics, examples are placed on a 'memorisation map' to explore features predictive of high memorisation and their impact on model performance. Secondly, using classification tasks, memorisation localisation is examined at the layer level.
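To make the notion of a graded memorisation metric concrete, the sketch below computes a counterfactual-style score for a single training example: performance when the example is in the training set minus performance when it is held out, averaged over training runs. This is a minimal illustrative sketch under assumed inputs (per-run performance lists); it is not the exact metric or pipeline used in the thesis.

```python
# Illustrative, counterfactual-style memorisation score for one training example.
# The inputs (per-run performance lists) and the metric itself are assumptions
# made for this sketch; the thesis's graded metrics may be defined differently.
from statistics import mean

def memorisation_score(perf_with, perf_without):
    """Mean performance on an example for models trained WITH it,
    minus mean performance for models trained WITHOUT it.
    A large positive gap suggests the example is memorised rather than
    recoverable by generalisation from the rest of the training data."""
    return mean(perf_with) - mean(perf_without)

# Toy usage: an example that is only translated well when it was seen in training.
print(memorisation_score([0.90, 0.85, 0.80], [0.20, 0.25, 0.15]))  # ~0.65
```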
In the second part, I approach memorisation through the lens of natural language's compositionality, focusing on idioms as prime examples of non-compositional phrases requiring memorisation in neural networks. Using translation tasks, I analyse how models acquire idiom translations over the course of training while also monitoring models' compositional abilities.
I then examine pretrained translation models for various source-target language pairs, separating idiom translations into paraphrases and word-for-word translations, and analysing the role of the transformer's attention and of changes to the hidden states in translating idioms non-compositionally.
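As an illustration of the paraphrase versus word-for-word distinction, the sketch below labels a model's idiom translation by its word overlap with a literal gloss of the idiom. The gloss, tokenisation and threshold are assumptions made for this example; the thesis's actual categorisation procedure is not reproduced here.

```python
# Hypothetical labelling of idiom translations as word-for-word vs. paraphrase,
# based on overlap with a literal word-by-word gloss; the gloss, threshold and
# whitespace tokenisation are illustrative assumptions, not the thesis's method.
def label_idiom_translation(translation, literal_gloss, threshold=0.6):
    """Return 'word-for-word' if enough glossed words surface in the
    translation, otherwise 'paraphrase'."""
    tokens = set(translation.lower().split())
    overlap = sum(word.lower() in tokens for word in literal_gloss)
    return "word-for-word" if overlap / len(literal_gloss) >= threshold else "paraphrase"

# Dutch idiom "de koe bij de horens vatten" (literally: take the cow by the horns):
gloss = ["take", "cow", "horns"]
print(label_idiom_translation("take the cow by the horns", gloss))       # word-for-word
print(label_idiom_translation("tackle the problem head-on", gloss))      # paraphrase
```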
By combining insights from data analysis and internal mechanisms, this dissertation examines the link between memorisation and generalisation. First, I show that memorisation is not a mysterious phenomenon, but is predictable from examples' features. Second, I establish that model-internal mechanisms for memorisation emerge in a dispersed manner: memorisation is implemented over a range of layers, and generalisation and memorisation capabilities are intertwined. Finally, I demonstrate that memorising certain training examples can aid generalisation, but also that models still face challenges with both compositional generalisation and non-compositional memorisation.
dc.identifier.uri
https://era.ed.ac.uk/handle/1842/44330
dc.identifier.uri
https://doi.org/10.7488/era/6850
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Verna Dankers, Ivan Titov, and Dieuwke Hupkes. 2023. Memorisation cartography: Mapping out the memorisation-generalisation continuum in neural machine translation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 8323–8343
en
dc.relation.hasversion
Verna Dankers and Ivan Titov. 2024. Generalisation first, memorisation second? Memorisation localisation for natural language classification tasks. In Findings of the Association for Computational Linguistics: ACL 2024, pages 14348–14366
en
dc.relation.hasversion
Verna Dankers, Elia Bruni, and Dieuwke Hupkes. 2022. The paradox of the compositionality of natural language: A neural machine translation case study. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4154–4175
en
dc.relation.hasversion
Verna Dankers, Christopher Lucas, and Ivan Titov. 2022. Can transformer be too compositional? Analysing idiom processing in neural machine translation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3608–3626
en
dc.relation.hasversion
Verna Dankers, Anna Langedijk, Kate McCurdy, Adina Williams, and Dieuwke Hupkes. 2021. Generalising to German plural noun classes, from the perspective of a recurrent neural network. In Proceedings of the 25th Conference on Computational Natural Language Learning, pages 94–108
en
dc.relation.hasversion
Verna Dankers and Ivan Titov. 2022. Recursive neural networks with bottlenecks diagnose (non-)compositionality. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 4361–4378
en
dc.relation.hasversion
Verna Dankers and Christopher Lucas. 2023. Non-compositionality in sentiment: New data and analyses. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 5150–5162
en
dc.relation.hasversion
Verna Dankers and Vikas Raunak. 2025. Memorization inheritance in sequence-level knowledge distillation for neural machine translation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 760–774
en
dc.relation.hasversion
Dieuwke Hupkes, Verna Dankers, Mathijs Mul, and Elia Bruni. 2020. Compositionality decomposed: How do neural networks generalise? Journal of Artificial Intelligence Research, 67:757–795
en
dc.relation.hasversion
Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella Sinclair, et al. 2023. A taxonomy and review of generalization research in NLP. Nature Machine Intelligence, 5(10):1161–1174.
en
dc.relation.hasversion
Kris Korrel, Dieuwke Hupkes, Verna Dankers, and Elia Bruni. 2019. Transcoding compositionally: Using attention to find more generalizable solutions. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 1–11
en
dc.subject
compositionality
en
dc.subject
non-compositional
en
dc.subject
model memories
en
dc.subject
memories of idioms
en
dc.subject
training materials
en
dc.subject
large language models
en
dc.subject
idioms
en
dc.subject
word-for-word translation
en
dc.subject
natural language processing
en
dc.title
Memorisation meets compositionality in natural language processing
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: Dankers2025.pdf
- Size: 8.63 MB
- Format: Adobe Portable Document Format