Edinburgh Research Archive

Enhancing structural inductive biases of sequence-to-sequence models for semantic parsing and beyond

dc.contributor.advisor
Titov, Ivan
dc.contributor.advisor
Koller, Alexander
dc.contributor.author
Lindemann, Matthias Moritz
dc.date.accessioned
2025-07-17T10:51:48Z
dc.date.available
2025-07-17T10:51:48Z
dc.date.issued
2025-07-17
dc.description.abstract
In recent years, sequence-to-sequence models such as Transformers have been applied with great success to a wide range of problems in Natural Language Processing, ranging from low-level tasks such as grapheme-to-phoneme conversion to more high-level tasks such as semantic parsing and machine translation. Sequence-to-sequence models that are commonly applied to such tasks have relatively weak inductive biases, i.e., they have little prior knowledge about the nature of the task they are applied to and learn virtually everything from data. While this makes them extremely versatile, it also makes them brittle when the training data provides only a weak signal. In particular, this is the case when (i) there is only a small amount of training data or (ii) the model is applied outside of the training distribution. Sequence-to-sequence models with weak inductive biases struggle with structural generalization, e.g. generalization to unseen combinations of syntactic structures and to deeper recursion than seen during training. While scaling pretraining to ever-larger datasets helps, scaling alone does not yet seem to close the gap completely. The goal of this thesis is to design, implement and evaluate methods for introducing inductive biases into sequence-to-sequence models to enable structural generalization. This thesis consists of two parts. The first part develops two sequence-to-sequence models whose inductive biases for structure derive from their architecture. The idea is to explicitly model correspondences between fragments of the input and fragments of the output. This is achieved by a two-stage process: First, for each input token, the tokens it ‘contributes’ to the output are predicted without committing to their final order. Second, the output tokens are rearranged into the correct order. These methods improve structural generalization in the context of semantic parsing and also perform well for other syntax-sensitive sequence-to-sequence tasks.
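The two-stage process described above can be illustrated with a minimal toy sketch. The lexicon, tokens, and the reversal-based reordering rule below are invented for illustration only and are not the thesis's actual learned model:

```python
# Toy illustration of the two-stage idea: (1) each input token "emits" the
# output tokens it contributes, without committing to their final order,
# then (2) the emitted tokens are rearranged into the final output order.

# Stage 1: per-token emission (hypothetical lexicon; fertility = group size)
LEXICON = {
    "the": [],                  # fertility 0: contributes nothing
    "cat": ["cat"],             # fertility 1
    "sleeps": ["sleep", "-s"],  # fertility 2: lemma plus agreement marker
}

def emit(tokens):
    """Stage 1: predict, per input token, the output tokens it contributes."""
    return [LEXICON[t] for t in tokens]

def reorder(groups):
    """Stage 2: arrange the emitted tokens into the final order.
    Here we simply reverse the groups to mimic a structural reordering."""
    ordered = []
    for group in reversed(groups):
        ordered.extend(group)
    return ordered

tokens = ["the", "cat", "sleeps"]
groups = emit(tokens)    # [[], ['cat'], ['sleep', '-s']]
output = reorder(groups) # ['sleep', '-s', 'cat']
```

In the actual models, both stages are parameterized and learned end to end; the point of the sketch is only that input–output correspondences are made explicit before the output order is decided.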
In the second part, we present a general framework that injects structural inductive biases into a standard sequence-to-sequence architecture, the Transformer, by means of a particular pretraining and fine-tuning procedure with synthetic data. The approach is based on the observation that it is often possible to operationalize an inductive bias as a family of symbolically defined functions, and that instances of this family tend to be cheap to generate automatically. The pretraining objective is to match the behavior of the entire family of functions. This framework is applied in two settings. First, taking Finite State Transducers (FSTs) as the family of functions, we pretrain a model to match the behavior of FSTs and thereby inject an FST-like inductive bias into a Transformer. Analysis of the hidden representations reveals that this procedure makes the model simulate transitions between FST states in its hidden representations without being explicitly trained to do so. Second, for semantic parsing, we operationalize the inductive bias of interest as transformations of syntax trees. We pretrain a model to perform these transformations, which enhances structural generalization in semantic parsing and improves learning from small amounts of labeled data on syntactic tasks.
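The synthetic-pretraining idea for the FST setting can be sketched as follows: sample a function from the symbolically defined family (here, tiny deterministic FSTs over a two-letter alphabet), run it on inputs, and use (FST description + input, output) pairs as pretraining examples. All details below — the encoding of transitions, the alphabet, the helper names — are illustrative assumptions, not the thesis's actual setup:

```python
import random

def random_fst(num_states=2, alphabet="ab", rng=None):
    """Sample a deterministic FST: (state, symbol) -> (next_state, output)."""
    rng = rng or random.Random(0)
    return {
        (q, s): (rng.randrange(num_states), rng.choice(alphabet))
        for q in range(num_states)
        for s in alphabet
    }

def run_fst(fst, inp, start=0):
    """Run the FST on an input string, collecting its output string."""
    state, out = start, []
    for sym in inp:
        state, o = fst[(state, sym)]
        out.append(o)
    return "".join(out)

def make_example(fst, inp):
    """Encode (FST description, input) as the source; the FST output is the target."""
    desc = " ".join(f"{q},{s}->{q2},{o}" for (q, s), (q2, o) in sorted(fst.items()))
    return f"{desc} | {inp}", run_fst(fst, inp)

# A fixed toy FST for illustration (2 states, alphabet {a, b})
fst = {(0, "a"): (1, "b"), (0, "b"): (0, "a"),
       (1, "a"): (0, "a"), (1, "b"): (1, "b")}
src, tgt = make_example(fst, "abba")  # tgt == "bbba"
```

Pretraining a Transformer on many such pairs, sampled across the whole family, is what lets the bias transfer: matching the family's behavior requires tracking state, which is consistent with the finding that the model simulates FST state transitions in its hidden representations.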
en
dc.identifier.uri
https://hdl.handle.net/1842/43689
dc.identifier.uri
http://dx.doi.org/10.7488/era/6221
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Jonas Groschwitz, Matthias Lindemann, Meaghan Fowlie, Mark Johnson, and Alexander Koller. 2018. AMR dependency parsing with a typed semantic algebra. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1831–1841, Melbourne, Australia. Association for Computational Linguistics.
en
dc.relation.hasversion
Matthias Lindemann, Alexander Koller, and Ivan Titov. 2023a. Compositional generalisation with structured reordering and fertility layers. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2172–2186, Dubrovnik, Croatia. Association for Computational Linguistics.
en
dc.relation.hasversion
Matthias Lindemann, Alexander Koller, and Ivan Titov. 2023b. Compositional generalization without trees using multiset tagging and latent permutations. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14488–14506, Toronto, Canada. Association for Computational Linguistics.
en
dc.relation.hasversion
Matthias Lindemann, Alexander Koller, and Ivan Titov. 2024a. SIP: Injecting a structural inductive bias into a Seq2Seq model by simulation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand. Association for Computational Linguistics.
en
dc.relation.hasversion
Matthias Lindemann, Alexander Koller, and Ivan Titov. 2024b. Strengthening structural inductive biases by pre-training to perform syntactic transformations. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA. Association for Computational Linguistics.
en
dc.subject
Natural Language Processing
en
dc.subject
NLP
en
dc.subject
artificial neural networks
en
dc.subject
language structural principles
en
dc.title
Enhancing structural inductive biases of sequence-to-sequence models for semantic parsing and beyond
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Lindemann2025.pdf
Size:
1.53 MB
Format:
Adobe Portable Document Format
This item appears in the following Collection(s)