Edinburgh Research Archive

Enhancing structural inductive biases of sequence-to-sequence models for semantic parsing and beyond

dc.contributor.advisor
Titov, Ivan
dc.contributor.advisor
Koller, Alexander
dc.contributor.author
Lindemann, Matthias Moritz
dc.date.accessioned
2025-07-17T10:51:48Z
dc.date.available
2025-07-17T10:51:48Z
dc.date.issued
2025-07-17
dc.description.abstract
In recent years, sequence-to-sequence models such as Transformers have been applied with great success to a wide range of problems in Natural Language Processing, ranging from low-level tasks such as grapheme-to-phoneme conversion to more high-level tasks such as semantic parsing and machine translation. Sequence-to-sequence models that are commonly applied to such tasks have relatively weak inductive biases, i.e., they have little prior knowledge about the nature of the task they are applied to and learn virtually everything from data. While this makes them extremely versatile, it also makes them brittle when the training data provides only a weak signal. In particular, this is the case when (i) there is only a small amount of training data or (ii) the model is applied outside of the training distribution. Sequence-to-sequence models with weak inductive biases struggle with structural generalization, e.g. generalization to unseen combinations of syntactic structures and to deeper recursion than seen during training. While scaling pretraining to ever-larger datasets helps, scaling alone does not yet seem to close the gap completely. The goal of this thesis is to design, implement and evaluate methods for introducing inductive biases into sequence-to-sequence models to enable structural generalization. This thesis consists of two parts. The first part develops two sequence-to-sequence models whose inductive biases for structure derive from their architecture. The idea is to explicitly model correspondences between fragments of the input and fragments of the output. This is achieved by a two-stage process: First, for each input token, the tokens it ‘contributes’ to the output are predicted without committing to their final order. Second, the output tokens are rearranged into the correct order. These methods improve structural generalization in the context of semantic parsing and also perform well for other syntax-sensitive sequence-to-sequence tasks.
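The two-stage process described above can be illustrated with a minimal toy sketch. The lexicon, tokens, and the reversal-based reordering rule below are invented for illustration only and are not the thesis's actual learned model:

```python
# Toy illustration of the two-stage idea: (1) each input token "emits" the
# output tokens it contributes, without committing to their final order,
# then (2) the emitted tokens are rearranged into the final output order.

# Stage 1: per-token emission (hypothetical lexicon; fertility = group size)
LEXICON = {
    "the": [],                  # fertility 0: contributes nothing
    "cat": ["cat"],             # fertility 1
    "sleeps": ["sleep", "-s"],  # fertility 2: lemma plus agreement marker
}

def emit(tokens):
    """Stage 1: predict, per input token, the output tokens it contributes."""
    return [LEXICON[t] for t in tokens]

def reorder(groups):
    """Stage 2: arrange the emitted tokens into the final order.
    Here we simply reverse the groups to mimic a structural reordering."""
    ordered = []
    for group in reversed(groups):
        ordered.extend(group)
    return ordered

tokens = ["the", "cat", "sleeps"]
groups = emit(tokens)    # [[], ['cat'], ['sleep', '-s']]
output = reorder(groups) # ['sleep', '-s', 'cat']
```

In the actual models, both stages are parameterized and learned end to end; the point of the sketch is only that input–output correspondences are made explicit before the output order is decided.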
In the second part, we present a general framework that injects structural inductive biases into a standard sequence-to-sequence architecture, the Transformer, by means of a particular pretraining and fine-tuning procedure with synthetic data. The approach is based on the observation that it is often possible to operationalize an inductive bias as a family of symbolically defined functions, and that instances of this family tend to be cheap to generate automatically. The pretraining objective is to match the behavior of the entire family of functions. This framework is applied in two settings. First, taking Finite State Transducers (FSTs) as the family of functions, we pretrain a model to match the behavior of FSTs and thereby inject an FST-like inductive bias into a Transformer. Analysis of the hidden representations reveals that this procedure makes the model simulate transitions between FST states in its hidden representations without being explicitly trained to do so. Second, for semantic parsing, we operationalize the inductive bias of interest as transformations of syntax trees. We pretrain a model to perform these transformations, which enhances structural generalization in semantic parsing and improves learning from small amounts of labeled data on syntactic tasks.
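The synthetic-pretraining idea for the FST setting can be sketched as follows: sample a function from the symbolically defined family (here, tiny deterministic FSTs over a two-letter alphabet), run it on inputs, and use (FST description + input, output) pairs as pretraining examples. All details below — the encoding of transitions, the alphabet, the helper names — are illustrative assumptions, not the thesis's actual setup:

```python
import random

def random_fst(num_states=2, alphabet="ab", rng=None):
    """Sample a deterministic FST: (state, symbol) -> (next_state, output)."""
    rng = rng or random.Random(0)
    return {
        (q, s): (rng.randrange(num_states), rng.choice(alphabet))
        for q in range(num_states)
        for s in alphabet
    }

def run_fst(fst, inp, start=0):
    """Run the FST on an input string, collecting its output string."""
    state, out = start, []
    for sym in inp:
        state, o = fst[(state, sym)]
        out.append(o)
    return "".join(out)

def make_example(fst, inp):
    """Encode (FST description, input) as the source; the FST output is the target."""
    desc = " ".join(f"{q},{s}->{q2},{o}" for (q, s), (q2, o) in sorted(fst.items()))
    return f"{desc} | {inp}", run_fst(fst, inp)

# A fixed toy FST for illustration (2 states, alphabet {a, b})
fst = {(0, "a"): (1, "b"), (0, "b"): (0, "a"),
       (1, "a"): (0, "a"), (1, "b"): (1, "b")}
src, tgt = make_example(fst, "abba")  # tgt == "bbba"
```

Pretraining a Transformer on many such pairs, sampled across the whole family, is what lets the bias transfer: matching the family's behavior requires tracking state, which is consistent with the finding that the model simulates FST state transitions in its hidden representations.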
en
dc.identifier.uri
https://hdl.handle.net/1842/43689
dc.identifier.uri
http://dx.doi.org/10.7488/era/6221
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Jonas Groschwitz, Matthias Lindemann, Meaghan Fowlie, Mark Johnson, and Alexander Koller. 2018. AMR dependency parsing with a typed semantic algebra. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1831–1841, Melbourne, Australia. Association for Computational Linguistics.
en
dc.relation.hasversion
Matthias Lindemann, Alexander Koller, and Ivan Titov. 2023a. Compositional generalisation with structured reordering and fertility layers. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2172–2186, Dubrovnik, Croatia. Association for Computational Linguistics.
en
dc.relation.hasversion
Matthias Lindemann, Alexander Koller, and Ivan Titov. 2023b. Compositional generalization without trees using multiset tagging and latent permutations. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14488–14506, Toronto, Canada. Association for Computational Linguistics.
en
dc.relation.hasversion
Matthias Lindemann, Alexander Koller, and Ivan Titov. 2024a. SIP: Injecting a structural inductive bias into a Seq2Seq model by simulation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand. Association for Computational Linguistics.
en
dc.relation.hasversion
Matthias Lindemann, Alexander Koller, and Ivan Titov. 2024b. Strengthening structural inductive biases by pre-training to perform syntactic transformations. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA. Association for Computational Linguistics.
en
dc.subject
Natural Language Processing
en
dc.subject
NLP
en
dc.subject
artificial neural networks
en
dc.subject
language structural principles
en
dc.title
Enhancing structural inductive biases of sequence-to-sequence models for semantic parsing and beyond
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Lindemann2025.pdf
Size:
1.53 MB
Format:
Adobe Portable Document Format
This item appears in the following Collection(s)