Edinburgh Research Archive

Enhancing structural inductive biases of sequence-to-sequence models for semantic parsing and beyond

Authors

Lindemann, Matthias Moritz

Abstract

In recent years, sequence-to-sequence models such as Transformers have been applied with great success to a wide range of problems in Natural Language Processing, from low-level tasks such as grapheme-to-phoneme conversion to higher-level tasks such as semantic parsing and machine translation. The sequence-to-sequence models commonly applied to such tasks have relatively weak inductive biases, i.e. they have little prior knowledge about the nature of the task they are applied to and learn virtually everything from data. While this makes them extremely versatile, it also makes them brittle when the training data provides only a weak signal. In particular, this is the case when (i) there is only a small amount of training data or (ii) the model is applied outside of the training distribution. Sequence-to-sequence models with weak inductive biases struggle with structural generalization, e.g. generalization to unseen combinations of syntactic structures and to deeper recursion than seen during training. While scaling pretraining to ever larger datasets helps, scaling alone does not yet seem to close the gap completely. The goal of this thesis is to design, implement and evaluate methods for introducing inductive biases into sequence-to-sequence models to enable structural generalization.

This thesis consists of two parts. The first part develops two sequence-to-sequence models whose inductive biases for structure derive from their architecture. The idea is to explicitly model correspondences between fragments of the input and fragments of the output. This is achieved by a two-stage process: first, for each input token, the tokens it ‘contributes’ to the output are predicted without committing to their final order; second, the output tokens are rearranged into their correct order. These methods improve structural generalization in the context of semantic parsing and also perform well on other syntax-sensitive sequence-to-sequence tasks.

In the second part, we present a general framework that injects structural inductive biases into a standard sequence-to-sequence architecture, the Transformer, by means of a particular pretraining and fine-tuning procedure with synthetic data. The approach is based on the observation that an inductive bias can often be operationalized as a family of symbolically defined functions, and that instances of this family tend to be cheap to generate automatically. The pretraining objective is to match the behavior of the entire family of functions. This framework is applied in two settings. First, taking Finite State Transducers (FSTs) as the family of functions, we pretrain a model to match the behavior of FSTs and thereby inject an FST-like inductive bias into a Transformer. Analysis reveals that this procedure leads the model to simulate transitions between FST states in its hidden representations, without being explicitly trained to do so. Second, for semantic parsing, we operationalize the inductive bias of interest as transformations of syntax trees. We pretrain a model to perform these transformations, which enhances structural generalization in semantic parsing and improves learning from small amounts of labeled data on syntactic tasks.
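To make the second part's idea of "cheaply generating instances of a symbolically defined family of functions" concrete, the sketch below samples random deterministic FSTs and runs them on random strings to produce synthetic input/output pairs of the kind a Transformer could be pretrained on. This is a minimal illustration under stated assumptions, not the thesis implementation: the alphabet sizes, state counts, and the way the FST itself would be conveyed to the model (e.g. via example pairs or a serialized transition table) are all hypothetical choices.

```python
# Illustrative sketch only: sample random deterministic FSTs and generate
# synthetic (input, output) pairs for FST-behaviour pretraining.
# All parameters and names here are assumptions, not the thesis code.
import random

def sample_fst(num_states=3, in_alphabet="abc", out_alphabet="xyz", seed=None):
    """Sample a complete deterministic FST as a transition table:
    (state, input_symbol) -> (next_state, output_symbol)."""
    rng = random.Random(seed)
    return {
        (state, sym): (rng.randrange(num_states), rng.choice(out_alphabet))
        for state in range(num_states)
        for sym in in_alphabet
    }

def run_fst(fst, input_string, start_state=0):
    """Run the FST on an input string and return the output string."""
    state, output = start_state, []
    for sym in input_string:
        state, out_sym = fst[(state, sym)]
        output.append(out_sym)
    return "".join(output)

def make_pretraining_example(seed):
    """Produce one synthetic (input, output) pair from a freshly sampled FST.
    In an actual pretraining setup, the FST would additionally have to be
    encoded in the model input in some form so that the model can learn to
    match the behavior of the whole family rather than a single transducer."""
    rng = random.Random(seed)
    fst = sample_fst(seed=seed)
    inp = "".join(rng.choice("abc") for _ in range(rng.randint(3, 10)))
    return inp, run_fst(fst, inp)

if __name__ == "__main__":
    for i in range(3):
        src, tgt = make_pretraining_example(i)
        print(f"{src} -> {tgt}")
```

Because each example is produced by a symbolically defined transducer, arbitrarily many such pairs can be generated at negligible cost, which is what makes this style of synthetic pretraining practical.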
