Edinburgh Research Archive

Learning weakly structured representations for text-to-text generation

Authors

Hosking, Tom

Abstract

Text-to-text generation refers to a class of problems that involve transforming one piece of text into another, such as paraphrase generation, summarisation and automatic translation. Deep learning approaches to text-to-text generation first map a natural language utterance to some learned representation, perform some processing within this representation space, then map the modified representation back to natural language. Currently, the majority of such models use an unstructured sequence of dense vector embeddings that is fully learned from data as the representation. This data-driven approach has proven successful and requires little guidance from a model designer, but the resulting representations are not easily interpretable and do not exploit known properties of the task under consideration (e.g., for paraphrase generation, the meaning and form of an input sentence should be treated separately). In this thesis, we hypothesise that choosing a weakly structured representation is a better approach. The structure should encode the aspects of the task that are known, but remain sufficiently flexible that the unknown aspects may be learned. We argue that discrete and hierarchical representations make some aspects of text-to-text generation tasks more feasible, enabling models that are attributable and scale to longer inputs. Finally, we hypothesise that structure alone is not sufficient, and that some degree of supervision is needed to assign meaning to a structured representation. We focus on two text-to-text generation tasks to gather support for these hypotheses: paraphrase generation, where a model must generate an output sentence with the same meaning as a given input sentence but a different surface form; and opinion summarisation, which involves generating a textual summary that aggregates popular opinions from customer reviews about hotels or other products.
We begin by proposing a model for paraphrase generation that represents the meaning and surface form of an input separately, with the surface form represented as a set of discrete codes learned through Vector Quantisation (VQ-VAE). We show that this weakly structured choice of representation enables us to generate high-quality paraphrases by keeping the semantic representation constant and varying the syntactic representation, supporting our first hypothesis. We use a denoising objective based on distant supervision to induce the separation between representations. Next, we address the lack of a tractable factorisation in VQ-VAE, and introduce Hierarchical Residual Quantisation (HRQ-VAE), a method for learning hierarchical discrete representations of input data, and show that it learns more informative representations than VQ-VAE. We then combine the hierarchical representations of HRQ-VAE with separated encoding spaces for paraphrase generation, showing that the more richly structured choice of representation leads to improved quality of generated paraphrases. To demonstrate that HRQ-VAE can be beneficial for more complex text-to-text tasks, we apply it to opinion summarisation, representing sentences from customer reviews as paths through a learned hierarchy. We show that we can generate informative summaries of these reviews that are attributable and scale to large numbers of reviews, by identifying which paths in the hierarchy are frequently attested across each set of reviews. Finally, we combine the scalability and attributability of hierarchical representations with the fluency and coherence of Large Language Models, and use an encoder based on HRQ-VAE to build a hierarchical index over review sentences that may then be used to retrieve clusters of sentences containing popular opinions.
We use distant supervision based on entailment relations to induce a semantic ordering on the learned hierarchy and show that the hierarchy directly enables the scalability and attributability of our model. Overall, our experiments support our hypotheses that weakly structured representations are beneficial for text-to-text generation, that discrete and hierarchical representations are a powerful choice of structure, and that distant supervision is needed to assign meaning to the structures.
