Edinburgh Research Archive

Universal rewriting via machine translation

Item Status

Embargo End Date

Authors

Mallinson, Jonathan

Abstract

Natural language allows for the same meaning (semantics) to be expressed in multiple different ways, i.e. paraphrasing. This thesis examines automatic approaches for paraphrasing, focusing on three paraphrasing subtasks: unconstrained paraphrasing where there are no constraints on the output, simplification, where the output must be simpler than the input, and text compression where the output must be shorter than the input. Whilst we can learn paraphrasing from supervised data, this data is sparse and expensive to create. This thesis is concerned with the use of transfer learning to improve paraphrasing when there is no supervised data. In particular, we address the following question: can transfer learning be used to overcome a lack of paraphrasing data? To answer this question we split it into three subquestions (1) No supervised data exists for a specific paraphrasing task; can bilingual data be used as a source of training data for paraphrasing? (2) Supervised paraphrasing data exists in one language but not in another; can bilingual data be used to transfer paraphrasing training data from one language to another? (3) Can the output of encoder-decoder paraphrasing models be controlled?

This item appears in the following Collection(s)