Universal rewriting via machine translation

Mallinson, Jonathan

Files

MallinsonJS_2021.pdf (918.48 KB)

Date

2021-11-30

Abstract

Natural language allows the same meaning (semantics) to be expressed in multiple different ways, a phenomenon known as paraphrasing. This thesis examines automatic approaches to paraphrasing, focusing on three subtasks: unconstrained paraphrasing, where there are no constraints on the output; simplification, where the output must be simpler than the input; and text compression, where the output must be shorter than the input. Whilst paraphrasing can be learned from supervised data, such data is sparse and expensive to create. This thesis is concerned with the use of transfer learning to improve paraphrasing when there is no supervised data. In particular, we address the following question: can transfer learning be used to overcome a lack of paraphrasing data? To answer this question, we split it into three subquestions: (1) When no supervised data exists for a specific paraphrasing task, can bilingual data be used as a source of training data for paraphrasing? (2) When supervised paraphrasing data exists in one language but not in another, can bilingual data be used to transfer paraphrasing training data from one language to another? (3) Can the output of encoder-decoder paraphrasing models be controlled?

URI

https://hdl.handle.net/1842/38479
http://dx.doi.org/10.7488/era/1743

This item appears in the following Collection(s)

Informatics thesis and dissertation collection