Understanding and generating language with discourse representation structures
dc.contributor.advisor
Cohen, Shay
dc.contributor.advisor
Lapata, Maria
dc.contributor.advisor
Lascarides, Alexandra
dc.contributor.author
Liu, Jiangming
dc.date.accessioned
2021-08-19T16:07:07Z
dc.date.available
2021-08-19T16:07:07Z
dc.date.issued
2021-07-31
dc.description.abstract
Natural language is how humans describe and understand what is happening in the world. Intelligent machines, however, operate more readily over symbolic representations that explicitly encode the linguistic information of utterances, which is why fundamental natural language processing tasks are necessary. Several symbolic formalisms have been proposed to represent the meaning of natural language. Unlike other symbolic formalisms, Discourse Representation Theory (DRT) is model-theoretic and interpretable, and was designed to capture a wide range of linguistic phenomena, such as scope, quantification, and presupposition, both within and across sentences. Moreover, recently developed resources for DRT make it possible to build tools based on this formalism at a larger scale than previous attempts, which mostly relied on hand engineering. This thesis explores two natural language processing problems relating to Discourse Representation Structures (DRSs), namely parsing and generation. We address several questions: (1) how to obtain meaning representations, based on Discourse Representation Theory, for natural language of arbitrary length; (2) how to transform discourse representations from their box-oriented form into computational formats that are easy to model; (3) how to design neural models that automatically generate discourse representation structures from natural language and vice versa; and (4) how to exploit annotations of varying quality to improve our models and extend them to the analysis of low-resource languages.
We discuss Discourse Representation Theory in the context of related meaning representations and show how DRT deals with various linguistic phenomena, such as predicate-argument structure, word senses, scope and quantification, presupposition, temporal expressions, anaphoric coreference, and rhetorical relations. By comparing DRT to related meaning representations, we show why it is important to develop tools based on DRT and what advantages it offers over other formalisms.
A computational format is necessary to model discourse representation structures. We provide the definition of Discourse Representation Tree Structures (DRTS), which are derived from discourse representation boxes. We propose a neural DRTS parser with a hierarchical encoder and a three-step decoder. Furthermore, we improve upon DRTS by introducing a lossless transformation algorithm that allows us to handle presuppositions and word senses. We adopt the Transformer as our DRS parser and compare DRS parsing in tree and clause formats.
In order to explore discourse representation analysis in multiple languages, we propose Universal Discourse Representation Structure (UDRS), which bridges semantic symbols with pre-trained language models and is portable to knowledge bases beyond English (e.g., GermaNet for German and HowNet for Chinese). This raises the problem of low-resource language analysis. In the monolingual analysis scenario, we propose an iterative learning algorithm that can exploit annotations of varying quality. In the cross-lingual analysis scenario, we propose a one-to-many approach, which translates gold-standard English to non-English text and trains multiple models (one per language) on the translations, and a many-to-one approach, which translates non-English text to English and then runs a relatively accurate English model on the translated text. Both methods significantly improve DRS parsing in low-resource languages.
We also introduce a general neural framework for DRS-to-text generation, which maps DRSs to natural language. Our generator is based on an encoder-decoder architecture equipped with a novel TreeLSTM model. Building on our success with DRS parsing and generation, we empirically study neural interlingual machine translation, which first parses the source language into DRSs and then generates the target language from them. Although it does not match commercial machine translation systems (e.g., Google Translate), which are trained on billions of examples, our neural interlingual machine translation system outperforms competitive baselines.
Taken together, this thesis explores understanding and generating natural language with discourse representation structures: we investigate semantic formalisms (i.e., discourse representation structures and universal discourse representation structures), design neural models for monolingual and cross-lingual semantic parsing and natural language generation, and study DRS-interlingual machine translation. Our experiments on the Groningen Meaning Bank and the Parallel Meaning Bank demonstrate the success of neural discourse representation structure parsing and generation, and shed light on natural language understanding and generation for downstream natural language processing tasks (e.g., DRS-interlingual machine translation).
en
dc.identifier.uri
https://hdl.handle.net/1842/37936
dc.identifier.uri
http://dx.doi.org/10.7488/era/1211
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.title
Understanding and generating language with discourse representation structures
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: LiuJ__2021.pdf
- Size: 2.03 MB
- Format: Adobe Portable Document Format