Generation of anaphors in Chinese
Abstract
The goal of this thesis is to investigate the computer generation of various kinds of anaphors in Chinese, including zero, pronominal and nominal anaphors, from the semantic representation of multisentential text. The work is divided into two steps: the first is to investigate the linguistic behaviour of Chinese anaphora, and the second is to implement the results of the first step in a Chinese natural language generation system to see how they work.
The first step is, in general, to construct a set of rules governing the use of all kinds of anaphors. To achieve this, we performed a sequence of experiments in a stepwise refined manner. In the experiments, we compared the occurrence of anaphors in human-generated text with those generated by algorithms employing the rules, assuming the same semantic and discourse structures as the text. We started by distinguishing between the use of zero and other anaphors, termed non-zeroes. Then we performed experiments to distinguish between pronouns and nominal anaphors within the non-zeroes. Finally, we refined the previous result to consider different kinds of descriptions for nominal anaphors. In this research we confine ourselves to descriptive texts. Three sets of test data consisting of scientific questions and answers and an introduction to Chinese grammar were selected. The rules we obtained from the experiments make use of the following conditions: locality between anaphor and antecedent, syntactic constraints on zero anaphors, discourse segment structures, salience of objects and animacy of objects. The results show that the anaphors generated by using the rules we obtained are very close to those in the real texts.
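As a rough illustration of how conditions of this kind could be combined into a stepwise decision (zero versus non-zero, then pronoun versus nominal), the following is a minimal sketch. The feature names, the ordering of the tests and the function `choose_anaphor` are assumptions made purely for illustration; the abstract does not state the actual rules.

```python
from dataclasses import dataclass


@dataclass
class ReferenceContext:
    """Hypothetical features of one reference; all field names are illustrative."""
    is_local: bool      # antecedent lies in the immediately preceding clause
    zero_allowed: bool  # no syntactic constraint blocks a zero anaphor here
    same_segment: bool  # anaphor and antecedent share a discourse segment
    is_salient: bool    # the referent is salient at this point in the text
    is_animate: bool    # the referent is animate


def choose_anaphor(ctx: ReferenceContext) -> str:
    """Return 'zero', 'pronoun' or 'nominal' under an assumed rule ordering."""
    # Step 1 (assumed): decide between zero and non-zero anaphors.
    if ctx.is_local and ctx.zero_allowed and ctx.same_segment and ctx.is_salient:
        return "zero"
    # Step 2 (assumed): within the non-zeroes, prefer a pronoun for
    # local, animate referents.
    if ctx.is_local and ctx.is_animate:
        return "pronoun"
    # Step 3 (assumed): otherwise fall back to a nominal description.
    return "nominal"
```

The three branches mirror the order of the experiments described above; the specific conditions tested in each branch are placeholders, not the thesis's findings.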
To carry out the second step, we built a Chinese natural language generation system which is able to generate descriptive texts. The system is divided into a strategic and a tactical component. The strategic component uses a text planner to arrange message contents, in response to the input goal, into a well-organised hierarchical discourse structure. The tactical component takes the hierarchical discourse structure as input and produces surface sentences with punctuation marks inserted appropriately. Within the tactical component, the first task consists of linearising the message units in the discourse structure in depth-first order and mapping them into syntactically oriented representations. Referring expressions, the main concern of this thesis, are generated within the mapping process. A linguistic realisation program is then invoked to convert the syntactic representation into surface strings in Chinese.
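The depth-first linearisation of message units can be sketched as follows. This is a minimal illustration, assuming a discourse structure in which message units sit at the leaves of a tree; the `DiscourseNode` class, its fields and the `linearise` function are hypothetical names, not taken from the system described.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiscourseNode:
    """Hypothetical tree node: internal nodes carry a relation, leaves a message."""
    relation: Optional[str] = None
    message: Optional[str] = None
    children: List["DiscourseNode"] = field(default_factory=list)


def linearise(node: DiscourseNode) -> List[str]:
    """Collect leaf message units in depth-first, left-to-right order."""
    if not node.children:
        # A leaf contributes its message unit, if it has one.
        return [node.message] if node.message is not None else []
    ordered: List[str] = []
    for child in node.children:
        ordered.extend(linearise(child))
    return ordered


# Usage: a small two-level structure with illustrative relation labels.
tree = DiscourseNode(relation="sequence", children=[
    DiscourseNode(message="m1"),
    DiscourseNode(relation="elaboration", children=[
        DiscourseNode(message="m2"),
        DiscourseNode(message="m3"),
    ]),
    DiscourseNode(message="m4"),
])
ordered_units = linearise(tree)
```

In a fuller pipeline, each unit in `ordered_units` would then be mapped to a syntactic representation, which is the stage where referring expressions would be chosen.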
After the implementation, we sent some generated texts to a number of native speakers of Chinese and compared the human-created results with the computer-generated text to investigate the quality of the generated anaphors. The results of the comparison show that the rules we obtained are effective in dealing with the generation of anaphors in Chinese.

