The goal of this thesis is to investigate the computer generation of various kinds of
anaphors in Chinese, including zero, pronominal and nominal anaphors, from the
semantic representation of multisentential text. The work is divided into two steps: the
first is to investigate the linguistic behaviour of Chinese anaphora, and the second is to
implement the results of the first step in a Chinese natural language generation system
and evaluate how well they work.
The first step aims to construct a set of rules governing the use of all these kinds
of anaphors. To achieve this, we performed a sequence of experiments, refining the
rules step by step. In each experiment, we compared the anaphors occurring in
human-generated text with those generated by algorithms employing the rules, assuming the
same semantic and discourse structures as the text. We started by distinguishing
between the use of zero and other anaphors, termed non-zeroes. Then we performed
experiments to distinguish between pronouns and nominal anaphors within the non-zeroes.
Finally, we refined the previous result to consider different kinds of descriptions
for nominal anaphors. In this research we confine ourselves to descriptive texts. Three
sets of test data consisting of scientific questions and answers and an introduction to
Chinese grammar were selected. The rules we obtained from the experiments make
use of the following conditions: locality between anaphor and antecedent, syntactic
constraints on zero anaphors, discourse segment structures, salience of objects and
animacy of objects. The results show that the anaphors generated by using the rules
we obtained are very close to those in the real texts.
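
To make the shape of such a rule set concrete, the following is a minimal sketch of how the five conditions listed above could be combined into a single decision procedure. It is not the rule set obtained in the experiments: the ordering of the tests, the boolean encoding of each condition, and the names `Context` and `choose_anaphor` are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical encoding of the conditions named in the text."""
    is_local: bool             # antecedent is close to the anaphor (locality)
    zero_allowed: bool         # syntactic constraints permit a zero anaphor
    at_segment_boundary: bool  # anaphor opens a new discourse segment
    is_salient: bool           # referent is currently salient
    is_animate: bool           # referent is animate


def choose_anaphor(ctx: Context) -> str:
    """Return 'zero', 'pronoun' or 'nominal' (illustrative ordering only)."""
    # Step 1: zero versus non-zero anaphors.
    if (ctx.is_local and ctx.zero_allowed
            and not ctx.at_segment_boundary and ctx.is_salient):
        return "zero"
    # Step 2: within the non-zeroes, pronoun versus nominal anaphor.
    if ctx.is_local and ctx.is_animate and not ctx.at_segment_boundary:
        return "pronoun"
    # Step 3: fall back to a nominal description.
    return "nominal"


if __name__ == "__main__":
    print(choose_anaphor(Context(True, True, False, True, False)))
```

The cascade mirrors the stepwise design of the experiments: zeroes are separated from non-zeroes first, and pronouns from nominal anaphors second. The choice among different kinds of nominal descriptions is left out of this sketch.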
To carry out the second step, we built a Chinese natural language generation system
that can generate descriptive texts. The system is divided into a strategic and
a tactical component. The strategic component arranges message contents in response
to the input goal into a well-organised hierarchical discourse structure by using a
text planner. The tactical component takes the hierarchical discourse structure as
input and produces surface sentences with punctuation marks inserted appropriately.
Within the tactical component, the first task is to linearise the message units in the
discourse structure in depth-first order and map them into syntactically oriented
representations. Referring expressions, the main concern of this thesis, are generated
within the mapping process. A linguistic realisation program is then invoked to convert
the syntactic representation into surface strings in Chinese.
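
The depth-first linearisation performed by the tactical component can be illustrated with a short sketch. The tree representation used here is an assumption, not the system's actual data structure: the `Node` class, its fields, and the convention that only leaves carry message units are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical discourse-structure node.

    Internal nodes carry a relation label; leaves carry a message unit.
    """
    relation: Optional[str] = None
    message: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def linearise(node: Node) -> List[str]:
    """Collect message units in depth-first, left-to-right order."""
    if node.message is not None:
        return [node.message]
    units: List[str] = []
    for child in node.children:
        units.extend(linearise(child))
    return units


if __name__ == "__main__":
    tree = Node(relation="sequence", children=[
        Node(message="m1"),
        Node(relation="elaboration", children=[
            Node(message="m2"),
            Node(message="m3"),
        ]),
        Node(message="m4"),
    ])
    print(linearise(tree))
```

In a full system, each message unit in the resulting sequence would then be mapped to a syntactic representation, and it is during that mapping that the referring expressions would be chosen.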
After the implementation, we sent some generated texts to a number of native speakers
of Chinese and compared human-created results and computer-generated text to
investigate the quality of the generated anaphors. The results of the comparison show
that the rules we obtained are effective in dealing with the generation of anaphors in