Definite Description Processing in Unrestricted Text
Date
07/1998
Author
Vieira, Renata
Abstract
Noun phrases with the definite article 'the', which we call DEFINITE DESCRIPTIONS following
Russell (1905), are among the most common constructs in English and have
been extensively studied by linguists, philosophers, psychologists, and computational
linguists.
In this dissertation we present an implemented model of definite description processing
that is based on extensive empirical studies of definite description use and whose
performance can be quantitatively measured.
In almost all approaches to discourse processing and discourse representation, definite
descriptions have been regarded as anaphoric, and the models of definite description
processing proposed in the literature tend to emphasise the role of common-sense inference
mechanisms.
Recent work on discourse interpretation (Carletta, 1996; Carletta et al., 1997; Walker
and Moore, 1997) has argued that the judgements on which a theory is based should
be shared by more than one subject. Building on previous work in linguistics and corpus
linguistics, we developed several annotation schemes and ran two experiments
in which subjects were asked to annotate the uses of definite descriptions in newspaper
articles. We compared their annotations and used them both to develop our system and to
evaluate its performance.
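The abstract does not say how the annotations were compared. Carletta (1996), cited above, argued for measuring inter-coder agreement with the K statistic (taken from Siegel and Castellan), so the following is a minimal Python sketch of that measure for the two-annotator case, under stated assumptions: the function name `kappa_two_coders` and the example labels are hypothetical, and expected agreement is computed from category proportions pooled over both coders (for two coders this coincides with Scott's pi).

```python
from collections import Counter


def kappa_two_coders(labels_a, labels_b):
    """K statistic for two coders, with expected agreement from pooled proportions.

    Hypothetical helper for illustration; not taken from the dissertation.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement P(A): fraction of items both coders labelled identically.
    p_agree = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement P(E): chance that two labels drawn from the distribution
    # pooled over both coders fall in the same category.
    pooled = Counter(labels_a) + Counter(labels_b)
    p_expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    if p_expected == 1.0:
        return 1.0  # degenerate case: both coders used a single category
    return (p_agree - p_expected) / (1 - p_expected)


# Hypothetical labels assigned by two annotators to four definite descriptions.
coder_1 = ["anaphoric", "new", "new", "bridging"]
coder_2 = ["anaphoric", "new", "anaphoric", "bridging"]
print(round(kappa_two_coders(coder_1, coder_2), 3))
```

With these example labels the coders agree on 3 of 4 items (P(A) = 0.75), the pooled proportions give P(E) = 22/64, and K = 13/21, roughly 0.62.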
Quantitative evaluation has become a central concern in other language engineering tasks,
such as parsing, and has also proved useful for theoretical development. Recently,
evaluation techniques have been introduced for semantic interpretation as well, for
example in the Sixth Message Understanding Conference (MUC-6) (Sundheim, 1995).
In that case, however, the emphasis was on engineering aspects rather than on a
careful study of the phenomena. Our goal has been to develop methods whose performance
can be evaluated but which are grounded in a careful study of linguistic evidence.
The empirical studies we present provide evidence that definite descriptions are not primarily
anaphoric: they are often used to introduce a new entity into the discourse. Therefore,
in the model of definite description processing that we propose, recognising discourse-new
descriptions plays a role as important as identifying the antecedents of anaphoric
ones.
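To make this two-sided decision concrete, here is a minimal Python sketch of a classifier that first looks for a same-head antecedent among earlier noun phrases and otherwise checks for surface cues suggesting that a description is discourse-new. It is not the system described in this dissertation: the class `DefiniteDescription`, the function `classify`, the cue word list `UNIQUENESS_CUES`, and the order of the tests are all hypothetical, and descriptions matching neither test are simply left unresolved (for instance, for later bridging resolution).

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical premodifiers that often make a description uniquely identifying.
UNIQUENESS_CUES = {"first", "last", "best", "only", "same", "next"}


@dataclass
class DefiniteDescription:
    head: str                                        # head noun, e.g. "company"
    premodifiers: List[str] = field(default_factory=list)  # e.g. ["first"]
    has_postmodifier: bool = False                   # e.g. relative clause or PP


def classify(dd: DefiniteDescription, previous_heads: List[str]) -> str:
    """Assign a coarse label to a definite description (illustrative only)."""
    # 1. Anaphoric: some earlier noun phrase shares the same head noun.
    if dd.head in previous_heads:
        return "anaphoric"
    # 2. Discourse-new: restrictive postmodification or a uniqueness cue.
    if dd.has_postmodifier or UNIQUENESS_CUES & set(dd.premodifiers):
        return "discourse-new"
    # 3. Neither test fired: leave it for other processing (e.g. bridging).
    return "unresolved"


print(classify(DefiniteDescription("company"), ["company", "president"]))
print(classify(DefiniteDescription("winner", ["first"]), []))
```

Under these assumptions, "the company" after an earlier mention of a company is labelled anaphoric, "the first winner" with no prior mention is labelled discourse-new, and something like "the door" after "a house" is left unresolved, which is where lexical knowledge for bridging would be needed.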
Unlike most previous models, our system does not use hand-coded, domain-specific
knowledge or common-sense reasoning techniques; the only lexical resource we use is
WordNet (Miller et al., 1993). As a consequence, our system can process definite descriptions
in any domain; the drawback is that its coverage is limited. Nevertheless,
our studies reveal the kind of knowledge needed to resolve definite
descriptions, especially bridging cases. The resulting system can
be useful in applications such as semi-automatic coreference annotation in unrestricted
domains.