Composition in distributional models of semantics
Abstract
Distributional models of semantics have proven themselves invaluable both in cognitive
modelling of semantic phenomena and also in practical applications. For example,
they have been used to model judgments of semantic similarity (McDonald,
2000) and association (Denhière and Lemaire, 2004; Griffiths et al., 2007) and have
been shown to achieve human-level performance on synonymy tests (Landauer and
Dumais, 1997; Griffiths et al., 2007) such as those included in the Test of English as
a Foreign Language (TOEFL). This ability has been put to practical use in automatic thesaurus
extraction (Grefenstette, 1994). However, while there has been a considerable
amount of research directed at the most effective ways of constructing representations
for individual words, the representation of larger constructions, e.g., phrases and sentences,
has received relatively little attention. In this thesis we examine this issue of
how to compose meanings within distributional models of semantics to form representations
of multi-word structures.
Natural language data typically consists of such complex structures, rather than
just individual isolated words. Thus, a model of composition, in which individual
word meanings are combined into phrases and phrases combine to form sentences,
is of central importance in modelling this data. Commonly, however, distributional
representations are combined in terms of addition (Landauer and Dumais, 1997; Foltz
et al., 1998), without any empirical evaluation of alternative choices. Constructing
effective distributional representations of phrases and sentences requires that we have
both a theoretical foundation to direct the development of models of composition and
also a means of empirically evaluating those models.
The approach we take is to first consider the general properties of semantic composition
and from that basis define a comprehensive framework in which to consider
the composition of distributional representations. The framework subsumes existing
proposals, such as addition and tensor products, but also allows us to define novel
composition functions. We then show that the effectiveness of these models can be evaluated on three empirical tasks.
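The composition functions named above can be illustrated concretely. The following is a minimal sketch, assuming NumPy and toy three-dimensional vectors (the actual spaces in the thesis are far higher-dimensional); the function names and example vectors are illustrative, not drawn from the thesis itself.

```python
import numpy as np

def compose_additive(u, v):
    """Additive composition: p = u + v."""
    return u + v

def compose_multiplicative(u, v):
    """Component-wise multiplicative composition: p_i = u_i * v_i."""
    return u * v

def compose_tensor(u, v):
    """Tensor (outer) product composition: p_ij = u_i * v_j."""
    return np.outer(u, v)

# Toy vectors standing in for the distributional representations
# of two words in a phrase
u = np.array([1.0, 2.0, 0.0])
v = np.array([0.5, 1.0, 3.0])

print(compose_additive(u, v))        # [1.5 3.  3. ]
print(compose_multiplicative(u, v))  # [0.5 2.  0. ]
print(compose_tensor(u, v).shape)    # (3, 3)
```

Note that addition and component-wise multiplication keep the phrase representation in the same space as the word representations, whereas the tensor product grows the dimensionality with each composition, a property that distinguishes the proposals subsumed by the framework.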
The first of these tasks involves modelling similarity judgements for short phrases
gathered in human experiments. Distributional representations of individual words are
commonly evaluated on tasks based on their ability to model semantic similarity relations,
e.g., synonymy or priming. Thus, it seems appropriate to evaluate phrase representations
in a similar manner. We then apply compositional models to language modelling,
demonstrating that the issue of composition has practical consequences, and
also providing an evaluation based on large amounts of natural data. In our third task,
we use these language models in an analysis of reading times from an eye-movement
study. This allows us to investigate the relationship between the composition of distributional
representations and the processes involved in comprehending phrases and
sentences.
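For the similarity-judgment task, a standard way to score a compositional model is to compare composed phrase vectors with cosine similarity and correlate those scores with human ratings. A minimal sketch, assuming NumPy and hypothetical phrase vectors (the vectors and values below are illustrative only):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical composed representations of two phrases,
# e.g. produced by component-wise multiplication of word vectors
phrase_a = np.array([0.2, 0.8, 0.1])
phrase_b = np.array([0.25, 0.7, 0.05])

sim = cosine(phrase_a, phrase_b)
# Model similarity scores like this one can then be correlated
# with the human similarity ratings gathered experimentally
```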
We find that these tasks do indeed allow us to evaluate and differentiate the proposed
composition functions and that the results show a reasonable consistency across
tasks. In particular, a simple multiplicative model is best for a semantic space based
on word co-occurrence, whereas an additive model is better for the topic-based model
we consider. More generally, employing compositional models to construct representations
of multi-word structures typically yields improvements in performance over
non-compositional models, which only represent individual words.