Quantifying the perceptual value of lexical and non-lexical channels of spoken dialogue
Authors
Wallbridge, Sarenne
Abstract
Spoken conversation is one of the most widely used means of communication. Still, the cognitive mechanisms we employ to decode information from the speech
signal are not well understood. Predictive theories of language processing, which explain language comprehension as a function of the alignment between our predictions
about the upcoming signal and its actual realization, have been widely adopted across
the fields of psycholinguistics, cognitive science, and linguistics. However, empirical
support for such theories stems primarily from written language comprehension.
This thesis is a study of how predictive processing may be involved in the comprehension of spoken dialogue. We focus on two features that distinguish spoken
interaction from written sentences and how these features may alter comprehension
behaviours. First, speech is a much richer signal than text; information can be communicated through both the lexical channel of which words are said, and the non-lexical
channel of how those words are spoken. We examine how these channels contribute,
both independently and jointly, to predictions about upcoming turns in dialogue. Second, making predictions in spoken dialogue involves reasoning about abstract aspects
of language such as pragmatics that may not be evident in written sentences. As
such, we shift focus away from the content of predictions to instead investigate their
genesis—specifically, how do the lexical and non-lexical channels constrain the shape
of expectations regarding the upcoming signal? To answer this question, we propose
Perceptual Information Value, which quantifies the value of a channel in terms of how it
alters expectations, as well as a behavioural paradigm to measure it in spoken dialogue.
We begin by investigating predictions of the lexical content of dialogue. Our results show that humans generate expectations at the level of dialogue turns and that
their predictions contain inherent variability. We argue that this variability is an important component of how people process more realistic forms of language use, such as
spoken dialogue. Leveraging large language models as purely predictive processing
mechanisms, we demonstrate a degree of alignment between human and model predictions; however, this alignment is highly sensitive to the model architecture and training objective. In particular, model predictions do not contain the variability that we observe in
human predictions. Next, we use this foundation to study how predictions manifest in
spoken dialogue where messages can be distributed across both lexical and non-lexical
channels. Our experiments show that expectations about spoken dialogue turns are
a function of both lexical and non-lexical channels, as well as their joint expression.
Importantly, we find that access to channels can constrain expectations meaningfully,
even if it yields less accurate predictions. We therefore argue that the perceptual value
of information in a channel lies not in its effect on predictive accuracy but more generally in its capacity to shape expectations.