Edinburgh Research Archive

Quantifying the perceptual value of lexical and non-lexical channels of spoken dialogue

Authors

Wallbridge, Sarenne

Abstract

Spoken conversation is one of the most widely used means of communication. Still, the cognitive mechanisms we employ to decode information from the speech signal are not well understood. Predictive theories of language processing, which explain language comprehension as a function of the alignment between our predictions about the upcoming signal and its actual realization, have been widely adopted across the fields of psycholinguistics, cognitive science, and linguistics. However, empirical support for such theories stems primarily from written language comprehension. This thesis is a study of how predictive processing may be involved in the comprehension of spoken dialogue. We focus on two features that distinguish spoken interaction from written sentences and how these features may alter comprehension behaviours. First, speech is a much richer signal than text; information can be communicated through both the lexical channel of which words are said, and the non-lexical channel of how those words are spoken. We examine how these channels contribute, both independently and jointly, to predictions about upcoming turns in dialogue. Second, making predictions in spoken dialogue involves reasoning about abstract aspects of language such as pragmatics that may not be evident in written sentences. As such, we shift focus away from the content of predictions to instead investigate their genesis: specifically, how do the lexical and non-lexical channels constrain the shape of expectations regarding the upcoming signal? To answer this question, we propose Perceptual Information Value, which quantifies the value of a channel in terms of how it alters expectations, along with a behavioural paradigm for measuring it in spoken dialogue. We begin by investigating predictions of the lexical content of dialogue. Our results show that humans generate expectations at the level of dialogue turns and that their predictions contain inherent variability.
We argue that this variability is an important component of how people process more realistic forms of language use, such as spoken dialogue. Leveraging large language models as purely predictive processing mechanisms, we demonstrate a degree of alignment between human and model predictions; however, this alignment is highly sensitive to the model architecture and training objective. In particular, model predictions do not contain the variability that we observe in human predictions. Next, we use this foundation to study how predictions manifest in spoken dialogue where messages can be distributed across both lexical and non-lexical channels. Our experiments show that expectations about spoken dialogue turns are a function of both lexical and non-lexical channels, as well as their joint expression. Importantly, we find that access to channels can constrain expectations meaningfully, even if it yields less accurate predictions. We therefore argue that the perceptual value of information in a channel lies not in its effect on predictive accuracy but more generally in its capacity to shape expectations.
