Edinburgh Research Archive

Word length and the principle of least effort: language as an evolving, efficient code for information transfer

dc.contributor.author
Kanwal, Jasmeen Kaur
dc.date.accessioned
2018-09-20T09:04:35Z
dc.date.available
2018-09-20T09:04:35Z
dc.date.issued
2018-07-02
dc.description.abstract
In 1935 the linguist George Kingsley Zipf made a now classic observation about the relationship between a word’s length and its frequency: the more frequent a word is, the shorter it tends to be. He claimed that this “Law of Abbreviation” is a universal structural property of language. The Law of Abbreviation has since been documented in a wide range of human languages, and extended to animal communication systems and even computer programming languages. Zipf hypothesised that this universal design feature arises as a result of individuals optimising form-meaning mappings under competing pressures to communicate accurately but also efficiently: his famous Principle of Least Effort.
In this thesis, I present a novel set of studies which provide direct experimental evidence for this explanatory hypothesis. Using a miniature artificial language learning paradigm, I show in Chapter 2 that language users optimise form-meaning mappings in line with the Law of Abbreviation only when pressures for accuracy and efficiency both operate during a communicative task. These results are robust across different methods of data collection: one version of the experiment was run in the lab, and another was run online, using a novel method I developed which allows participants to partake in dyadic interaction through a web-based interface.
In Chapter 3, I address the growing body of work suggesting that a word’s predictability in context may be an even stronger determiner of its length than its frequency alone. For instance, Piantadosi et al. (2011) show that shorter words have a lower average surprisal (i.e., tend to appear in more predictive contexts) than longer words, in synchronic corpora across many languages. We hypothesise that the same communicative pressures posited by the Principle of Least Effort, when acting on speakers in situations where context manipulates the information content of words, can give rise to these lexical distributions. Adapting the methodology developed in Chapter 2, I show that participants use shorter words in more predictive contexts only when subject to the competing pressures for accurate and efficient communication. In a second experiment, I show that participants are more likely to use shorter words for meanings with a lower average surprisal. These results suggest that communicative pressures acting on individuals during language use can lead to the re-mapping of a lexicon to align with “Uniform Information Density”, the principle that information content ought to be evenly spread across an utterance, such that shorter linguistic units carry less information than longer ones.
Over generations, linguistic behaviour such as that observed in the experiments reported here may bring entire lexicons into alignment with the Law of Abbreviation and Uniform Information Density. For this to happen, a diachronic process which leads to permanent lexical change is necessary. However, crucial evidence for this process (decreasing word length as a result of increasing frequency over time) has never before been systematically documented in natural language. In Chapter 4, I conduct the first large-scale diachronic corpus study investigating the relationship between word length and frequency over time, using the Google Books Ngrams corpus and three different word lists covering both English and French. Focusing on words which have both long and short variants (e.g., info/information), I show that the frequency of a word lemma may influence the rate at which the shorter variant gains in popularity. This suggests that the lexicon as a whole may indeed be gradually evolving towards greater efficiency.
Taken together, the behavioural and corpus-based evidence presented in this thesis supports the hypothesis that communicative pressures acting on language-users are at least partially responsible for the frequency-length and surprisal-length relationships found universally across lexicons. More generally, the approach taken in this thesis promotes a view of language as, among other things, an evolving, efficient code for information transfer.
dc.identifier.uri
http://hdl.handle.net/1842/33051
dc.language.iso
en
dc.publisher
The University of Edinburgh
dc.relation.hasversion
Kanwal, J., Smith, K., Culbertson, J., and Kirby, S. (2017a). Language-users choose short words in predictive contexts in an artificial language task. In Proceedings of the 39th annual conference of the Cognitive Science Society, pages 643–648.
dc.relation.hasversion
Kanwal, J., Smith, K., Culbertson, J., and Kirby, S. (2017b). Zipf’s Law of Abbreviation and the Principle of Least Effort: Language users optimise a miniature lexicon for efficient communication. Cognition, 165:45–52.
dc.subject
language evolution
dc.subject
information theory
dc.subject
communicative efficiency
dc.subject
artificial language learning
dc.subject
communication games
dc.subject
word length
dc.title
Word length and the principle of least effort: language as an evolving, efficient code for information transfer
dc.type
Thesis or Dissertation
dc.type.qualificationlevel
Doctoral
dc.type.qualificationname
PhD Doctor of Philosophy

Files

Original bundle

Name:
Kanwal2018.pdf
Size:
2.92 MB
Format:
Adobe Portable Document Format
