Computational approach to typological comparative concepts for lexicality

Haley, Coleman

Files

Haley2026.pdf (7.17 MB)

Date

2026-03-18

Authors

Haley, Coleman

Abstract

One major dimension of linguistic organization is the notion that there are lexical linguistic units, which express meanings, and functional linguistic units, which are determined by syntax and/or discourse and serve to organize and clarify the relationships between lexical elements. This dichotomy has been described at multiple levels of linguistic structure and motivates at least two classical distinctions in linguistics. At the level of words, it motivates the so-called lexical–functional distinction, while within morphology, a related distinction is drawn between derivation (which forms new lexical items) and inflection (which produces forms of lexical items). These dichotomies have many noted boundary cases, which have led many linguists to reject them or to treat them as gradient. In this thesis, I refer to this gradient of semantic weight at different levels of formal structure as lexicality. There is substantial neurological and psychological evidence for the importance of lexicality to human language processing. Further, lexicality dichotomies also emerge in cross-linguistic trends in grammatical organization, such as asymmetries between inflection and derivation, or between the properties of functional and lexical word classes. Yet the lexicality of a particular linguistic unit varies contextually and diachronically. In linguistic practices that proceed from analysis of language-particular data to a language-general analysis, issues of lexicality have played a central role. However, in the functional-typological tradition, which proceeds from cross-linguistic analysis to the language particular, the relationship of semantic contentfulness to linguistic organization has had little theoretical impact. A major factor is that typological research must be conducted with cross-linguistically applicable comparative concepts, but it is difficult to produce a principled measure of semantic contentfulness which is applicable across languages.
In this thesis, I leverage deep learning models to produce empirically grounded measures for lexicality, which I argue can serve as interesting and useful comparative concepts for typological study. In the first part of the thesis, I focus on inflection and derivation, operationalizing a four-dimensional framework for formal and deep-learning-derived distributional properties of the distinction. I show that formal and distributional variability are strong correlates of this traditional distinction across a sample of 26 languages, and that the four measures can predict inflection vs. derivation with 89% accuracy, suggesting that this oft-debated distinction has a substantial empirical basis across languages in a combination of formal and functional properties. In the second part of the thesis, I introduce a novel groundedness measure, which aims to provide a cross-linguistic empirical ground for language function to quantify contextual semantic contentfulness. To do so, I leverage image caption datasets and vision-and-language models. This measure captures the lexical–functional distinction in word classes across 30 languages but diverges substantially from related measures like concreteness. Interestingly, groundedness displays asymmetries not just between lexical and functional items, but also among the major lexical classes of nouns, verbs, and adjectives. I argue that this suggests a connection between ideas of lexical word-class continua in cognitive linguistics and the lexical–functional distinction. I apply groundedness to deviations from prototypical lexical class organization. I show that groundedness predicts the split between Japanese na- and i-adjectives, which has previously been thought to have little synchronic relevance. On the other hand, an investigation of the Tensedness Universal shows the challenges with certain types of cross-linguistic comparisons of groundedness with current methods.

URI

https://era.ed.ac.uk/handle/1842/44498
https://doi.org/10.7488/era/7015

This item appears in the following Collection(s)

Informatics thesis and dissertation collection