Computational approach to typological comparative concepts for lexicality
dc.contributor.advisor
Goldwater, Sharon
dc.contributor.advisor
Ponti, Edoardo
dc.contributor.author
Haley, Coleman
dc.contributor.sponsor
UKRI CDT in Natural Language Processing
dc.date.accessioned
2026-03-18T17:31:16Z
dc.date.issued
2026-03-18
dc.description.abstract
One major dimension of linguistic organization is the notion that there are lexical linguistic units, which express meanings, and functional linguistic units, which are determined by syntax and/or discourse and serve to organize and clarify the relationships between lexical elements. This dichotomy has been described at multiple levels of linguistic structure and motivates at least two classical distinctions in linguistics. At the level of words, it motivates the so-called lexical–functional distinction, while within morphology, a related distinction is drawn between derivation (which forms new lexical items) and inflection (which produces forms of lexical items). These dichotomies have many noted boundary cases, which have led many linguists to reject them or to treat them as gradient. In this thesis, I refer to this gradient of semantic weight at different levels of formal structure as lexicality.
There is substantial neurological and psychological evidence for the importance of lexicality to human language processing. Further, lexicality dichotomies also emerge in cross-linguistic trends in grammatical organization, such as asymmetries between inflection and derivation, or between the properties of functional and lexical word classes. Yet the lexicality of a particular linguistic unit varies contextually and diachronically. In linguistic practices that proceed from analysis of language-particular data to a language-general analysis, issues of lexicality have played a central role.
However, in the functional-typological tradition, which proceeds from cross-linguistic analysis to the language-particular, the relationship of semantic contentfulness to linguistic organization has had little theoretical impact. A major factor is that typological research must be conducted with cross-linguistically applicable comparative concepts, but it is difficult to produce a principled measure of semantic contentfulness that applies across languages. In this thesis, I leverage deep learning models to produce empirically grounded measures of lexicality, which I argue can serve as interesting and useful comparative concepts for typological study.
In the first part of the thesis, I focus on inflection and derivation, operationalizing a four-dimensional framework of formal properties and deep-learning-derived distributional properties of the distinction. I show that formal and distributional variability are strong correlates of this traditional distinction across a sample of 26 languages, and that the four measures can predict inflection vs. derivation with 89% accuracy. This suggests that the oft-debated distinction has a substantial empirical basis across languages, arising from a combination of formal and functional properties.
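The four-dimensional framework can be illustrated with a minimal Python sketch. The operationalizations below are assumptions chosen for illustration, not the thesis's own definitions: formal change as length-normalized Levenshtein distance between base and form, formal variability as the entropy of the suffix edits a construction uses, distributional change as cosine distance between embeddings of base and form, and distributional variability as the mean spread of base-to-form offset vectors around their average. The function names and the `embed` mapping (standing in for vectors from some deep learning model) are hypothetical.

```python
import math
import statistics
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit insertion, deletion and substitution costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def suffix_edit(base: str, form: str) -> Tuple[str, str]:
    """(removed, added) strings left after stripping the longest common prefix."""
    p = 0
    while p < min(len(base), len(form)) and base[p] == form[p]:
        p += 1
    return base[p:], form[p:]


def entropy_bits(items) -> float:
    """Shannon entropy (in bits) of the empirical distribution over items."""
    counts = Counter(items)
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def construction_measures(pairs: List[Tuple[str, str]],
                          embed: Dict[str, List[float]]) -> Dict[str, float]:
    """Four illustrative (hypothetical) measures for one morphological construction.

    pairs: (base, form) word pairs instantiating the construction.
    embed: mapping from each word to an embedding vector.
    """
    formal = [edit_distance(b, f) / max(len(b), len(f)) for b, f in pairs]
    edits = [suffix_edit(b, f) for b, f in pairs]
    semantic = [cosine_distance(embed[b], embed[f]) for b, f in pairs]
    # Offset vectors (form minus base) and their average across pairs.
    offsets = [[y - x for x, y in zip(embed[b], embed[f])] for b, f in pairs]
    mean_offset = [statistics.mean(dim) for dim in zip(*offsets)]
    return {
        "formal_change": statistics.mean(formal),
        "formal_variability": entropy_bits(edits),
        "distributional_change": statistics.mean(semantic),
        "distributional_variability": statistics.mean(
            math.dist(o, mean_offset) for o in offsets),
    }
```

Each construction then yields a four-number vector; a standard binary classifier such as logistic regression (an assumed choice here) could be trained on such vectors to separate inflectional from derivational constructions. The suffix-edit proxy only sees changes at the end of a word, so prefixing or non-concatenative morphology would need a different formal measure.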
In the second part of the thesis, I introduce a novel groundedness measure, which aims to quantify contextual semantic contentfulness and thereby provide a cross-linguistic empirical grounding for language function. To do so, I leverage image caption datasets and vision-and-language models. This measure captures the lexical–functional distinction in word classes across 30 languages but diverges substantially from related measures such as concreteness.
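One way to make a groundedness measure concrete is a pointwise-mutual-information-style score: how much more probable a caption token becomes when a model also sees the image. The minimal Python sketch below assumes that operationalization; it is not necessarily the thesis's exact definition. It further assumes that two log-probabilities per token are already available, one from an image-conditioned captioning model and one from a text-only language model scoring the same caption, and that tokens carry part-of-speech tags. The helper names and the probabilities in `example` are hypothetical.

```python
import math
from collections import defaultdict


def token_groundedness(logp_with_image: float, logp_text_only: float) -> float:
    """Log-probability gain (in nats) from also conditioning on the image.

    Positive values mean the image made the token more predictable.
    """
    return logp_with_image - logp_text_only


def mean_groundedness_by_tag(tokens):
    """Average token groundedness per part-of-speech tag.

    tokens: iterable of (pos_tag, logp_with_image, logp_text_only) triples,
    where both log-probabilities score the same caption token given the
    same preceding caption text.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tag, lp_img, lp_txt in tokens:
        totals[tag] += token_groundedness(lp_img, lp_txt)
        counts[tag] += 1
    return {tag: totals[tag] / counts[tag] for tag in totals}


# Hypothetical probabilities for three caption tokens (not real model output).
example = [
    ("NOUN", math.log(0.50), math.log(0.05)),
    ("NOUN", math.log(0.40), math.log(0.04)),
    ("ADP", math.log(0.60), math.log(0.55)),
]
by_tag = mean_groundedness_by_tag(example)
print(by_tag)
```

In this toy input the nouns gain far more from the image than the adposition, which is the kind of lexical–functional asymmetry the measure is meant to detect. Real models score subword units, so word-level scores would first need to be summed over each word's subwords.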
Interestingly, groundedness displays asymmetries not just between lexical and functional items, but also among the major lexical classes of nouns, verbs, and adjectives. I argue that this suggests a connection between ideas of lexical word-class continua in cognitive linguistics and the lexical–functional distinction.
I apply groundedness to deviations from prototypical lexical class organization. I show that groundedness predicts the split between Japanese na- and i-adjectives, which has previously been thought to have little synchronic relevance. On the other hand, an investigation of the Tensedness Universal shows the challenges of certain types of cross-linguistic comparison of groundedness with current methods.
dc.identifier.uri
https://era.ed.ac.uk/handle/1842/44498
dc.identifier.uri
https://doi.org/10.7488/era/7015
dc.language.iso
en
dc.publisher
The University of Edinburgh
dc.relation.hasversion
Haley, C., Ponti, E. M., and Goldwater, S. (2024). Corpus-based measures discriminate inflection and derivation cross-linguistically. Journal of Language Modelling, 12(2):477–529
dc.relation.hasversion
Haley, C., Goldwater, S., and Ponti, E. M. (2025). A Grounded Typology of Word Classes. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp. 10380–10399, Albuquerque, New Mexico. Association for Computational Linguistics.
dc.relation.hasversion
Gyevnar, B., Dagan, G., Haley, C., Guo, S., and Mollica, F. (2022). Communicative Efficiency or Iconic Learning: Do Acquisition and Communicative Pressures Interact to Shape Colour-Naming Systems? Entropy, 24(11):1542.
dc.relation.hasversion
Haley, C. (2025). Unlocking finite-state morphological transducers: Derivational networks for Inuit-Yupik languages. Society for Computation in Linguistics, 8(1).
dc.relation.hasversion
Park, H. H., Zhang, K. J., Haley, C., Steimel, K., Liu, H., and Schwartz, L. (2021). Morphology matters: A multilingual language modeling analysis. Transactions of the Association for Computational Linguistics, 9:261–276.
dc.subject
typology
dc.subject
morphology
dc.subject
parts of speech
dc.subject
computational linguistics
dc.subject
computational typology
dc.subject
cognitive linguistics
dc.subject
functional linguistics
dc.subject
multimodal NLP
dc.subject
Natural Language Processing
dc.subject
vision-and-language models
dc.title
Computational approach to typological comparative concepts for lexicality
dc.type
Thesis
dc.type.qualificationlevel
Doctoral
dc.type.qualificationname
PhD Doctor of Philosophy
Files
Original bundle
- Name: Haley2026.pdf
- Size: 7.17 MB
- Format: Adobe Portable Document Format