Edinburgh Research Archive

On understanding character-level models for representing morphology

dc.contributor.advisor
Lopez, Adam
en
dc.contributor.advisor
Goldwater, Sharon
en
dc.contributor.author
Vania, Clara
en
dc.contributor.sponsor
other
en
dc.date.accessioned
2020-02-05T12:28:46Z
dc.date.available
2020-02-05T12:28:46Z
dc.date.issued
2020-01-20
dc.description.abstract
Morphology is the study of how words are composed of smaller units of meaning (morphemes). It allows humans to create, memorize, and understand words in their language. To process and understand human languages, we expect our computational models to also learn morphology. Recent advances in neural network models provide us with models that compose word representations from smaller units like word segments, character n-grams, or characters. These so-called subword unit models do not explicitly model morphology, yet they achieve impressive performance across many multilingual NLP tasks, especially on languages with complex morphological processes. This thesis aims to shed light on the following questions: (1) What do subword unit models learn about morphology? (2) Do we still need prior knowledge about morphology? (3) How do subword unit models interact with morphological typology? First, we systematically compare various subword unit models and study their performance across language typologies. We show that models based on characters are particularly effective because they learn orthographic regularities which are consistent with morphology. To understand which aspects of morphology are not captured by these models, we compare them with an oracle with access to explicit morphological analysis. We show that in the case of dependency parsing, character-level models still represent words with ambiguous analyses poorly. We then demonstrate how explicit modeling of morphology is helpful in such cases. Finally, we study how character-level models perform in low-resource, cross-lingual NLP scenarios, and whether they can facilitate cross-linguistic transfer of morphology across related languages.
While we show that cross-lingual character-level models can improve low-resource NLP performance, our analysis suggests that this is mostly due to structural similarities between languages, and we do not yet find any strong evidence of cross-linguistic transfer of morphology. This thesis presents a careful, in-depth study and analysis of character-level models and their relation to morphology, providing insights and future research directions on building morphologically-aware computational NLP models.
en
dc.identifier.uri
https://hdl.handle.net/1842/36742
dc.identifier.uri
http://dx.doi.org/10.7488/era/49
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Sahin, G. G., Vania, C., Kuznetsov, I., and Gurevych, I. (2019). LINSPECTOR: multilingual probing tasks for word representations. CoRR, abs/1903.09442.
en
dc.relation.hasversion
Vania, C., Grivas, A., and Lopez, A. (2018). What do character-level models learn about morphology? The case of dependency parsing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2573–2583. Association for Computational Linguistics.
en
dc.relation.hasversion
Vania, C. and Lopez, A. (2017). From Characters to Words to in Between: Do We Capture Morphology? In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2016–2027. Association for Computational Linguistics.
en
dc.relation.hasversion
Vania, C. and Lopez, A. (2018). Explicitly modeling case improves neural dependency parsing. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 356–358, Brussels, Belgium. Association for Computational Linguistics.
en
dc.subject
natural language processing
en
dc.subject
morphology
en
dc.subject
morphemes
en
dc.subject
dependency parsing
en
dc.subject
character-level models
en
dc.subject
NLP
en
dc.title
On understanding character-level models for representing morphology
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Vania2020.pdf
Size:
1.97 MB
Format:
Adobe Portable Document Format