On understanding character-level models for representing morphology
dc.contributor.advisor
Lopez, Adam
en
dc.contributor.advisor
Goldwater, Sharon
en
dc.contributor.author
Vania, Clara
en
dc.contributor.sponsor
other
en
dc.date.accessioned
2020-02-05T12:28:46Z
dc.date.available
2020-02-05T12:28:46Z
dc.date.issued
2020-01-20
dc.description.abstract
Morphology is the study of how words are composed of smaller units of meaning
(morphemes). It allows humans to create, memorize, and understand words in their
language. To process and understand human languages, we expect our computational
models to also learn morphology. Recent advances in neural network models provide
us with models that compose word representations from smaller units like word segments,
character n-grams, or characters. These so-called subword unit models do not
explicitly model morphology, yet they achieve impressive performance across many
multilingual NLP tasks, especially on languages with complex morphological processes.
This thesis aims to shed light on the following questions: (1) What do subword
unit models learn about morphology? (2) Do we still need prior knowledge about
morphology? (3) How do subword unit models interact with morphological typology?
First, we systematically compare various subword unit models and study their performance
across language typologies. We show that models based on characters are
particularly effective because they learn orthographic regularities which are consistent
with morphology. To understand which aspects of morphology are not captured by
these models, we compare them with an oracle with access to explicit morphological
analysis. We show that in the case of dependency parsing, character-level models
are still poor at representing words with ambiguous analyses. We then demonstrate
how explicit modeling of morphology is helpful in such cases. Finally, we study how
character-level models perform in low-resource, cross-lingual NLP scenarios, and whether
they can facilitate cross-linguistic transfer of morphology across related languages.
While we show that cross-lingual character-level models can improve low-resource
NLP performance, our analysis suggests that it is mostly because of the structural
similarities between languages, and we do not yet find any strong evidence of cross-linguistic
transfer of morphology. This thesis presents a careful, in-depth study and
analyses of character-level models and their relation to morphology, providing insights
and future research directions on building morphologically-aware computational NLP
models.
en
dc.identifier.uri
https://hdl.handle.net/1842/36742
dc.identifier.uri
http://dx.doi.org/10.7488/era/49
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Sahin, G. G., Vania, C., Kuznetsov, I., and Gurevych, I. (2019). LINSPECTOR: multilingual probing tasks for word representations. CoRR, abs/1903.09442.
en
dc.relation.hasversion
Vania, C., Grivas, A., and Lopez, A. (2018). What do character-level models learn about morphology? The case of dependency parsing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2573–2583. Association for Computational Linguistics.
en
dc.relation.hasversion
Vania, C. and Lopez, A. (2017). From Characters to Words to in Between: Do We Capture Morphology? In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2016–2027. Association for Computational Linguistics.
en
dc.relation.hasversion
Vania, C. and Lopez, A. (2018). Explicitly modeling case improves neural dependency parsing. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 356–358, Brussels, Belgium. Association for Computational Linguistics.
en
dc.subject
natural language processing
en
dc.subject
morphology
en
dc.subject
morphemes
en
dc.subject
dependency parsing
en
dc.subject
character-level models
en
dc.subject
NLP
en
dc.title
On understanding character-level models for representing morphology
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name:
- Vania2020.pdf
- Size:
- 1.97 MB
- Format:
- Adobe Portable Document Format

