Context restriction for low resource neural morphological analysis
dc.contributor.advisor
Goldwater, Sharon
dc.contributor.advisor
Lopez, Adam
dc.contributor.author
Makazhanov, Aibek
dc.date.accessioned
2025-02-04T09:13:49Z
dc.date.available
2025-02-04T09:13:49Z
dc.date.issued
2022-07-11
dc.description.abstract
Contextual morphological analysis is the task of finding the most probable lemma and morpho-syntactic description (i.e. part of speech and grammatical markers, such as case, tense, etc.) for a given word in a given context.
Historically, approaches to the task relied on only a few words of local context, owing both to model design (e.g. HMMs) and to data sparsity concerns.
With the advent of deep learning, resorting to local context ceased to be a necessity, and modern approaches exclusively use global (sentential) context.
In this thesis we investigate whether context restriction can still be useful for neural morphological analysis.
For our first set of experiments we adapt a character-level encoder-decoder model that was previously used for related tasks of lemmatization and morphological generation.
We first show that using just one word of surrounding context not only yields better results than using global context but is also more efficient.
Then, on a data set of more than a hundred corpora, we show that larger context windows are preferable only when the training data is sufficiently large, whereas a single word of context works better both in low-resource scenarios and on average.
Finally, we show that augmenting our model with contextualized word embeddings does not increase its performance.
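As a rough illustration of what "one word of surrounding context" can mean for a character-level encoder-decoder, the sketch below serializes a target word together with up to k neighbouring words on each side into a single symbol sequence. The boundary markers (`<s>`, `</s>`, `<w>`, `</w>`) and the function name `char_input` are hypothetical choices for this example. The thesis's actual input encoding is not described in this record.

```python
from typing import List

BOS = "<s>"   # hypothetical marker for a missing neighbour before the sentence start
EOS = "</s>"  # hypothetical marker for a missing neighbour after the sentence end


def char_input(tokens: List[str], i: int, k: int = 1) -> List[str]:
    """Serialize tokens[i] with up to k words of context on each side.

    Context words contribute their characters; the target word's characters
    are enclosed in hypothetical <w> ... </w> tags. Each context position that
    falls outside the sentence is replaced by one BOS or EOS marker.
    Words are separated by a single space symbol.
    """
    symbols: List[str] = []
    for j in range(i - k, i + k + 1):
        if j < 0:
            symbols.append(BOS)
        elif j >= len(tokens):
            symbols.append(EOS)
        elif j == i:
            symbols.append("<w>")
            symbols.extend(tokens[j])  # one symbol per character
            symbols.append("</w>")
        else:
            symbols.extend(tokens[j])
        symbols.append(" ")  # word separator
    return symbols[:-1]  # drop the trailing separator
```

For example, `"".join(char_input("the cats sat".split(), 1))` gives `"the <w>cats</w> sat"`. Using explicit boundary markers, rather than simply omitting missing neighbours, is one way to let the encoder distinguish sentence-initial and sentence-final words. This design choice is an assumption of the sketch.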
Inspired by the success of our character-level model, in our second set of experiments we apply context restriction to a popular off-the-shelf word-level neural morphological analyzer.
Here too, we show that when training data is scarce, limiting context to a few words does improve performance, especially for agglutinative and fusional languages.
However, with enough data, using global context is still better.
To investigate what restricted models miss from global context, in a follow-up experiment we show that context restriction hinders the model's ability to correctly analyze words whose dependency heads fall outside the context window.
Finally, we find that to improve performance on small data sets, one does not even have to train in a context-restricted manner: limiting context at inference time alone is enough to achieve comparable performance.
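Two word-level operations discussed above can be sketched as follows. The first restricts a sentence-level tagger's input to a window of a few words around the target, which could be applied at inference time only. The second checks whether a word's dependency head lies outside such a window. The sketch assumes CoNLL-U-style head indices (1-based, with 0 denoting the root). The function names `window` and `head_outside_window` are hypothetical. How the thesis actually fed windows to the off-the-shelf analyzer is not specified in this record.

```python
from typing import List, Tuple


def window(tokens: List[str], i: int, k: int) -> Tuple[List[str], int]:
    """Return the sub-sentence of up to k words on each side of tokens[i].

    The second element is the target's position inside the window, so the
    prediction of a tagger run on the window can be read off at that index.
    The window is clipped at sentence boundaries (no padding).
    """
    start = max(0, i - k)
    end = min(len(tokens), i + k + 1)
    return tokens[start:end], i - start


def head_outside_window(heads: List[int], i: int, k: int) -> bool:
    """True if the dependency head of word i (0-based) is more than k words away.

    heads[i] is the 1-based index of the head of word i, as in CoNLL-U;
    0 marks the root, which this sketch treats as never outside the window.
    """
    h = heads[i]
    if h == 0:
        return False
    return abs((h - 1) - i) > k
```

For example, with `tokens = ["the", "big", "cats", "sat", "quietly"]`, `window(tokens, 2, 1)` returns `(["big", "cats", "sat"], 1)`. With heads `[3, 3, 4, 0, 4]` ("the" and "big" attached to "cats", "cats" to "sat"), `head_outside_window(heads, 0, 1)` is `True`. This is because the head of "the" is two words away, beyond a one-word window.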
dc.identifier.uri
https://hdl.handle.net/1842/43060
dc.identifier.uri
http://dx.doi.org/10.7488/era/5606
dc.language.iso
en
dc.publisher
The University of Edinburgh
dc.relation.hasversion
Makhambetov, O., Makazhanov, A., Sabyrgaliyev, I., and Yessenbayev, Z. (2015). Data-Driven Morphological Analysis and Disambiguation for Kazakh. In Gelbukh, A., editor, Computational Linguistics and Intelligent Text Processing, pages 151–163, Cham. Springer International Publishing.
dc.relation.hasversion
Toleu, A., Tolegen, G., and Makazhanov, A. (2017). Character-aware neural morphological disambiguation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 666–671, Vancouver, Canada. Association for Computational Linguistics.
dc.relation.hasversion
Tyers, F., Washington, J., Çöltekin, Ç., and Makazhanov, A. (2017). An assessment of Universal Dependency annotation guidelines for Turkic languages.
dc.subject
Contextual morphological analysis
dc.subject
global (sentential) context
dc.subject
neural morphological analysis
dc.subject
character-level model
dc.subject
neural morphological analyzer
dc.title
Context restriction for low resource neural morphological analysis
dc.title.alternative
On context restriction for low resource neural morphological analysis
dc.type
Thesis or Dissertation
dc.type.qualificationlevel
Masters
dc.type.qualificationname
MPhil Master of Philosophy
Files
Original bundle
- Name: MakazhanovA_2022.pdf
- Size: 808.9 KB
- Format: Adobe Portable Document Format