Edinburgh Research Archive

Modelling cross-lingual transfer for semantic parsing

dc.contributor.advisor
Lapata, Mirella
dc.contributor.advisor
Steedman, Mark
dc.contributor.author
Sherborne, Thomas Rishi
dc.date.accessioned
2024-09-18T13:10:32Z
dc.date.available
2024-09-18T13:10:32Z
dc.date.issued
2024-09-18
dc.description.abstract
Semantic parsing maps natural language utterances to logical form representations of meaning (e.g., lambda calculus or SQL). A semantic parser functions as a human-computer interface by translating natural language into machine-readable logic to answer questions or respond to requests. Semantic parsing is a critical technology within language understanding systems (e.g., digital assistants) for accessing computational tools using natural language without expert knowledge or programming skills. Cross-lingual semantic parsing adapts a parser to map additional natural languages to logical form. Contemporary advances in semantic parsing generally only study parsing of English. Successful cross-lingual transfer for a semantic parser improves the utility of parsing technologies by enabling broader access to these tools. However, developing a cross-lingual semantic parser introduces additional challenges and trade-offs. High-quality data for new languages is scarce and requires complex annotation. Given available data, a parser must adapt to language variations in expressing meaning and intent. Existing multilingual models and corpora are also biased towards English, with variable cross-lingual transfer to languages with fewer speakers or resources. At present, there is no optimal strategy or modelling solution for teaching a new language to a semantic parser. This thesis considers the efficient adaptation of a semantic parser from English to new languages. We are motivated by a case study of an engineer expanding a natural language database interface to new customers, seeking accurate parsing of new languages under a constrained budget for annotation. Overcoming the development challenges of cross-lingual semantic parsing requires innovation in model design, optimisation algorithms and strategies for sourcing and sampling data.
Our overarching hypothesis is that cross-lingual transfer is achievable through aligning representations between a high-resource language (i.e., English) and new languages unseen for the task. We propose different strategies for this alignment, exploiting existing resources such as machine translation, pre-trained models, data for adjacent tasks, or a few annotated examples in each new language. We propose different modelling solutions suited to the quantity and quality of cross-lingual data. First, we propose an ensembled model to bootstrap a parser from multiple machine-translation sources, improving robustness by exploiting lower-quality synthetic data. Second, we propose a zero-shot parser using auxiliary tasks to learn cross-lingual representation alignment without any training data in new languages. Third, we propose an efficient meta-learning algorithm optimising cross-lingual transfer during training with a few labelled examples in new languages. Finally, we propose a latent variable model explicitly minimising divergence between representations across languages using Optimal Transport. Our results reveal that accurate cross-lingual semantic parsing is possible by composing minimal samples of target language data within models explicitly optimising for accurate parsing and cross-lingual transfer.
en
dc.identifier.uri
https://hdl.handle.net/1842/42188
dc.identifier.uri
http://dx.doi.org/10.7488/era/4909
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Moghe, N., Sherborne, T., Steedman, M., and Birch, A. (2023b). Extrinsic evaluation of machine translation metrics. In Rogers, A., Boyd-Graber, J., and Okazaki, N., editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13060–13078, Toronto, Canada. Association for Computational Linguistics
en
dc.relation.hasversion
Sherborne, T. and Lapata, M. (2022). Zero-shot cross-lingual semantic parsing. In Muresan, S., Nakov, P., and Villavicencio, A., editors, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4134–4153, Dublin, Ireland. Association for Computational Linguistics
en
dc.relation.hasversion
Sherborne, T. and Lapata, M. (2023). Meta-learning a cross-lingual manifold for semantic parsing. Transactions of the Association for Computational Linguistics, 11:49–67
en
dc.relation.hasversion
Sherborne, T., Xu, Y., and Lapata, M. (2020). Bootstrapping a crosslingual semantic parser. In Cohn, T., He, Y., and Liu, Y., editors, Findings of the Association for Computational Linguistics: EMNLP 2020, pages 499–517, Online. Association for Computational Linguistics
en
dc.subject
cross-lingual transfer for semantic parsing
en
dc.subject
Semantic parsing
en
dc.subject
Cross-lingual semantic parsing
en
dc.subject
semantic parser from English to new languages
en
dc.subject
cross-lingual transfer
en
dc.subject
Optimal Transport
en
dc.title
Modelling cross-lingual transfer for semantic parsing
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
SherborneTR_2024.pdf
Size:
6.52 MB
Format:
Adobe Portable Document Format