Modelling cross-lingual transfer for semantic parsing

Sherborne, Thomas Rishi

dc.contributor.advisor

Lapata, Mirella

dc.contributor.advisor

Steedman, Mark

dc.contributor.author

Sherborne, Thomas Rishi

dc.date.accessioned

2024-09-18T13:10:32Z

dc.date.available

2024-09-18T13:10:32Z

dc.date.issued

2024-09-18

dc.description.abstract

Semantic parsing maps natural language utterances to logical form representations of meaning (e.g., lambda calculus or SQL). A semantic parser functions as a human-computer interface by translating natural language into machine-readable logic to answer questions or respond to requests. Semantic parsing is a critical technology within language understanding systems (e.g., digital assistants) for accessing computational tools using natural language without expert knowledge or programming skills. Cross-lingual semantic parsing adapts a parser to map additional natural languages to logical form. Contemporary advances in semantic parsing generally study only the parsing of English. Successful cross-lingual transfer for a semantic parser improves the utility of parsing technologies by enabling broader access to these tools. However, developing a cross-lingual semantic parser introduces additional challenges and trade-offs. High-quality data for new languages is scarce and requires complex annotation. Given available data, a parser must adapt to language variations in expressing meaning and intent. Existing multilingual models and corpora also exhibit biases towards English, with variable cross-lingual transfer to languages with fewer speakers or resources. At present, there is no optimal strategy or modelling solution for teaching a new language to a semantic parser. This thesis considers the efficient adaptation of a semantic parser from English to new languages. We are motivated by a case study of an engineer expanding a natural language database interface to new customers, seeking accurate parsing of new languages under a constrained budget for annotation. Overcoming the development challenges of cross-lingual semantic parsing requires innovation in model design, optimisation algorithms and strategies for sourcing and sampling data.
Our overarching hypothesis is that cross-lingual transfer is achievable through aligning representations between a high-resource language (i.e., English) and new languages unseen for the task. We propose different strategies for this alignment, exploiting existing resources such as machine translation, pre-trained models, data for adjacent tasks, or a few annotated examples in each new language. We propose different modelling solutions suited to the quantity and quality of cross-lingual data. First, we propose an ensembled model to bootstrap a parser from multiple machine-translation sources, improving robustness by exploiting lower-quality synthetic data. Second, we propose a zero-shot parser using auxiliary tasks to learn cross-lingual representation alignment without any training data in new languages. Third, we propose an efficient meta-learning algorithm optimising cross-lingual transfer during training with a few labelled examples in new languages. Finally, we propose a latent variable model explicitly minimising divergence between representations across languages using Optimal Transport. Our results reveal that accurate cross-lingual semantic parsing is possible by composing minimal samples of target language data within models explicitly optimising for accurate parsing and cross-lingual transfer.

en

dc.identifier.uri

https://hdl.handle.net/1842/42188

dc.identifier.uri

http://dx.doi.org/10.7488/era/4909

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

Moghe, N., Sherborne, T., Steedman, M., and Birch, A. (2023b). Extrinsic evaluation of machine translation metrics. In Rogers, A., Boyd-Graber, J., and Okazaki, N., editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13060–13078, Toronto, Canada. Association for Computational Linguistics

en

dc.relation.hasversion

Sherborne, T. and Lapata, M. (2022). Zero-shot cross-lingual semantic parsing. In Muresan, S., Nakov, P., and Villavicencio, A., editors, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4134–4153, Dublin, Ireland. Association for Computational Linguistics

en

dc.relation.hasversion

Sherborne, T. and Lapata, M. (2023). Meta-learning a cross-lingual manifold for semantic parsing. Transactions of the Association for Computational Linguistics, 11:49–67

en

dc.relation.hasversion

Sherborne, T., Xu, Y., and Lapata, M. (2020). Bootstrapping a crosslingual semantic parser. In Cohn, T., He, Y., and Liu, Y., editors, Findings of the Association for Computational Linguistics: EMNLP 2020, pages 499–517, Online. Association for Computational Linguistics

en

dc.subject

cross-lingual transfer for semantic parsing

en

dc.subject

Semantic parsing

en

dc.subject

Cross-lingual semantic parsing

en

dc.subject

semantic parser from English to new languages

en

dc.subject

cross-lingual transfer

en

dc.subject

Optimal Transport

en

dc.title

Modelling cross-lingual transfer for semantic parsing

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Name: SherborneTR_2024.pdf
Size: 6.52 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection