Edinburgh Research Archive

Modelling cross-lingual transfer for semantic parsing

Authors

Sherborne, Thomas Rishi

Abstract

Semantic parsing maps natural language utterances to logical form representations of meaning (e.g., lambda calculus or SQL). A semantic parser functions as a human-computer interface by translating natural language into machine-readable logic to answer questions or respond to requests. Semantic parsing is a critical technology within language understanding systems (e.g., digital assistants) for accessing computational tools using natural language without expert knowledge or programming skills. Cross-lingual semantic parsing adapts a parser to map additional natural languages to logical form. Contemporary advances in semantic parsing generally study English alone. Successful cross-lingual transfer improves the utility of parsing technologies by enabling broader access to these tools. However, developing a cross-lingual semantic parser introduces additional challenges and trade-offs. High-quality data for new languages is scarce and requires complex annotation. Given available data, a parser must adapt to variation in how languages express meaning and intent. Existing multilingual models and corpora also exhibit biases towards English, with variable cross-lingual transfer to languages with fewer speakers or resources. At present, there is no optimal strategy or modelling solution for teaching a new language to a semantic parser.

This thesis considers the efficient adaptation of a semantic parser from English to new languages. We are motivated by a case study of an engineer expanding a natural language database interface to new customers, seeking accurate parsing of new languages under a constrained annotation budget. Overcoming the development challenges of cross-lingual semantic parsing requires innovation in model design, optimisation algorithms, and strategies for sourcing and sampling data.
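To make the task concrete, a toy rule-based parser can map a narrow class of English questions to SQL. This sketch is purely illustrative (the patterns, table names, and column names are invented here) and bears no relation to the neural models studied in the thesis:

```python
import re

# Toy semantic parser: pattern rules mapping a tiny fragment of
# English to SQL. Table/column names are invented for illustration.
RULES = [
    (re.compile(r"how many (\w+) are there", re.I),
     lambda m: f"SELECT COUNT(*) FROM {m.group(1)};"),
    (re.compile(r"list all (\w+) in (\w+)", re.I),
     lambda m: f"SELECT * FROM {m.group(1)} WHERE region = '{m.group(2)}';"),
]

def parse(utterance: str) -> str:
    """Return the SQL logical form for a matching utterance."""
    for pattern, build in RULES:
        match = pattern.search(utterance)
        if match:
            return build(match)
    raise ValueError(f"no rule matched: {utterance!r}")

print(parse("How many customers are there?"))
# SELECT COUNT(*) FROM customers;
```

A cross-lingual parser must produce the same logical form for, e.g., the German or Chinese rendering of the question, which is precisely where hand-written patterns break down and learned representations are needed.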
Our overarching hypothesis is that cross-lingual transfer is achievable by aligning representations between a high-resource language (i.e., English) and new languages unseen for the task. We propose different strategies for this alignment, exploiting existing resources such as machine translation, pre-trained models, data for adjacent tasks, or a few annotated examples in each new language, together with modelling solutions suited to the quantity and quality of the available cross-lingual data. First, we propose an ensemble model that bootstraps a parser from multiple machine-translation sources, improving robustness by exploiting lower-quality synthetic data. Second, we propose a zero-shot parser that uses auxiliary tasks to learn cross-lingual representation alignment without any training data in new languages. Third, we propose an efficient meta-learning algorithm that optimises cross-lingual transfer during training with a few labelled examples in new languages. Finally, we propose a latent variable model that explicitly minimises divergence between representations across languages using Optimal Transport. Our results reveal that accurate cross-lingual semantic parsing is possible by composing minimal samples of target-language data within models that explicitly optimise for accurate parsing and cross-lingual transfer.
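The final contribution measures divergence between representations across languages with Optimal Transport. As a generic illustration of that idea only, the standard entropic-regularised Sinkhorn iterations can score how far apart two sets of encoding vectors are; this is textbook Sinkhorn over stand-in random vectors, not the thesis's latent variable model, and every name and number here is invented:

```python
import numpy as np

def sinkhorn_cost(X, Y, eps=0.1, n_iters=200):
    """Entropic-regularised OT cost between point clouds X (n, d)
    and Y (m, d) with uniform marginals. Standard Sinkhorn iterations;
    illustrative only."""
    n, m = len(X), len(Y)
    # Pairwise squared Euclidean cost matrix.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / eps)                      # Gibbs kernel
    a, b = np.ones(n) / n, np.ones(m) / m     # uniform marginals
    v = np.ones(m) / m
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]           # transport plan
    return (P * C).sum()

rng = np.random.default_rng(0)
en = rng.normal(size=(8, 4))                  # stand-in "English" encodings
de = en + 0.01 * rng.normal(size=(8, 4))      # nearly aligned "German" encodings
print(sinkhorn_cost(en, de))                  # small cost: clouds are close
```

A small transport cost indicates well-aligned representation spaces; a training objective can then penalise this cost to push the two languages' encodings together.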
