Towards efficient universal neural machine translation
Date: 03/08/2022
Author: Zhang, Biao
Abstract
Humans benefit from communication but suffer from language barriers. Machine translation (MT) aims to overcome such barriers by automatically transforming information from one language to another. With the rapid development of deep neural networks, neural machine translation (NMT) – especially the Transformer – has achieved great success in recent years, delivering state-of-the-art and even near-human performance on many bilingual, text-based translation tasks. However, challenges remain, particularly in 1) efficiency, where massive NMT models create a computational bottleneck for training and decoding, and 2) universality, where extending NMT beyond bilingual, text-based scenarios (such as multilingual and speech-to-text translation) is still non-trivial. In this thesis, we investigate ways of developing simple and effective neural architectures to address these two challenges.
NMT is resource-hungry. Achieving high-quality translation demands complex network architectures and a large number of model parameters, which often takes hundreds or even thousands of GPU hours for training and leads to slow inference. We tackle this computational inefficiency from three angles: 1) simplifying model architectures, where we propose a lightweight recurrent network and root mean square layer normalization to enable higher model parallelization, as well as a merged attention network paired with depth-scaled initialization to improve deep Transformers; 2) exploring representation redundancy, where we demonstrate the feasibility of sparsifying encoder outputs in the Transformer and propose rectified linear attention to induce sparse attention weights efficiently; and 3) semi-autoregressive modeling, where we relax the autoregressive constraint by allowing generation from the left-to-right and right-to-left directions simultaneously. Apart from improving efficiency, these techniques also lay the foundation for our research on universality, the other main topic of this thesis.
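For illustration, root mean square layer normalization re-scales activations by their root mean square alone, dropping the mean-centering of standard layer normalization. Below is a minimal sketch in NumPy; the function signature and epsilon value are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def rms_norm(x, gain, eps=1e-8):
    # Root mean square layer normalization: re-scale activations by their RMS
    # along the feature dimension, then apply a learned per-feature gain.
    # Unlike standard LayerNorm, no mean is subtracted, which reduces
    # computation and helps model parallelization.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * gain
```

Here `gain` plays the role of the learned scale parameter, typically initialized to ones.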
MT should be universal, i.e., capable of transforming information between any languages in any modality. Unfortunately, NMT still struggles with poor language coverage and the cross-modality gap. As a step towards universality, we focus on (massively) multilingual NMT and direct speech-to-text translation (ST). Multilingual NMT suffers from a capacity bottleneck and off-target translation; we therefore study methods of increasing modeling capacity for the multilingual Transformer and propose random online backtranslation to bridge zero-shot language pairs. We further explore when and where language-specific (LS) modeling matters via conditional LS routing, revealing the trade-off between shared and LS capacity. Unlike textual NMT, ST is hindered by the modality gap between speech and text. We narrow this gap by proposing adaptive feature selection, which automatically filters out uninformative speech features, improving translation quality as well as inference speed. We then extend our study to document-level ST to address the question of whether and how context helps ST. We adopt contextual modeling for ST and show its effectiveness in improving homophone and simultaneous translation.
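To make the zero-shot bridging concrete, the following is a rough sketch of random online backtranslation under stated assumptions: for each training pair, the target sentence is back-translated online into a randomly sampled intermediate language by the model being trained, yielding a synthetic pair for an otherwise unseen translation direction. The `model.translate` and `model.train_step` interfaces are hypothetical placeholders, not the actual training code:

```python
import random

def robt_step(model, batch, languages):
    # Random online backtranslation (sketch): augment a multilingual batch
    # with synthetic pairs that exercise zero-shot directions.
    synthetic = []
    for src_lang, src, tgt_lang, tgt in batch:
        # Sample an intermediate language different from the target language.
        pivot = random.choice([l for l in languages if l != tgt_lang])
        # Back-translate the target sentence online with the current model.
        pseudo_src = model.translate(tgt, src_lang=tgt_lang, tgt_lang=pivot)
        # The synthetic (pivot -> target-language) pair trains a direction
        # that has no parallel data of its own.
        synthetic.append((pivot, pseudo_src, tgt_lang, tgt))
    model.train_step(batch + synthetic)
```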
Universality covers multilinguality and multimodality. Finally, we discuss multilingual ST, a critical path towards universal NMT. We integrate the above methods into a joint model and participate in the multilingual ST shared task at IWSLT 2021. Our system achieves competitive performance in both supervised and zero-shot translation, where we observe that the different techniques complement one another in improving multilingual ST.
Related items
Showing items related by title, author, creator and subject.
- Improving verb phrase anaphora translation in English to French statistical machine translation
  Leirvik, Austin (The University of Edinburgh, 2012-11-28). This project investigates the translation of verb phrase anaphora within the source text of an English to French statistical machine translation system. VP anaphora is a common syntactic construction in English, although ...
- One translation fits all? A comparative analysis of British, American and transatlantic translations of Astrid Lindgren and Sven Nordqvist
  Goodwin-Andersson, Elizabeth Margaret (The University of Edinburgh, 2016-11-24). Target culture is a concept regularly used in Translation Studies but it is not a concept which is routinely defined any further than the geographical location of the target language. In English translation this can be ...
- Toward a Deleuzean theory of translation: a translation of and commentary on A fuego eterno condenados
  Kelly, James Christopher (The University of Edinburgh, 2016-06-30). This translation and commentary thesis presents a theory of literary translation based on the ideas of Gilles Deleuze, informed by and applied to a translation of parts 0, 1 and 2 of the novel A fuego eterno condenados ...