
dc.contributor.advisor: Sennrich, Rico
dc.contributor.advisor: Titov, Ivan
dc.contributor.author: Zhang, Biao
dc.date.accessioned: 2022-08-03T10:29:18Z
dc.date.available: 2022-08-03T10:29:18Z
dc.date.issued: 2022-08-03
dc.identifier.uri: https://hdl.handle.net/1842/39298
dc.identifier.uri: http://dx.doi.org/10.7488/era/2549
dc.description.abstract: Humans benefit from communication but suffer from language barriers. Machine translation (MT) aims to overcome such barriers by automatically transforming information from one language to another. With the rapid development of deep neural networks, neural machine translation (NMT), especially the Transformer, has achieved great success in recent years, delivering state-of-the-art and even near-human performance on many bilingual text-based translation tasks. However, challenges remain, particularly in 1) efficiency, where a massive NMT model is a computational bottleneck for training and decoding, and 2) universality, where extending NMT beyond bilingual and text-based scenarios (such as multilingual and speech-to-text translation) is still non-trivial. In this thesis, we investigate ways of developing simple and effective neural architectures to address these two challenges.

NMT is resource-hungry. Achieving high-quality translation demands complex network architectures and a large number of model parameters, which often takes hundreds or even thousands of GPU hours for training and leads to slow inference. We tackle this computational inefficiency from three angles: 1) simplifying model architectures, where we propose a lightweight recurrent network and root mean square layer normalization to enable higher model parallelization, as well as a merged attention network paired with depth-scaled initialization to improve the deep Transformer; 2) exploring representation redundancy, where we demonstrate the feasibility of sparsifying encoder outputs in the Transformer and propose rectified linear attention to induce sparse attention weights efficiently; and 3) semi-autoregressive modeling, where we relax the independence assumption by allowing generation to proceed in the left-to-right and right-to-left directions simultaneously. Apart from benefiting efficiency, these techniques also lay the foundation for our research on universality, the other topic of this thesis.

MT should be universal, i.e., able to transform information between any languages in any modalities. Unfortunately, NMT still struggles with poor language coverage and a cross-modality gap. As a step towards universality, we focus on (massively) multilingual NMT and direct speech-to-text translation (ST). Multilingual NMT suffers from a capacity bottleneck and off-target translation; we therefore study methods of increasing the modeling capacity of the multilingual Transformer, and propose random online backtranslation to bridge zero-shot language pairs. We further explore when and where language-specific (LS) modeling matters via conditional LS routing, discovering the trade-off between shared and LS capacity. Unlike textual NMT, ST is hindered by the modality gap between speech and text. We narrow this gap by inventing adaptive feature selection, which automatically filters out uninformative speech features, improving translation quality as well as inference speed. Next, we extend our study to document-level ST to address the question of whether and how context helps ST. We adopt contextual modeling for ST and show its effectiveness in enhancing homophone and simultaneous translation.

Universality covers multilinguality and multimodality. Finally, we discuss multilingual ST, a critical path to universal NMT. We integrate the above methods into a joint model and participate in the multilingual ST shared task at IWSLT 2021. Our system achieves competitive performance in both supervised and zero-shot translation, where we observe the complementarity of different techniques in improving multilingual ST.
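For illustration only: the abstract names root mean square layer normalization and rectified linear (sparse) attention among the proposed techniques. The lines below are a minimal NumPy sketch of the core of these two ideas, not the thesis implementation; the function names, the gain parameter, and the epsilon constant are assumptions made for the sketch.

    import numpy as np

    def rms_norm(x, gain, eps=1e-8):
        # Re-scale activations by their root-mean-square statistic only,
        # omitting the mean-centring step of standard layer normalization.
        rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
        return x / rms * gain

    def relu_attention(q, k, v):
        # Replace the softmax over attention scores with a ReLU, so that
        # many attention weights become exactly zero (sparse attention).
        scores = q @ k.T / np.sqrt(q.shape[-1])
        weights = np.maximum(scores, 0.0)
        return weights @ v

Both sketches reflect the simplification theme of the thesis: the first drops mean-centring relative to standard LayerNorm, and the second induces sparsity by clipping negative attention scores to zero rather than renormalizing them with a softmax.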
dc.language.iso: en
dc.publisher: The University of Edinburgh
dc.relation.hasversion: Zhang, B, Bapna, A, Sennrich, R & Firat, O 2021, Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation. in International Conference on Learning Representations (ICLR 2021). Ninth International Conference on Learning Representations 2021, Virtual Conference, 4/05/21. <https://openreview.net/forum?id=Wj4ODo0uyCF>
dc.relation.hasversion: Biao Zhang and Rico Sennrich. 2021. Edinburgh’s End-to-End Multilingual Speech Translation System for IWSLT 2021. In Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021), pages 160–168, Bangkok, Thailand (online). Association for Computational Linguistics. https://aclanthology.org/2021.iwslt-1.19
dc.relation.hasversion: Sennrich, R & Zhang, B 2019, Revisiting Low-Resource Neural Machine Translation: A Case Study. in A Korhonen, D Traum & L Màrquez (eds), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics., P19-1021, Association for Computational Linguistics (ACL), Florence, Italy, pp. 211–221, 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28/07/19. <https://www.aclweb.org/anthology/papers/P/P19/P19-1021/>
dc.relation.hasversion: Zhang, B, Titov, I & Sennrich, R 2020, Fast Interleaved Bidirectional Sequence Generation. in Proceedings of the Fifth Conference on Machine Translation. Association for Computational Linguistics, Online, pp. 501–513, Fifth Conference on Machine Translation, Online Conference, 19/11/20. <https://www.aclweb.org/anthology/2020.wmt-1.62>
dc.relation.hasversion: Biao Zhang, Philip Williams, Ivan Titov, and Rico Sennrich. 2020. Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1628–1639, Online. Association for Computational Linguistics.
dc.relation.hasversion: Zhang, B, Titov, I, Haddow, B & Sennrich, R 2020, Adaptive Feature Selection for End-to-End Speech Translation. in Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics (ACL), pp. 2533–2544, The 2020 Conference on Empirical Methods in Natural Language Processing, Virtual conference, 16/11/20. https://doi.org/10.18653/v1/2020.findings-emnlp.230
dc.relation.hasversion: Rachel Bawden, Biao Zhang, Lisa Yankovskaya, Andre Tättar, and Matt Post. 2020. A Study in Improving BLEU Reference Coverage with Diverse Automatic Paraphrasing. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 918–932, Online. Association for Computational Linguistics.
dc.relation.hasversion: Biao Zhang, Ivan Titov, and Rico Sennrich. 2021. On Sparsifying Encoder Outputs in Sequence-to-Sequence Models. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 2888–2900, Online. Association for Computational Linguistics.
dc.relation.hasversion: Biao Zhang, Ivan Titov, and Rico Sennrich. 2021. Sparse Attention with Linear Units. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6507–6520, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
dc.relation.hasversion: Biao Zhang, Ivan Titov, Barry Haddow, and Rico Sennrich. 2021. Beyond Sentence-Level End-to-End Speech Translation: Context Helps. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 2566–2578, Online. Association for Computational Linguistics.
dc.relation.hasversion: Zhang, B & Sennrich, R 2019, Root Mean Square Layer Normalization. in Advances in Neural Information Processing Systems 32. vol. 32, Curran Associates Inc, pp. 12360–12371, 33rd Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, 8/12/19. <https://papers.nips.cc/paper/9403-root-mean-square-layer-normalization>
dc.relation.hasversion: Bawden, R, Zhang, B, Tättar, A & Post, M 2020, ParBLEU: Augmenting Metrics with Automatic Paraphrases for the WMT’20 Metrics Shared Task. in Proceedings of the Fifth Conference on Machine Translation. Association for Computational Linguistics (ACL), pp. 887–894, Fifth Conference on Machine Translation, Online Conference, 19/11/20. <https://www.aclweb.org/anthology/2020.wmt-1.98>
dc.relation.hasversion: Zhang, B & Sennrich, R 2019, A Lightweight Recurrent Network for Sequence Modeling. in A Korhonen, D Traum & L Màrquez (eds), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics., P19-1149, Association for Computational Linguistics (ACL), Florence, Italy, pp. 1538–1548, 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28/07/19. <https://www.aclweb.org/anthology/papers/P/P19/P19-1149/>
dc.relation.hasversion: Zhang, B, Titov, I & Sennrich, R 2019, Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention. in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing. Association for Computational Linguistics (ACL), pp. 898–909, 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Hong Kong, Hong Kong, 3/11/19. https://doi.org/10.18653/v1/D19-1083
dc.subject: Machine Translation
dc.subject: Natural Language Processing
dc.subject: Neural Machine Translation
dc.subject: Multilingual Translation
dc.subject: Efficient Transformer
dc.subject: Speech-to-text Translation
dc.subject: Document-level Translation
dc.subject: Universal Machine Translation
dc.title: Towards efficient universal neural machine translation
dc.type: Thesis or Dissertation
dc.type.qualificationlevel: Doctoral
dc.type.qualificationname: PhD Doctor of Philosophy
