Edinburgh Research Archive

Syntactic and semantic features for statistical and neural machine translation

dc.contributor.advisor
Koehn, Philipp
en
dc.contributor.advisor
Birch-Mayne, Alexandra
en
dc.contributor.author
Nădejde, Maria
en
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.date.accessioned
2018-07-17T12:51:19Z
dc.date.available
2018-07-17T12:51:19Z
dc.date.issued
2018-07-02
dc.description.abstract
Machine Translation (MT) for language pairs with long-distance dependencies and word reordering, such as German–English, is prone to producing output that is lexically or syntactically incoherent. Statistical MT (SMT) models used explicit or latent syntax to improve reordering, but failed to capture other long-distance dependencies. This thesis explores how explicit sentence-level syntactic information can improve translation for such complex linguistic phenomena. In particular, we work at the level of the syntactic-semantic interface with representations conveying the predicate-argument structures. These are essential to preserving semantics in translation, and SMT systems have long struggled to model them. String-to-tree SMT systems use explicit target syntax to handle long-distance reordering, but make strong independence assumptions which lead to inconsistent lexical choices. To address this, we propose a Selectional Preferences feature which models the semantic affinities between target predicates and their argument fillers using the target dependency relations available in the decoder. We found that our feature is not effective in a string-to-tree system for German→English and that often the conditioning context is wrong because of mistranslated verbs. To improve verb translation, we proposed a Neural Verb Lexicon Model (NVLM) incorporating sentence-level syntactic context from the source which carries relevant semantic information for verb disambiguation. When used as an extra feature for re-ranking the output of a German→English string-to-tree system, the NVLM improved verb translation precision by up to 2.7% and recall by up to 7.4%. While the NVLM improved some aspects of translation, other syntactic and lexical inconsistencies are not addressed by a linear combination of independent models.
In contrast to SMT, neural machine translation (NMT) avoids strong independence assumptions, thus generating more fluent translations and capturing some long-distance dependencies. Still, incorporating additional linguistic information can improve translation quality. We proposed a method for tightly coupling target words and syntax in the NMT decoder. To represent syntax explicitly, we used CCG supertags, which encode subcategorization information, capturing long-distance dependencies and attachments. Our method improved translation quality on several difficult linguistic constructs, including prepositional phrases, which are the most frequent type of predicate arguments. These improvements over a strong baseline NMT system were consistent across two language pairs: 0.9 BLEU for German→English and 1.2 BLEU for Romanian→English.
en
dc.identifier.uri
http://hdl.handle.net/1842/31346
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Alexandra Birch, Barry Haddow, Ulrich Germann, Maria Nădejde, Christian Buck, and Philipp Koehn. The feasibility of HMEANT as a human MT evaluation metric. In Proceedings of the Eighth Workshop on Statistical Machine Translation, pages 52–61, Sofia, Bulgaria, August 2013. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/W13-2203.
en
dc.relation.hasversion
Maria Nădejde, Philip Williams, and Philipp Koehn. Edinburgh’s Syntax-Based Machine Translation Systems. In Proceedings of the Eighth Workshop on Statistical Machine Translation, pages 170–176, Sofia, Bulgaria, August 2013.
en
dc.relation.hasversion
Maria Nădejde, Alexandra Birch, and Philipp Koehn. Modeling selectional preferences of verbs and nouns in string-to-tree machine translation. In Proceedings of the First Conference on Machine Translation, pages 32–42, Berlin, Germany, August 2016a. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/W/W16/W16-2204.
en
dc.relation.hasversion
Maria Nădejde, Alexandra Birch, and Philipp Koehn. A neural verb lexicon model with source-side syntactic context for string-to-tree machine translation. In Proceedings of the International Workshop on Spoken Language Translation (IWSLT), December 2016b.
en
dc.relation.hasversion
Maria Nădejde, Reddy Siva, Rico Sennrich, Tomasz Dwojak, Marcin Junczys-Dowmunt, Philipp Koehn, and Alexandra Birch. Predicting target language CCG supertags improves neural machine translation. In Proceedings of the Second Conference on Machine Translation, Copenhagen, Denmark, September 2017. Association for Computational Linguistics.
en
dc.relation.hasversion
Rico Sennrich, Orhan Firat, Kyunghyun Cho, Alexandra Birch, Barry Haddow, Julian Hitschler, Marcin Junczys-Dowmunt, Samuel Läubli, Antonio Valerio Miceli Barone, Jozef Mokry, and Maria Nădejde. Nematus: a toolkit for neural machine translation. In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 65–68, Valencia, Spain, April 2017. Association for Computational Linguistics. URL http://aclweb.org/anthology/E17-3017.
en
dc.relation.hasversion
Philip Williams, Rico Sennrich, Maria Nădejde, Matthias Huck, Eva Hasler, and Philipp Koehn. Edinburgh’s syntax-based systems at WMT 2014. In Proceedings of the Ninth Workshop on Statistical Machine Translation, pages 207–214, Baltimore, Maryland, USA, June 2014.
en
dc.relation.hasversion
Philip Williams, Rico Sennrich, Maria Nădejde, Matthias Huck, and Philipp Koehn. Edinburgh’s syntax-based systems at WMT 2015. In Proceedings of the EMNLP 2015 Tenth Workshop on Statistical Machine Translation, Lisbon, Portugal, September 2015.
en
dc.relation.hasversion
Philip Williams, Rico Sennrich, Maria Nădejde, Matthias Huck, Barry Haddow, and Ondřej Bojar. Edinburgh’s statistical machine translation systems for WMT16. In Proceedings of the First Conference on Machine Translation, pages 399–410, Berlin, Germany, August 2016. Association for Computational Linguistics.
en
dc.subject
Machine Translation
en
dc.subject
sentence-level syntactic context
en
dc.subject
syntactic-semantic interface
en
dc.subject
Selectional Preferences
en
dc.subject
syntax-based statistical MT system
en
dc.subject
Verb Lexicon model
en
dc.subject
neural networks
en
dc.title
Syntactic and semantic features for statistical and neural machine translation
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Nădejde2018.pdf
Size:
1.09 MB
Format:
Adobe Portable Document Format
