
dc.contributor.advisor: Koehn, Philipp [en]
dc.contributor.advisor: Goldwater, Sharon [en]
dc.contributor.author: Williams, Philip James [en]
dc.date.accessioned: 2015-02-26T14:33:21Z
dc.date.available: 2015-02-26T14:33:21Z
dc.date.issued: 2014-11-27
dc.identifier.uri: http://hdl.handle.net/1842/9971
dc.description.abstract: Morphology and syntax have both received attention in statistical machine translation research, but they are usually treated independently and the historical emphasis on translation into English has meant that many morphosyntactic issues remain under-researched. Languages with richer morphologies pose additional problems and conventional approaches tend to perform poorly when either source or target language has rich morphology. In both computational and theoretical linguistics, feature structures together with the associated operation of unification have proven a powerful tool for modelling many morphosyntactic aspects of natural language. In this thesis, we propose a framework that extends a state-of-the-art syntax-based model with a feature structure lexicon and unification-based constraints on the target-side of the synchronous grammar. Whilst our framework is language-independent, we focus on problems in the translation of English to German, a language pair that has a high degree of syntactic reordering and rich target-side morphology. We first apply our approach to modelling agreement and case government phenomena. We use the lexicon to link surface form words with grammatical feature values, such as case, gender, and number, and we use constraints to enforce feature value identity for the words in agreement and government relations. We demonstrate improvements in translation quality of up to 0.5 BLEU over a strong baseline model. We then examine verbal complex production, another aspect of translation that requires the coordination of linguistic features over multiple words, often with long-range discontinuities. We develop a feature structure representation of verbal complex types, using constraint failure as an indicator of translation error and use this to automatically identify and quantify errors that occur in our baseline system.
A manual analysis and classification of errors informs an extended version of the model that incorporates information derived from a parse of the source. We identify clause spans and use model features to encourage the generation of complete verbal complex types. We are able to improve accuracy as measured using precision and recall against values extracted from the reference test sets. Our framework allows for the incorporation of rich linguistic information and we present sketches of further applications that could be explored in future work. [en]
dc.contributor.sponsor: Engineering and Physical Sciences Research Council (EPSRC) [en]
dc.contributor.sponsor: European Union Seventh Framework Programme [en]
dc.language.iso: en
dc.publisher: The University of Edinburgh [en]
dc.relation.hasversion: Philip Williams and Philipp Koehn. Agreement Constraints for Statistical Machine Translation into German. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 217–226, Edinburgh, Scotland, July 2011. Association for Computational Linguistics. [en]
dc.relation.hasversion: Philip Williams and Philipp Koehn. GHKM Rule Extraction and Scope-3 Parsing in Moses. In Proceedings of the Seventh Workshop on Statistical Machine Translation, pages 388–394, Montréal, Canada, June 2012. Association for Computational Linguistics. [en]
dc.relation.hasversion: Philip Williams and Philipp Koehn. Using Feature Structures to Improve Verb Translation in English-to-German Statistical MT. In Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra), pages 21–29, Gothenburg, Sweden, April 2014. Association for Computational Linguistics. [en]
dc.rights: Attribution-NonCommercial-ShareAlike 4.0 International [en]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject: statistical machine translation [en]
dc.subject: morphology [en]
dc.subject: syntactic reordering [en]
dc.title: Unification-based constraints for statistical machine translation [en]
dc.type: Thesis or Dissertation [en]
dc.type.qualificationlevel: Doctoral [en]
dc.type.qualificationname: PhD Doctor of Philosophy [en]



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 4.0 International