Towards Statistical Machine Translation with Unification Grammars
Date
26/11/2009
Item status
Restricted Access
Author
Williams, Philip
Abstract
Traditional Statistical Machine Translation (SMT) models account poorly for many linguistic phenomena, such as subject-verb agreement and differences in word order between languages. Recent work, such as that on factored phrase-based models, has shown promising improvements in translation quality through the use of linguistically richer models. Unification-based approaches to grammar offer a framework for modelling agreement, a particular problem when generating morphologically rich languages. To gauge the potential gains available from applying them to SMT, we first consider how to automatically recognise and measure agreement failure. We focus on the specific issue of declension in German noun phrases and propose a simple unification-based approach to the problem. We develop an agreement checker based on this approach and use it to assess the agreement failure rate of a hierarchical phrase-based translation system trained on the small News Commentary corpus. Initially we find that our checker reports unreasonably high failure rates on the fluent training data; through an incremental process of failure analysis and lexicon refinement we significantly reduce the number of spurious failures. We then apply the agreement checker directly to machine translation by incorporating it as a feature function of the log-linear model. We train our baseline system on the larger Europarl corpus and again measure failure rates before applying the agreement check as both a hard and a soft constraint. The effects on translation are not large enough to measure reliably using standard automatic evaluation techniques, so we perform a manual analysis of the types of change introduced.
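As an illustration only, the idea of checking noun-phrase agreement by unification can be sketched as follows. This is not the checker described in the thesis: the toy lexicon, the feature inventory (case, number, gender) and the names `analyses`, `LEXICON`, `unify` and `agrees` are all assumptions made for this example. Each word carries a set of possible feature analyses, unification is modelled as set intersection, and agreement fails when no analysis is shared by every word in the phrase.

```python
from itertools import product

CASES = ("nom", "acc", "dat", "gen")
GENDERS = ("masc", "fem", "neut")


def analyses(cases, number, genders):
    """Expand a compact description into a set of (case, number, gender) triples."""
    return frozenset(product(cases, (number,), genders))


# Hypothetical toy lexicon: every entry lists the analyses a word form allows.
LEXICON = {
    "der": analyses(["nom"], "sg", ["masc"])
    | analyses(["gen", "dat"], "sg", ["fem"])
    | analyses(["gen"], "pl", GENDERS),
    "die": analyses(["nom", "acc"], "sg", ["fem"])
    | analyses(["nom", "acc"], "pl", GENDERS),
    "den": analyses(["acc"], "sg", ["masc"])
    | analyses(["dat"], "pl", GENDERS),
    "Mann": analyses(["nom", "acc", "dat"], "sg", ["masc"]),
    "Frau": analyses(CASES, "sg", ["fem"]),
    "Frauen": analyses(CASES, "pl", ["fem"]),
}


def unify(words):
    """Intersect the analyses of all known words in the phrase.

    Returns None when no word is in the lexicon (nothing to constrain).
    Unknown words are skipped, an assumption that avoids reporting
    failures caused purely by lexicon gaps.
    """
    remaining = None
    for word in words:
        entry = LEXICON.get(word)
        if entry is None:
            continue
        remaining = entry if remaining is None else remaining & entry
    return remaining


def agrees(words):
    """A phrase agrees unless unification leaves no shared analysis."""
    result = unify(words)
    return result is None or len(result) > 0


# "der Mann" agrees: only (nom, sg, masc) survives unification.
# "die Mann" fails: no analysis is shared by the article and the noun.
ok = agrees(["der", "Mann"])
bad = agrees(["die", "Mann"])
```

Under these assumptions, a boolean of this kind could act as a hard constraint (rejecting hypotheses that fail), while a count of failing noun phrases could serve as a soft feature value in a log-linear model. How the thesis actually defines its feature function is not stated in the abstract.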