Towards Statistical Machine Translation with Unification Grammars
Item status: Restricted Access
Traditional Statistical Machine Translation (SMT) models account poorly for many linguistic phenomena, such as subject-verb agreement and differences in word order between languages. Recent work, such as that on factored phrase-based models, has shown promising improvements in translation quality through the use of linguistically richer models. Unification-based approaches to grammar offer a framework for modelling agreement, a particular problem when generating morphologically rich languages. To gauge the potential gains available from applying them to SMT, we first consider how to automatically recognise and measure agreement failure. We focus on the specific issue of declension in German noun phrases and propose a simple unification-based approach to the problem. We develop an agreement checker based on this approach and use it to assess the agreement failure rate of a hierarchical phrase-based translation system trained on the small News Commentary corpus. Initially we find that our checker reports unreasonably high failure rates on the fluent training data; through an incremental process of failure analysis and lexicon refinement we significantly reduce the number of spurious failures. We then apply the agreement checker directly to machine translation by incorporating it as a feature function in the log-linear model. We train our baseline system on the larger Europarl corpus and again measure failure rates before applying the agreement check as both a hard and a soft constraint. The effects on translation are not large enough to be measured reliably with standard automatic evaluation techniques, so we perform a manual analysis of the types of change introduced.
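The abstract does not spell out the checker's internals, so the following is only a minimal sketch of the general idea, not the method used in this work. It assumes each word form is listed with the set of (case, number, gender) bundles it can realise and treats unification of these atomic features as set intersection: a noun phrase agrees if at least one bundle survives. The toy lexicon, the function names (`unify_np`, `agrees`, `loglinear_score`) and the penalty weight are all hypothetical, and adjectives and strong/weak inflection are ignored. The sketch also shows the two ways a failure count can enter a log-linear score: as a hard constraint that prunes a hypothesis, or as a soft, weighted feature.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

# A feature bundle: (case, number, gender).
Agr = Tuple[str, str, str]

CASES = ("nom", "gen", "dat", "acc")
GENDERS = ("m", "f", "n")


def bundle(cases: Iterable[str], number: str, genders: Iterable[str]) -> FrozenSet[Agr]:
    """All (case, number, gender) combinations for the given values."""
    return frozenset((c, number, g) for c in cases for g in genders)


# Hypothetical toy lexicon: each word form maps to every bundle it can realise.
LEXICON: Dict[str, FrozenSet[Agr]] = {
    "der": bundle(["nom"], "sg", ["m"]) | bundle(["gen", "dat"], "sg", ["f"])
    | bundle(["gen"], "pl", GENDERS),
    "die": bundle(["nom", "acc"], "sg", ["f"]) | bundle(["nom", "acc"], "pl", GENDERS),
    "den": bundle(["acc"], "sg", ["m"]) | bundle(["dat"], "pl", GENDERS),
    "Mann": bundle(["nom", "dat", "acc"], "sg", ["m"]),
    "Frau": bundle(CASES, "sg", ["f"]),
    "Männer": bundle(["nom", "gen", "acc"], "pl", ["m"]),
}


def unify_np(words: List[str]) -> Optional[FrozenSet[Agr]]:
    """Intersect the feature sets of all words in an NP.

    Returns None if any word is missing from the lexicon (no evidence either way).
    """
    result: Optional[FrozenSet[Agr]] = None
    for w in words:
        feats = LEXICON.get(w)
        if feats is None:
            return None
        result = feats if result is None else result & feats
    return result


def agrees(words: List[str]) -> bool:
    """An NP fails only if all its words are known and no bundle survives."""
    feats = unify_np(words)
    return feats is None or len(feats) > 0


def agreement_failures(nps: List[List[str]]) -> int:
    """Number of NPs in a hypothesis that fail the agreement check."""
    return sum(not agrees(np) for np in nps)


def loglinear_score(features: Dict[str, float], weights: Dict[str, float],
                    nps: List[List[str]], hard: bool = False,
                    agreement_weight: float = -1.0) -> float:
    """Weighted feature sum plus the agreement check.

    hard=True: any failure prunes the hypothesis (score -inf).
    hard=False: failures are counted and added with agreement_weight
    (an arbitrary placeholder; normally tuned with the other weights).
    """
    failures = agreement_failures(nps)
    if hard and failures:
        return float("-inf")
    total = sum(weights[name] * value for name, value in features.items())
    if not hard:
        total += agreement_weight * failures
    return total


if __name__ == "__main__":
    print(agrees(["der", "Mann"]), agrees(["die", "Mann"]))
    feats, w = {"lm": -12.3, "tm": -4.1}, {"lm": 0.5, "tm": 0.3}
    print(loglinear_score(feats, w, [["die", "Mann"]], hard=True))
```

Treating unknown words as "no evidence" rather than as failures is one way to stop an incomplete lexicon from producing spurious failures; a fuller checker would also need adjectives and the strong/weak declension distinction.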