Edinburgh Research Archive logo

Edinburgh Research Archive

University of Edinburgh homecrest
View Item 
  •   ERA Home
  • Informatics, School of
  • Informatics thesis and dissertation collection
  • View Item
  •   ERA Home
  • Informatics, School of
  • Informatics thesis and dissertation collection
  • View Item
  • Login
JavaScript is disabled for your browser. Some features of this site may not work without it.

Dynamic topic adaptation for improved contextual modelling in statistical machine translation

View/Open
Hasler2015.pdf (1.271Mb)
Date
29/06/2015
Author
Hasler, Eva Cornelia
Metadata
Show full item record
Abstract
In recent years there has been an increased interest in domain adaptation techniques for statistical machine translation (SMT) to deal with the growing amount of data from different sources. Topic modelling techniques applied to SMT are closely related to the field of domain adaptation but more flexible in dealing with unstructured text. Topic models can capture latent structure in texts and are therefore particularly suitable for modelling structure in between and beyond corpus boundaries, which are often arbitrary. In this thesis, the main focus is on dynamic translation model adaptation to texts of unknown origin, which is a typical scenario for an online MT engine translating web documents. We introduce a new bilingual topic model for SMT that takes the entire document context into account and for the first time directly estimates topic-dependent phrase translation probabilities in a Bayesian fashion. We demonstrate our model’s ability to improve over several domain adaptation baselines and further provide evidence for the advantages of bilingual topic modelling for SMT over the more common monolingual topic modelling. We also show improved performance when deriving further adapted translation features from the same model which measure different aspects of topical relatedness. We introduce another new topic model for SMT which exploits the distributional nature of phrase pair meaning by modelling topic distributions over phrase pairs using their distributional profiles. Using this model, we explore combinations of local and global contextual information and demonstrate the usefulness of different levels of contextual information, which had not been previously examined for SMT. We also show that combining this model with a topic model trained at the document-level further improves performance. Our dynamic topic adaptation approach performs competitively in comparison with two supervised domain-adapted systems. Finally, we shed light on the relationship between domain adaptation and topic adaptation and propose to combine multi-domain adaptation and topic adaptation in a framework that entails automatic prediction of domain labels at the document level. We show that while each technique provides complementary benefits to the overall performance, there is an amount of overlap between domain and topic adaptation. This can be exploited to build systems that require less adaptation effort at runtime.
URI
http://hdl.handle.net/1842/10522
Collections
  • Informatics thesis and dissertation collection

Library & University Collections HomeUniversity of Edinburgh Information Services Home
Privacy & Cookies | Takedown Policy | Accessibility | Contact
Privacy & Cookies
Takedown Policy
Accessibility
Contact
feed RSS Feeds

RSS Feed not available for this page

 

 

All of ERACommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsPublication TypeSponsorSupervisorsThis CollectionBy Issue DateAuthorsTitlesSubjectsPublication TypeSponsorSupervisors
LoginRegister

Library & University Collections HomeUniversity of Edinburgh Information Services Home
Privacy & Cookies | Takedown Policy | Accessibility | Contact
Privacy & Cookies
Takedown Policy
Accessibility
Contact
feed RSS Feeds

RSS Feed not available for this page