Unsupervised German predicate entailment using the distributional inclusion hypothesis
Recognizing textual entailment is an important prerequisite for many tasks in NLP, e.g. question answering and semantic parsing. Humans learn by interacting with the world that, for example, buying a thing entails subsequently owning it, whereas machines need other ways to acquire this knowledge. Previous approaches to learning predicate entailment relations from text have focused only on English. In this thesis we adapt the unsupervised entailment graph building algorithm of Hosseini et al. to German, which can also be seen as a study of the challenges of language adaptation for this task in general. We create a variety of German tools necessary for this approach and give a detailed account of the challenges faced and the insights gained from them. First, we create a German relation extraction system and compare it against the English system presented by Hosseini et al. Finding that the typing of German entities constitutes a bottleneck, we create a German fine-grained typing system for named and general entities. In doing so we examine the methods of annotation projection and zero-shot cross-lingual transfer, finding that zero-shot cross-lingual transfer performs best for German fine-grained named entity typing. We then create a German system that types general entities (e.g. ``ex-president'') as well as named entities (e.g. ``Obama'') by augmenting our training data with data automatically generated from a German WordNet. We find that this yields an improvement of up to 10 percentage points in general entity typing performance, while affecting named entity typing performance only slightly, by 1 percentage point. We use these components in the pipeline to construct German entailment graphs. We also present a method that uses German and English entailment graphs to generate training data for a supervised predicate entailment detection system, and show that this method outperforms current approaches to this task.
In this way we create a multilingual predicate entailment detection system that outperforms both the monolingual German system and the zero-shot cross-lingual system on German test data, and also performs better than a monolingual English system on English test data.