dc.contributor.advisor | Steedman, Mark | |
dc.contributor.advisor | Lapata, Maria | |
dc.contributor.author | Weber, Sabine | |
dc.date.accessioned | 2022-10-04T10:43:44Z | |
dc.date.available | 2022-10-04T10:43:44Z | |
dc.date.issued | 2022-10-04 | |
dc.identifier.uri | https://hdl.handle.net/1842/39405 | |
dc.identifier.uri | http://dx.doi.org/10.7488/era/2655 | |
dc.description.abstract | Recognizing textual entailment is an important prerequisite for many tasks in NLP, e.g. question answering and semantic parsing. Humans learn by interacting with the world that, for example, buying a thing entails subsequently owning it, while machines need other ways to acquire this knowledge. Previous approaches to learning predicate entailment relations from text have focused only on English. In this thesis we present the adaptation of the unsupervised entailment graph building algorithm of Hosseini et al. to German, which can be seen as a study of challenges in language adaptation for this task in general. We create a variety of German tools necessary for this approach and give a detailed account of the challenges faced and the insights gained from them.
First, we create a German relation extraction system and compare it against the English system presented by Hosseini et al. Finding that the typing of German entities constitutes a bottleneck, we create a German fine-grained typing system for named and general entities. In doing so we examine the methods of annotation projection and zero-shot cross-lingual transfer, finding that for German fine-grained named entity typing zero-shot cross-lingual transfer performs best. We then move on to creating a German system that types general entities (e.g. "ex-president") as well as named entities (e.g. "Obama"), by augmenting our training data with data automatically generated from a German WordNet. We find that this yields an improvement of up to 10 percentage points in general entity typing performance, while only slightly impacting named entity typing performance, by 1 percentage point. We use these components in the pipeline to construct German entailment graphs.
We also present a method that uses German and English entailment graphs to generate training data for a supervised predicate entailment detection system, and show that this method outperforms current approaches to this task. This way we create a multilingual predicate entailment detection system that outperforms both the monolingual German system and the zero-shot cross-lingual system on German test data, and also performs better than a monolingual English system on English test data. | en |
dc.language.iso | en | en |
dc.publisher | The University of Edinburgh | en |
dc.relation.hasversion | Li, T., Weber, S., Hosseini, M. J., Guillou, L., and Steedman, M. (2022). Cross-lingual inference with a Chinese entailment graph. In Proceedings of the Society for Computation in Linguistics. | en |
dc.relation.hasversion | Weber, S. and Steedman, M. (2019). Construction and alignment of multilingual entailment graphs for semantic inference. In Proceedings of the 2019 Workshop on Widening NLP, pages 77–79. | en |
dc.relation.hasversion | Weber, S. and Steedman, M. (2021). Fine-grained general entity typing in German using GermaNet. In Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15), pages 138–143. | en |
dc.relation.hasversion | Weber, S. and Steedman, M. (2021). Zero-shot cross-lingual transfer is a hard baseline to beat in German fine-grained entity typing. In Proceedings of the Second Workshop on Insights from Negative Results in NLP, pages 42–48, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics. | en |
dc.relation.hasversion | The code relating to German Fine-Grained Entity Typing can be found under https://github.com/webersab/german general entity typing. | en |
dc.relation.hasversion | The code of the Relation Extraction Pipeline can be found under https://github.com/webersab/relationExtractionPipeline. | en |
dc.subject | Natural Language Processing | en |
dc.subject | distributional inclusion hypothesis | en |
dc.subject | NLP | en |
dc.subject | German language | en |
dc.title | Unsupervised German predicate entailment using the distributional inclusion hypothesis | en |
dc.type | Thesis or Dissertation | en |
dc.type.qualificationlevel | Doctoral | en |
dc.type.qualificationname | PhD Doctor of Philosophy | en |