dc.contributor.advisor: Koehn, Philipp
dc.contributor.advisor: Lopez, Adam
dc.contributor.author: Auli, Michael
dc.date.accessioned: 2013-08-05T14:31:53Z
dc.date.available: 2013-08-05T14:31:53Z
dc.date.issued: 2012-11-29
dc.identifier.uri: http://hdl.handle.net/1842/7636
dc.description: EuroMatrixPlus project funded by the European Commission, 7th Framework Programme
dc.description.abstract [en_US]: Parsing is the task of assigning syntactic or semantic structure to a natural language sentence. This thesis focuses on syntactic parsing with Combinatory Categorial Grammar (CCG; Steedman 2000). CCG allows incremental processing, which is essential for speech recognition and some machine translation models, and it can build semantic structure in tandem with syntactic parsing. Supertagging solves a subset of the parsing task by assigning lexical types to the words of a sentence using a sequence model. It has emerged as a way to improve the efficiency of full CCG parsing (Clark and Curran, 2007) by reducing the parser's search space. This approach has been very successful, and it is the central theme of this thesis. We begin with an analysis of how efficiency is traded for accuracy in supertagging. Pruning the search space by supertagging is inherently approximate; to contrast this, we include in our analysis A*, a classic exact search technique. Interestingly, we find that combining the two methods improves efficiency, but we also demonstrate that excessive pruning by a supertagger significantly lowers the upper bound on the accuracy of a CCG parser. Inspired by this analysis, we design a single integrated model with both supertagging and parsing features, rather than separating them into distinct models chained together in a pipeline. To overcome the resulting complexity, we experiment with both loopy belief propagation and dual decomposition approaches to inference: to our knowledge, the first empirical comparison of these algorithms on a structured natural language processing problem. Finally, we address training the integrated model. We adopt the idea of optimising directly for a task-specific metric, as is common in other areas such as statistical machine translation.
We demonstrate how a novel dynamic programming algorithm enables us to optimise for F-measure, our task-specific evaluation metric, and we experiment with approximations, which prove to be excellent substitutes. Each of the presented methods improves over the state of the art in CCG parsing. Moreover, the improvements are additive, achieving a labelled/unlabelled dependency F-measure on CCGbank of 89.3%/94.0% with gold part-of-speech tags, and 87.2%/92.8% with automatic part-of-speech tags, the best reported results for this task to date. Our techniques are general, and we expect them to apply to other parsing problems, including lexicalised tree adjoining grammar and context-free grammar parsing.
dc.contributor.sponsor [en_US]: Engineering and Physical Sciences Research Council (EPSRC)
dc.contributor.sponsor: European Commission
dc.language.iso [en_US]: en
dc.publisher [en_US]: The University of Edinburgh
dc.relation.hasversion [en_US]: Auli, M. (2009). CCG-based Models for Statistical Machine Translation. First-Year PhD Report. Available from http://homepages.inf.ed.ac.uk/s0453934.
dc.relation.hasversion [en_US]: Auli, M. and Lopez, A. (2011). A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing. In Proc. of ACL, pages 470–480, Portland, Oregon, USA.
dc.relation.hasversion [en_US]: Auli, M. and Lopez, A. (2011). Efficient CCG Parsing: A* versus Supertagging. In Proc. of ACL, pages 1577–1585, Portland, Oregon, USA.
dc.relation.hasversion [en_US]: Auli, M. and Lopez, A. (2011). Training a Log-Linear Parser with Loss Functions via Softmax-Margin. In Proc. of EMNLP, pages 333–343, Edinburgh, Scotland, UK.
dc.subject [en_US]: natural language processing
dc.subject [en_US]: parsing
dc.subject [en_US]: Combinatory Categorial Grammar
dc.subject [en_US]: Supertagging
dc.title [en_US]: Integrated supertagging and parsing
dc.type [en_US]: Thesis or Dissertation
dc.type.qualificationlevel [en_US]: Doctoral
dc.type.qualificationname [en_US]: PhD Doctor of Philosophy
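The abstract's point that supertagging trades accuracy for efficiency, and that excessive pruning lowers the parser's accuracy ceiling, can be illustrated with a small sketch. It assumes the beta-threshold rule used by the Clark and Curran (2007) supertagger, where each word keeps every lexical category whose probability is at least a factor beta of its most probable category. The function names `prune_categories` and `gold_coverage`, the "gold coverage" measure, and the toy probabilities are hypothetical illustrations, not code or data from the thesis.

```python
from typing import Dict, List

# One probability distribution over CCG lexical categories per word.
TagDistribution = Dict[str, float]


def prune_categories(dist: TagDistribution, beta: float) -> List[str]:
    """Keep every category whose probability is at least beta times the best one.

    Smaller beta keeps more categories (a larger parser search space);
    larger beta keeps fewer (faster parsing, more risk of losing the
    correct category).
    """
    best = max(dist.values())
    return sorted(cat for cat, p in dist.items() if p >= beta * best)


def gold_coverage(sentence: List[TagDistribution], gold: List[str], beta: float) -> float:
    """Hypothetical measure: fraction of words whose gold category survives pruning.

    A parser restricted to the pruned categories cannot recover a word whose
    gold category was discarded, so this acts as a lexical-level ceiling.
    """
    kept = [g in prune_categories(dist, beta) for dist, g in zip(sentence, gold)]
    return sum(kept) / len(kept)


# Hypothetical supertagger output for "I saw her" (probabilities are made up).
SENTENCE: List[TagDistribution] = [
    {"NP": 0.9, "N": 0.1},
    {"(S\\NP)/NP": 0.7, "(S\\NP)/(S\\NP)": 0.2, "N": 0.1},
    {"NP/N": 0.55, "NP": 0.45},
]
GOLD = ["NP", "(S\\NP)/NP", "NP"]

if __name__ == "__main__":
    for beta in (0.9, 0.5, 0.1):
        sizes = [len(prune_categories(d, beta)) for d in SENTENCE]
        print(f"beta={beta}: categories per word={sizes}, "
              f"gold coverage={gold_coverage(SENTENCE, GOLD, beta):.2f}")
```

In this toy setting, beta = 0.9 leaves one category per word but discards the gold `NP` for "her" (coverage 0.67), while beta = 0.5 or 0.1 restores it at the cost of keeping more categories per word.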