Wide-coverage statistical parsing with minimalist grammars
Torr, John Philip
Syntactic parsing is the process of automatically assigning a structure to a string of words, and is arguably a necessary prerequisite for obtaining a detailed and precise representation of sentence meaning. For many NLP tasks, it is sufficient to use parsers based on simple context free grammars. However, for tasks in which precision on certain relatively rare but semantically crucial constructions (such as unbounded wh-movements for open domain question answering) is important, more expressive grammatical frameworks still have an important role to play. One grammatical framework which has been conspicuously absent from journals and conferences on Natural Language Processing (NLP), despite continuing to dominate much of theoretical syntax, is Minimalism, the latest incarnation of the Transformational Grammar (TG) approach to linguistic theory developed very extensively by Noam Chomsky and many others since the early 1950s. Until now, all parsers using genuine transformational movement operations have had only narrow coverage by modern standards, owing to the lack of any wide-coverage TG grammars or treebanks on which to train statistical models. The received wisdom within NLP is that TG is too complex and insufficiently formalised to be applied to realistic parsing tasks. This situation is unfortunate, as it is arguably the most extensively developed syntactic theory across the greatest number of languages, many of which are otherwise under-resourced, and yet the vast majority of its insights never find their way into NLP systems. Conversely, the process of constructing large grammar fragments can have a salutary impact on the theory itself, forcing choices between competing analyses of the same construction, and exposing incompatibilities between analyses of different constructions, along with areas of over- and undergeneration which may otherwise go unnoticed. This dissertation builds on research into computational Minimalism pioneered by Ed Stabler and others since the late 1990s to present the first ever wide-coverage Minimalist Grammar (MG) parser, along with some promising initial experimental results. A wide-coverage parser must of course be equipped with a wide-coverage grammar, and this dissertation will therefore also present the first ever wide-coverage MG, which has analyses with a high level of cross-linguistic descriptive adequacy for a great many English constructions, many of which are taken or adapted from proposals in the mainstream Minimalist literature. The grammar is very deep, in the sense that it describes many long-range dependencies which even most other expressive wide-coverage grammars ignore. At the same time, it has also been engineered to be highly constrained, with continuous computational testing being applied to minimize both under- and over-generation. Natural language is highly ambiguous, both locally and globally, and even with a very strong formal grammar, there may still be a great many possible structures for a given sentence and its substrings. The standard approach to resolving such ambiguity is to equip the parser with a probability model allowing it to disregard certain unlikely search paths, thereby increasing both its efficiency and accuracy. The most successful parsing models are those extracted in a supervised fashion from labelled data in the form of a corpus of syntactic trees, known as a treebank. Constructing such a treebank from scratch for a different formalism is extremely time-consuming and expensive, however, and so the standard approach is to map the trees in an existing treebank into trees of the target formalism. Minimalist trees are considerably more complex than those of other formalisms, however, containing many more null heads and movement operations, making this conversion process far from trivial. This dissertation will describe a method which has so far been used to convert 56% of the Penn Treebank trees into MG trees. Although still under development, the resulting MGbank corpus has already been used to train a statistical A* MG parser, described here, which has an expected asymptotic time complexity of O(n3); this is much better than even the most optimistic worst case analysis for the formalism.