Transition-based combinatory categorial grammar parsing for English and Hindi
dc.contributor.advisor
Steedman, Mark
en
dc.contributor.advisor
Deoskar, Tejaswini
en
dc.contributor.author
Ambati, Bharat Ram
en
dc.contributor.sponsor
European Commission
en
dc.date.accessioned
2017-02-22T15:40:35Z
dc.date.available
2017-02-22T15:40:35Z
dc.date.issued
2016-11-29
dc.description.abstract
Given a natural language sentence, parsing is the task of assigning it a grammatical
structure, according to the rules within a particular grammar formalism. Different
grammar formalisms like Dependency Grammar, Phrase Structure Grammar, Combinatory
Categorial Grammar, Tree Adjoining Grammar are explored in the literature for
parsing. For example, given a sentence like “John ate an apple”, parsers based on the
widely used dependency grammars find grammatical relations, such as that ‘John’ is
the subject and ‘apple’ is the object of the action ‘ate’. We mainly focus on Combinatory
Categorial Grammar (CCG) in this thesis.
In this thesis, we present an incremental algorithm for parsing CCG for two diverse
languages: English and Hindi. English is a fixed word order, SVO (Subject-Verb-
Object), and morphologically simple language, whereas, Hindi, though predominantly
a SOV (Subject-Object-Verb) language, is a free word order and morphologically rich
language. Developing an incremental parser for Hindi is really challenging since the
predicate needed to resolve dependencies comes at the end. As previously available
shift-reduce CCG parsers use English CCGbank derivations which are mostly right
branching and non-incremental, we design our algorithm based on the dependencies
resolved rather than the derivation. Our novel algorithm builds a dependency graph in
parallel to the CCG derivation which is used for revealing the unbuilt structure without
backtracking. Though we use dependencies for meaning representation and CCG for
parsing, our revealing technique can be applied to other meaning representations like
lambda expressions and for non-CCG parsing like phrase structure parsing.
Any statistical parser requires three major modules: data, parsing algorithm and
learning algorithm. This thesis is broadly divided into three parts each dealing with
one major module of the statistical parser. In Part I, we design a novel algorithm
for converting dependency treebank to CCGbank. We create Hindi CCGbank with a
decent coverage of 96% using this algorithm. We also do a cross-formalism experiment
where we show that CCG supertags can improve widely used dependency parsers.
We experiment with two popular dependency parsers (Malt and MST) for two diverse
languages: English and Hindi. For both languages, CCG categories improve the overall
accuracy of both parsers by around 0.3-0.5% in all experiments. For both parsers,
we see larger improvements specifically on dependencies at which they are known
to be weak: long distance dependencies for Malt, and verbal arguments for MST.
The result is particularly interesting in the case of the fast greedy parser (Malt), since
improving its accuracy without significantly compromising speed is relevant for large
scale applications such as parsing the web.
We present a novel algorithm for incremental transition-based CCG parsing for
English and Hindi, in Part II. Incremental parsers have potential advantages for applications
like language modeling for machine translation and speech recognition. We
introduce two new actions in the shift-reduce paradigm for revealing the required information
during parsing. We also analyze the impact of a beam and look-ahead for
parsing. In general, using a beam and/or look-ahead gives better results than not using
them. We also show that the incremental CCG parser is more useful than a non-incremental
version for predicting relative sentence complexity. Given a pair of sentences
from wikipedia and simple wikipedia, we build a classifier which predicts if one
sentence is simpler/complex than the other. We show that features from a CCG parser
in general and incremental CCG parser in particular are more useful than a chart-based
phrase structure parser both in terms of speed and accuracy.
In Part III, we develop the first neural network based training algorithm for parsing
CCG. We also study the impact of neural network based tagging models, and greedy
versus beam-search parsing, by using a structured neural network model. In greedy
settings, neural network models give significantly better results than the perceptron
models and are also over three times faster. Using a narrow beam, structured neural
network model gives consistently better results than the basic neural network model.
For English, structured neural network gives similar performance to structured perceptron
parser. But for Hindi, structured perceptron is still the winner.
en
dc.identifier.uri
http://hdl.handle.net/1842/20404
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Bharat Ram Ambati, Tejaswini Deoskar, and Mark Steedman. (2016). Shift- Reduce CCG Parsing using Neural Network Models. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
en
dc.relation.hasversion
Bharat Ram Ambati, Siva Reddy, and Mark Steedman. (2016). Assessing Relative Sentence Complexity using an Incremental CCG Parser. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
en
dc.relation.hasversion
Bharat Ram Ambati, Tejaswini Deoskar, Mark Johnson and Mark Steedman. (2015). An Incremental Algorithm for Transition-based CCG Parsing. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 53-63, Denver, Colorado.
en
dc.relation.hasversion
Bharat Ram Ambati, Tejaswini Deoskar and Mark Steedman. (2014). Improving Dependency Parsers using Combinatory Categorial Grammar. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, pages 159-163, Gothenburg, Sweden.
en
dc.relation.hasversion
Bharat Ram Ambati, Tejaswini Deoskar and Mark Steedman. (2013). Using CCG categories to improve Hindi dependency parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 604-609, Sofia, Bulgaria.
en
dc.relation.hasversion
Ambati, B. R., Husain, S., Jain, S., Sharma, D. M., and Sangal, R. (2010). Two Methods to Incorporate ‘Local Morphosyntactic’ Features in Hindi Dependency Parsing. In Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages, pages 22–30, Los Angeles, CA, USA. Association for Computational Linguistics.
en
dc.relation.hasversion
Ambati, B. R., Husain, S., Nivre, J., and Sangal, R. (2010). On the Role of Morphosyntactic Features in Hindi Dependency Parsing. In Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages, pages 94–102, Los Angeles, CA, USA. Association for Computational Linguistics.
en
dc.relation.hasversion
Bharati, A., Husain, S., Ambati, B., Jain, S., Sharma, D. M., and Sangal, R. (2008). Two semantic features make all the difference in Parsing accuracy. In Proceedings of the 6th International Conference On Natural Language Processing (ICON), pages 11–19, Pune, India. Macmillan publishers India Ltd.
en
dc.relation.hasversion
Husain, S., Mannem, P., Ambati, B. R., and Gadde, P. (2010). The ICON-2010 Tools Contest on Indian Language Dependency Parsing. In Proceedings of ICON-2010 Tools Contest on Indian Language Dependency Parsing, Kharagpur, India.
en
dc.rights
Attribution-NonCommercial-ShareAlike 4.0 International
en
dc.rights.uri
http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject
CCG
en
dc.subject
combinatory categorial grammar
en
dc.subject
transition-based
en
dc.subject
parses
en
dc.title
Transition-based combinatory categorial grammar parsing for English and Hindi
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
1 - 1 of 1
- Name:
- Ambati2016.pdf
- Size:
- 1.25 MB
- Format:
- Adobe Portable Document Format
This item appears in the following Collection(s)

