Wide-coverage parsing for Turkish
dc.contributor.advisor
Steedman, Mark
en
dc.contributor.advisor
Osborne, Miles
en
dc.contributor.author
Çakici, Ruket
en
dc.date.accessioned
2010-10-04T09:41:04Z
dc.date.available
2010-10-04T09:41:04Z
dc.date.issued
2009
dc.description.abstract
Wide-coverage parsing attracts much attention in natural language processing
research because it is the first step towards many other applications
in natural language understanding, such as question answering.
Supervised learning using human-labelled data is currently the best performing
method. Therefore, there is great demand for annotated data. However, human annotation
is very expensive, and the amount of annotated data is always much less than
is needed to train well-performing parsers. This is the motivation behind making the
best use of the data available. Turkish presents a challenge both because the amount of
syntactically annotated Turkish data is relatively small and because Turkish is highly
agglutinative, hence unusually sparse at the whole word level.
The METU-Sabancı Treebank is a dependency treebank of 5620 sentences with surface
dependency relations and morphological analyses for words. We show that including
even the crudest forms of morphological information extracted from the data boosts
the performance of both generative and discriminative parsers, contrary to received
opinion concerning English.
We induce word-based and morpheme-based CCG grammars from the Turkish dependency
treebank. We use these grammars to train a state-of-the-art CCG parser that
predicts long-distance dependencies in addition to the ones that other parsers are capable
of predicting. We also use the correct CCG categories as simple features in a
graph-based dependency parser and show that this improves the parsing results.
We show that a morpheme-based CCG lexicon for Turkish is able to address many
problems, such as resolving conflicts of semantic scope, recovering long-range
dependencies, and obtaining smoother statistics from the models. CCG handles linguistic
phenomena such as local and long-range dependencies more naturally and effectively than
other linguistic theories while potentially supporting semantic interpretation in parallel.
Using morphological information and a morpheme-cluster based lexicon improves the
performance both quantitatively and qualitatively for Turkish.
We also provide an improved version of the treebank which will be released by
kind permission of METU and Sabancı.
en
dc.identifier.uri
http://hdl.handle.net/1842/3807
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.subject
combinatory categorial grammar
en
dc.subject
CCG
en
dc.subject
parsing
en
dc.subject
natural language processing
en
dc.subject
morphology
en
dc.subject
syntax
en
dc.title
Wide-coverage parsing for Turkish
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name:
- Cakici2008.pdf
- Size:
- 2.14 MB
- Format:
- Adobe Portable Document Format