Edinburgh Research Archive

Topic indexing and retrieval for open domain factoid question answering

dc.contributor.advisor
Webber, Bonnie
en
dc.contributor.advisor
Bos, Johan
en
dc.contributor.author
Ahn, Kisuh
en
dc.date.accessioned
2010-09-30T15:27:25Z
dc.date.available
2010-09-30T15:27:25Z
dc.date.issued
2009
dc.description.abstract
Factoid Question Answering is an exciting area of Natural Language Engineering that has the potential to replace one major use of search engines today. In this dissertation, I introduce a new method of handling factoid questions whose answers are proper names. The method, Topic Indexing and Retrieval, addresses two issues that prevent current factoid QA system from realising this potential: They can’t satisfy users’ demand for almost immediate answers, and they can’t produce answers based on evidence distributed across a corpus. The first issue arises because the architecture common to QA systems is not easily scaled to heavy use because so much of the work is done on-line: Text retrieved by information retrieval (IR) undergoes expensive and time-consuming answer extraction while the user awaits an answer. If QA systems are to become as heavily used as popular web search engines, this massive process bottle-neck must be overcome. The second issue of how to make use of the distributed evidence in a corpus is relevant when no single passage in the corpus provides sufficient evidence for an answer to a given question. QA systems commonly look for a text span that contains sufficient evidence to both locate and justify an answer. But this will fail in the case of questions that require evidence from more than one passage in the corpus. Topic Indexing and Retrieval method developed in this thesis addresses both these issues for factoid questions with proper name answers by restructuring the corpus in such a way that it enables direct retrieval of answers using off-the-shelf IR. The method has been evaluated on 377 TREC questions with proper name answers and 41 questions that require multiple pieces of evidence from different parts of the TREC AQUAINT corpus. With regards to the first evaluation, scores of 0.340 in Accuracy and 0.395 in Mean Reciprocal Rank (MRR) show that the Topic Indexing and Retrieval performs well for this type of questions. A second evaluation compares performance on a corpus of 41 multi-evidence questions by a question-factoring baseline method that can be used with the standard QA architecture and by my Topic Indexing and Retrieval method. The superior performance of the latter (MRR of 0.454 against 0.341) demonstrates its value in answering such questions.
en
dc.identifier.uri
http://hdl.handle.net/1842/3794
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.subject
question answering
en
dc.subject
natural language engineering
en
dc.title
Topic indexing and retrieval for open domain factoid question answering
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Now showing 1 - 1 of 1
Name:
Ahn2009.pdf
Size:
1.21 MB
Format:
Adobe Portable Document Format
Description:

This item appears in the following Collection(s)