Topic indexing and retrieval for open domain factoid question answering

Ahn, Kisuh

dc.contributor.advisor

Webber, Bonnie

en

dc.contributor.advisor

Bos, Johan

en

dc.contributor.author

Ahn, Kisuh

en

dc.date.accessioned

2010-09-30T15:27:25Z

dc.date.available

2010-09-30T15:27:25Z

dc.date.issued

2009

dc.description.abstract

Factoid Question Answering is an exciting area of Natural Language Engineering that has the potential to replace one major use of search engines today. In this dissertation, I introduce a new method of handling factoid questions whose answers are proper names. The method, Topic Indexing and Retrieval, addresses two issues that prevent current factoid QA systems from realising this potential: they can’t satisfy users’ demand for almost immediate answers, and they can’t produce answers based on evidence distributed across a corpus. The first issue arises because the architecture common to QA systems does not scale easily to heavy use, since so much of the work is done on-line: text retrieved by information retrieval (IR) undergoes expensive and time-consuming answer extraction while the user awaits an answer. If QA systems are to become as heavily used as popular web search engines, this massive processing bottleneck must be overcome. The second issue, how to make use of the evidence distributed across a corpus, is relevant when no single passage in the corpus provides sufficient evidence for an answer to a given question. QA systems commonly look for a text span that contains sufficient evidence to both locate and justify an answer, but this fails for questions that require evidence from more than one passage in the corpus. The Topic Indexing and Retrieval method developed in this thesis addresses both of these issues for factoid questions with proper name answers by restructuring the corpus in such a way that it enables direct retrieval of answers using off-the-shelf IR. The method has been evaluated on 377 TREC questions with proper name answers and on 41 questions that require multiple pieces of evidence from different parts of the TREC AQUAINT corpus. With regard to the first evaluation, scores of 0.340 in Accuracy and 0.395 in Mean Reciprocal Rank (MRR) show that Topic Indexing and Retrieval performs well for this type of question.
A second evaluation compares performance on a corpus of 41 multi-evidence questions by a question-factoring baseline method that can be used with the standard QA architecture and by my Topic Indexing and Retrieval method. The superior performance of the latter (MRR of 0.454 against 0.341) demonstrates its value in answering such questions.
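For readers unfamiliar with the two measures reported above, the following is a minimal sketch of how Accuracy and MRR are conventionally computed over ranked answer lists. It is not taken from the thesis: the function names, the toy data, and the use of exact string matching against a set of acceptable answers are all assumptions made for illustration (TREC evaluations rely on assessor judgements rather than string equality).

```python
from typing import Sequence, Set, Tuple

# One evaluation item: (ranked candidate answers, set of acceptable answers).
# This representation is hypothetical and chosen only for the sketch.
Run = Tuple[Sequence[str], Set[str]]


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """Return 1/rank of the first correct candidate, or 0.0 if none is correct."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in gold:  # assumption: exact string match counts as correct
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Run]) -> float:
    """Average the reciprocal rank over all questions."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / len(runs)


def accuracy(runs: Sequence[Run]) -> float:
    """Fraction of questions whose top-ranked candidate is correct."""
    if not runs:
        return 0.0
    correct = sum(1 for ranked, gold in runs if ranked and ranked[0] in gold)
    return correct / len(runs)


# Toy data (invented): three questions with ranked candidate answers.
example_runs = [
    (["Paris", "Lyon"], {"Paris"}),   # correct at rank 1
    (["Rome", "Milan"], {"Milan"}),   # correct at rank 2
    (["Oslo"], {"Bergen"}),           # no correct candidate
]
```

Under these definitions Accuracy can never exceed MRR, because a correct top-ranked answer contributes 1 to both, while a correct answer at a lower rank contributes only to MRR. This is consistent with the reported 0.340 Accuracy against 0.395 MRR.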

en

dc.identifier.uri

http://hdl.handle.net/1842/3794

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.subject

question answering

en

dc.subject

natural language engineering

en

dc.title

Topic indexing and retrieval for open domain factoid question answering

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: Ahn2009.pdf
Size: 1.21 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection