Contextual citation recommendation using scientific discourse annotation schemes
dc.contributor.advisor
Klein, Ewan
en
dc.contributor.advisor
Liakata, Maria
en
dc.contributor.advisor
Sutton, Charles
en
dc.contributor.author
Duma, Daniel Cristian
en
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.date.accessioned
2019-08-06T10:35:47Z
dc.date.available
2019-08-06T10:35:47Z
dc.date.issued
2019-07-01
dc.description.abstract
All researchers have experienced the problem of fishing out the most relevant scientific
papers from an ocean of publications, and some may have wished that their text editor
suggested these papers automatically. This thesis is structured around this task: recommending
contextually relevant citations to the author of a scientific paper, which we
call Contextual Citation Recommendation (CCR). Like others before us, we frame CCR
as an Information Retrieval task and we evaluate our approach using existing publications.
That is, an existing in-text citation to one or more documents in a corpus
is replaced with a placeholder and the task is to retrieve the cited documents automatically.
We carry out a cross-domain study and evaluate our approaches using two
separate document collections in two different domains: computational linguistics and
biomedical science.
This thesis comprises three parts, which build cumulatively. Part I establishes
a framework for the task using a standard Information Retrieval setup and explores
different parameters for indexing documents and for extracting the evaluation queries
in order to establish solid baselines for both corpora. We experiment
with symmetric windows of words and sentences for both query extraction
and for integrating the anchor text, that is, the text surrounding a citation, which is an
important source of data for building document representations. We show for the first
time that the contribution of anchor text is very domain dependent.
Part II investigates a number of scientific discourse annotation schemes for academic
articles. It has often been suggested that annotating discourse structure could
support Information Retrieval scenarios such as this one, and this is a key hypothesis
of this thesis. We focus on two of these: Argumentative Zoning (AZ, for the domain
of computational linguistics) and Core Scientific Concepts (for the domain of biomedical
sciences); both are sentence-based scientific discourse annotation schemes
that define classes such as Hypothesis, Method and Result for CoreSC and Background/
Own/Contrast for AZ. By annotating each sentence in every document with
AZ/CoreSC and indexing them separately by sentence class, we discover that consistent
citing patterns exist in each domain; for example, sentences of type Conclusion
in cited papers are consistently cited by other sentences of type Conclusion or Background
in citing biomedical articles.
Finally, Part III moves away from simple windows over terms or over sentences for
extracting the query from a citation’s context, and investigates methods for supervised
query extraction using linguistic information. As part of this, we first explore how
to automatically generate training data in the form of citation contexts paired with an
optimal query. Second, we train supervised machine learning models for
automatically extracting these queries with limited prior knowledge of the document
collection and show important improvements over our baselines in the domain of computational
linguistics. We also investigate the contribution of stopwords to each corpus
and we explore the performance of human annotators at this task.
en
dc.identifier.uri
http://hdl.handle.net/1842/35968
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
DUMA, D. AND KLEIN, E., 2014. Citation resolution: A method for evaluating context-based citation recommendation systems. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.
en
dc.relation.hasversion
DUMA, D.; LIAKATA, M.; CLARE, A.; RAVENSCROFT, J.; AND KLEIN, E., 2016a. Applying Core Scientific Concepts to context-based citation recommendation. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC-2016), 23-28 May 2016, Portoroz (Slovenia).
en
dc.relation.hasversion
DUMA, D.; LIAKATA, M.; CLARE, A.; RAVENSCROFT, J.; AND KLEIN, E., 2016b. Rhetorical classification of anchor text for citation recommendation. D-Lib Magazine, 22, 9/10 (2016).
en
dc.relation.hasversion
DUMA, D.; SUTTON, C.; AND KLEIN, E., 2016c. Context matters: Towards extracting a citation’s context using linguistic features. In Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, 201–202. ACM.
en
dc.subject
scientific publishing
en
dc.subject
Contextual Citation Recommendation
en
dc.subject
Information Retrieval
en
dc.subject
search engines
en
dc.subject
indexing
en
dc.subject
machine learning
en
dc.subject
query extraction
en
dc.title
Contextual citation recommendation using scientific discourse annotation schemes
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name:
- Duma2019.pdf
- Size:
- 15.01 MB
- Format:
- Adobe Portable Document Format