Edinburgh Research Archive

Contextual citation recommendation using scientific discourse annotation schemes

dc.contributor.advisor
Klein, Ewan
en
dc.contributor.advisor
Liakata, Maria
en
dc.contributor.advisor
Sutton, Charles
en
dc.contributor.author
Duma, Daniel Cristian
en
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.date.accessioned
2019-08-06T10:35:47Z
dc.date.available
2019-08-06T10:35:47Z
dc.date.issued
2019-07-01
dc.description.abstract
All researchers have experienced the problem of fishing out the most relevant scientific papers from an ocean of publications, and some may have wished that their text editor suggested these papers automatically. This thesis is built around this task: recommending contextually relevant citations to the author of a scientific paper, which we call Contextual Citation Recommendation (CCR). Like others before us, we frame CCR as an Information Retrieval task and we evaluate our approach using existing publications. That is, an existing in-text citation to one or more documents in a corpus is replaced with a placeholder and the task is to retrieve the cited documents automatically. We carry out a cross-domain study and evaluate our approaches using two separate document collections in two different domains: computational linguistics and biomedical science. This thesis comprises three parts, which build cumulatively. Part I establishes a framework for the task using a standard Information Retrieval setup and explores different parameters for indexing documents and for extracting the evaluation queries in order to establish solid baselines for our two corpora in our two domains. We experiment with symmetric windows of words and sentences both for query extraction and for integrating the anchor text, that is, the text surrounding a citation, which is an important source of data for building document representations. We show for the first time that the contribution of anchor text is very domain dependent. Part II investigates a number of scientific discourse annotation schemes for academic articles. It has often been suggested that annotating discourse structure could support Information Retrieval scenarios such as this one, and this is a key hypothesis of this thesis.
We focus on two of these: Argumentative Zoning (AZ, for the domain of computational linguistics) and Core Scientific Concepts (CoreSC, for the domain of biomedical sciences). Both are sentence-based scientific discourse annotation schemes which define classes such as Hypothesis, Method and Result for CoreSC and Background/Own/Contrast for AZ. By annotating each sentence in every document with AZ/CoreSC and indexing them separately by sentence class, we discover that consistent citing patterns exist in each domain; for example, sentences of type Conclusion in cited papers are consistently cited by other sentences of type Conclusion or Background in citing biomedical articles. Finally, Part III moves away from simple windows over terms or over sentences for extracting the query from a citation’s context, and investigates methods for supervised query extraction using linguistic information. As part of this, we first explore how to automatically generate training data in the form of citation contexts paired with an optimal query to generate. Second, we train supervised machine learning models for automatically extracting these queries with limited prior knowledge of the document collection and show important improvements over our baselines in the domain of computational linguistics. We also investigate the contribution of stopwords to each corpus and we explore the performance of human annotators at this task.
en
dc.identifier.uri
http://hdl.handle.net/1842/35968
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
DUMA, D. AND KLEIN, E., 2014. Citation resolution: A method for evaluating context-based citation recommendation systems. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.
en
dc.relation.hasversion
DUMA, D.; LIAKATA, M.; CLARE, A.; RAVENSCROFT, J.; AND KLEIN, E., 2016a. Applying Core Scientific Concepts to context-based citation recommendation. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC-2016), 23-28 May 2016, Portoroz (Slovenia).
en
dc.relation.hasversion
DUMA, D.; LIAKATA, M.; CLARE, A.; RAVENSCROFT, J.; AND KLEIN, E., 2016b. Rhetorical classification of anchor text for citation recommendation. D-Lib Magazine, 22, 9/10 (2016).
en
dc.relation.hasversion
DUMA, D.; SUTTON, C.; AND KLEIN, E., 2016c. Context matters: Towards extracting a citation’s context using linguistic features. In Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, 201–202. ACM.
en
dc.subject
scientific publishing
en
dc.subject
Contextual Citation Recommendation
en
dc.subject
Information Retrieval
en
dc.subject
search engines
en
dc.subject
indexing
en
dc.subject
machine learning
en
dc.subject
query extraction
en
dc.title
Contextual citation recommendation using scientific discourse annotation schemes
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Duma2019.pdf
Size:
15.01 MB
Format:
Adobe Portable Document Format
