Edinburgh Research Archive

Scaling real-time event detection to massive streams

dc.contributor.advisor
Thompson, Henry
en
dc.contributor.advisor
Lavrenko, Victor
en
dc.contributor.author
Wurzer, Dominik Stefan
en
dc.date.accessioned
2018-03-27T12:42:45Z
dc.date.available
2018-03-27T12:42:45Z
dc.date.issued
2017-11-30
dc.description.abstract
In today’s world, the internet and social media are omnipresent and information is accessible to everyone. This has shifted the advantage from those who have access to information to those who access it first. Identifying new events as they emerge is of substantial value to financial institutions that consider real-time information in their decision-making processes, as well as to journalists who report on breaking news and governmental agencies that collect information and respond to emergencies. First Story Detection is the task of identifying those documents in a stream of documents that are the first to talk about new events. This seemingly simple task is non-trivial because the computational effort increases with every processed document. Standard approaches to First Story Detection determine a document’s novelty by comparing it to previously seen documents. This results in the highest reported accuracy, but even the fastest current system scales to only 10% of the Twitter stream. In this thesis, we propose a new algorithm family, called memory-based methods, that is able to scale to the full Twitter stream on a single core. Our memory-based method computes a document’s novelty up to two orders of magnitude faster than state-of-the-art systems without sacrificing accuracy. This thesis additionally provides original work on the impact of processing unbounded data streams on detection accuracy. Our experiments reveal for the first time that the novelty scores of state-of-the-art comparison-based and memory-based methods decay over time. We show how to counteract the discovered novelty decay and increase detection accuracy. Additionally, we show that memory-based methods are applicable beyond First Story Detection by building the first real-time rumour detection system on social media streams.
en
dc.identifier.uri
http://hdl.handle.net/1842/29013
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Qin Y., Wurzer D., Lavrenko V. and Tang C. (2017). “Counteracting Novelty Decay in First Story Detection.” In ECIR - European Conference on Information Retrieval, pp. 555-560. Springer.
en
dc.relation.hasversion
Wurzer D., Lavrenko V., Osborne M. (2015). Tracking unbounded Topic Streams. In the Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, ACL.
en
dc.relation.hasversion
Wurzer D., Lavrenko V., Osborne M. (2015). Twitter-scale New Event Detection via K-term Hashing. In the Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP.
en
dc.subject
realtime information
en
dc.subject
First Story Detection
en
dc.subject
memory-based methods
en
dc.subject
algorithms
en
dc.subject
detection accuracy
en
dc.subject
rumour detection
en
dc.subject
social media
en
dc.title
Scaling real-time event detection to massive streams
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Wurzer2017.pdf
Size:
3.65 MB
Format:
Adobe Portable Document Format
