Social media pecking order: analysing Twitter information streams
Item status: Restricted Access
Embargo end date: 31/12/2100
This thesis investigates how authority is distributed in social media and develops a novel method to validate the findings using prediction markets. By modelling prediction markets with a data corpus comprising both microblog and traditional blog data, an objective metric is established to validate a method for selecting the most authoritative information in Twitter. The question of what the primary unit of authority in Twitter is also addressed, by comparing methods that extract authoritative posts with methods that extract the most authoritative users. It is not immediately clear how to measure the success of a system that claims to identify authority. There are a number of ways to decide what constitutes an authoritative post, but many of them are too subjective or abstract, such as simply counting the number of people who repeat a tweet, or subjectively polling users about the influence a user has. The approach in this thesis is to use a forecast of a prediction market trained on a language model derived from the posts of a selected group of users or tweets. This yields a measurable and objective method for determining the informative value of a set of Twitter posts. If a non-random sample of tweets deemed to be authoritative is used, the model should produce a better forecast than the baseline of random tweets. Using this method, a number of theories of authority in Twitter are examined. The corpus consisted of Twitter posts collected daily by a crawler from the beginning of April 2009 until 1 January 2010. Four prediction market questions were selected for evaluating the notions of authority. Two general news related questions, one technology news question and two sports questions. When looking at questions posed to the general population that are largely opinion based, a small subset of informative users can be selected.
This provides support for the idea that authority in Twitter is centred on the user rather than the individual tweet. By selecting only a few users, it is possible to obtain as much information from this small authoritative group as would be obtained by selecting many more users at random. This demonstrates that, for this type of question, there exist methods for finding authoritative users. In this experiment, the top 10 users accounted for 76.9% of the forecast information for the flu related question. Other types of questions, though, were less conclusive. Questions that were more focused on facts blurred the lines, and it was harder to see whether it is users or tweets that carry the most authority.
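The evaluation idea described in this abstract, that a forecast built from an authoritative sample should beat forecasts built from random samples, can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the price series, the forecasts, the use of mean absolute error, and the function names `mean_absolute_error` and `share_of_baselines_beaten` are hypothetical, and the noisy random forecasts merely stand in for the language-model forecasts used in the thesis.

```python
import random
from statistics import mean


def mean_absolute_error(forecast, actual):
    """Average absolute gap between forecast and observed market prices."""
    return mean(abs(f - a) for f, a in zip(forecast, actual))


def share_of_baselines_beaten(candidate_error, baseline_errors):
    """Fraction of random-sample runs whose error exceeds the candidate's."""
    worse = sum(1 for e in baseline_errors if e > candidate_error)
    return worse / len(baseline_errors)


# Hypothetical daily prediction-market prices (as probabilities).
actual = [0.40, 0.45, 0.50, 0.55, 0.60]

# Hypothetical forecast from a model trained on the "authoritative" sample.
authoritative_forecast = [0.42, 0.44, 0.52, 0.54, 0.61]

# Stand-in for forecasts trained on random tweet samples: the true price
# plus uniform noise, clipped to [0, 1]. This is NOT the thesis's model.
rng = random.Random(0)
baseline_errors = []
for _ in range(100):
    noisy = [min(1.0, max(0.0, a + rng.uniform(-0.2, 0.2))) for a in actual]
    baseline_errors.append(mean_absolute_error(noisy, actual))

candidate_error = mean_absolute_error(authoritative_forecast, actual)
share = share_of_baselines_beaten(candidate_error, baseline_errors)
print(f"candidate error: {candidate_error:.3f}")
print(f"share of random baselines beaten: {share:.2f}")
```

Comparing against many random samples rather than a single one gives a distribution of baseline errors, so a selection method can be judged by how often it outperforms chance.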