Edinburgh Research Archive

Revealing the value of social media data in forecasting tourism demand: evidence from Twitter

Authors

Qiu, Yuanming

Abstract

As the digital era advances, tourists increasingly draw on a wide range of internet-based resources to make travel decisions. Among these, social media platforms have become a prominent guide to where tourists choose to visit, offering an enormous amount of publicly accessible travel-related information. This shift has prompted a surge in research that uses internet-based data (e.g., search engine query data and web traffic data) as predictors for generating precise tourism demand forecasts. Such enhanced forecasts are vital for implementing efficient crowd control and capacity management strategies within the tourism industry. While this field is expanding rapidly, researchers have primarily used search engine data in their forecasts because of their structured, time-series nature. These structured data reveal the evolving patterns of tourists’ attention and interest, effectively signalling volumes of tourist arrivals. Conversely, the potential of unstructured social media data for forecasting tourism demand remains largely untapped in tourism research, with contributions still sparse. Although some initial research signals the potential of social media data in tourism demand forecasting, demand forecasting at the level of the individual attraction needs further investigation. Therefore, the overarching aim of this thesis is to reveal the value of social media data, especially Twitter data, in forecasting tourism demand at the attraction level and, by doing so, to fill this gap in the tourism literature. As the leading micro-blogging platform worldwide, Twitter offers researchers across various disciplines a rich source of informative signals.
In tourism literature, however, the value of Twitter data is particularly underexplored within the context of forecasting tourism demand for individual tourist attractions, which are at the heart of the tourism experience and easily constrained by physical capacity. To address this aim, the research pursues the following objectives: (1) Assess the role of Twitter data in tourism demand forecasting at the attraction level. (2) Examine the advantages of deciphering embedded communication flows on Twitter for generating improved tourism demand forecasts at the attraction level. (3) Evaluate the value of commonly excluded Twitter “noise” (i.e., tweets that neither stimulate visits nor signal tourists’ intentions to visit) for supplementing tourism demand forecasts at the attraction level. Grounded in signalling theory, this thesis comprises three empirical studies that use the British Museum as a case study. First, Chapter 4 provides an overall evaluation of the value of unprocessed Twitter data for signalling tourism demand at the attraction level and encourages efforts to generate refined forecasts of tourist arrivals. Following this, Chapter 5 clusters the collected tweets into three categories based on the embedded communication flows, as identified through text analysis. After the classification, several forecasting models are applied to scrutinize the signalling value of variables derived from each communication flow. The findings reveal the importance of mapping Twitter communication flows to decipher the information dissemination process within the Twittersphere. Specifically, the results demonstrate that direct communications between the attraction of interest (in this case, the British Museum) and tourists offer more valuable signals for tourism demand prediction than other communication flows.
Before proceeding to the final study, it is essential to note that tweets that failed to stimulate visiting intentions (i.e., negative signals: tweets criticizing the British Museum’s possession of contested artefacts and corporate sponsorship) were systematically removed from the dataset prior to variable construction in the above-noted studies. Indeed, the insights from Chapter 5 underscore the necessity of data pre-processing to counteract the inherent “noise” in Twitter data. Chapter 6 returns to those tweets previously categorized as noise and examines their signalling value regarding the volume of tourist arrivals. To precisely extract the tweets that were only roughly detected in Chapter 5, Chapter 6 first proposes a three-stage framework that adopts sentiment analysis, judgemental screening, topic modelling, and in-context keyword searching. It is revealed that the noise consists of boycott tweets opposing the British Museum’s contested artefacts and its sponsorship by fossil fuel companies. Then, variables generated from these boycott tweets are tested for their dynamic relationship with the volume of tourist arrivals. The findings of Chapter 6 show that likes on tweets opposing fossil fuel sponsorship have a short-to-medium-term negative impact on seasonally adjusted tourist arrivals to the British Museum. The thesis contributes to the literature on tourism demand forecasting in three distinct areas. First, it substantiates the importance of Twitter data in forecasting tourism demand at the attraction level by providing both a solid theoretical underpinning and compelling empirical evidence. This strengthens the emerging recognition of social media data, specifically Twitter data, as a valuable resource in tourism research, particularly for tourism demand forecasts for attractions.
Second, it formulates and appraises a communication flow mapping framework within the Twittersphere, assessing its efficacy in improving demand forecasts. By focusing on specific communication flows, such as direct interactions between Twitter users and attractions, it demonstrates the importance of these flows for predicting tourism demand. Third, it explores the value of the noise within the Twittersphere, which is often overlooked in existing literature, for deciphering fluctuations in tourism demand. This analysis deepens our understanding of the complexities of unstructured social media data and highlights the potential usefulness of the noise for tourism demand forecasts. In closing, the thesis discusses the theoretical and practical implications of these findings, acknowledges the limitations of the current research, and proposes directions for future tourism research utilizing Twitter data.
