Weakly supervised sentiment analysis and opinion extraction
In recent years, online reviews have become the foremost medium for users to express their satisfaction, or lack thereof, with products and services. The proliferation of user-generated reviews, combined with the rapid growth of e-commerce, has made vast amounts of opinionated text available to consumers, manufacturers, and researchers alike. This has fuelled increased interest in automated methods for discovering, analyzing, and distilling the opinions expressed in text. This thesis tackles the tasks of fine-grained sentiment analysis and aspect extraction, and presents a unified framework for summarizing opinions from multiple user reviews. Two core concepts form the basis of our methodology. First, we use neural networks, whose ability to learn continuous feature representations directly from data, without recourse to preprocessing tools or linguistic annotations, has advanced the state of the art in numerous Natural Language Processing tasks. Second, we believe that opinion mining systems deployed in real-world applications cannot rely on expensive human annotations and should instead primarily exploit freely available review data.
Specifically, the main contributions of this thesis are:
(i) The creation of OPOSUM, a new Opinion Summarization corpus containing over one million reviews from multiple domains. To evaluate our methods, we annotated a subset of the data with fine-grained sentiment and aspect labels, as well as extractive gold-standard opinion summaries.
(ii) The development of two weakly supervised hierarchical neural models for detecting and extracting sentiment-heavy expressions in reviews. Our first model composes segment representations hierarchically and uses an attention mechanism to distinguish opinions from neutral statements. Our second model is based on Multiple Instance Learning (MIL) and can detect user opinions of potentially opposing polarity. Experiments demonstrate significant benefits from the MIL-based architecture.
(iii) The introduction of a neural model for aspect extraction that requires minimal human involvement. Our formulation uses aspect keywords to guide the model towards specific aspects, and a multi-task objective to further improve its accuracy.
(iv) A unified summarization framework that combines our sentiment and aspect detection methods while accounting for redundancy, in order to produce useful opinion summaries from multiple reviews. Automatic evaluation on our opinion summarization dataset shows significant improvements over other summarization systems in terms of extraction accuracy and similarity to reference summaries. A large-scale judgement elicitation study indicates that human judges also prefer our summaries.
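The abstract says only that the second sentiment model is based on Multiple Instance Learning; it gives no equations. As a rough illustration, and not the thesis's actual architecture, the sketch below shows a generic attention-weighted MIL pooling step. Each review segment is assigned a polarity distribution and an importance score, and the document-level prediction is their attention-weighted average. A model trained only on document-level labels can therefore still expose segment-level polarities, including segments whose polarity opposes the overall verdict. The three-class label set (negative, neutral, positive), the softmax weighting, and the function names `softmax`, `mil_document_prediction` and `segment_polarity` are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mil_document_prediction(
    segment_probs: Sequence[Sequence[float]],
    attention_scores: Sequence[float],
) -> List[float]:
    """Pool segment-level class distributions into one document distribution.

    segment_probs: one probability distribution per segment, here over the
        hypothetical classes [negative, neutral, positive].
    attention_scores: one unnormalised importance score per segment.
    Returns the attention-weighted average of the segment distributions.
    (Assumed generic MIL pooling, not the exact model from the thesis.)
    """
    if len(segment_probs) != len(attention_scores):
        raise ValueError("need exactly one attention score per segment")
    weights = softmax(attention_scores)
    n_classes = len(segment_probs[0])
    doc = [0.0] * n_classes
    for w, probs in zip(weights, segment_probs):
        for c in range(n_classes):
            doc[c] += w * probs[c]
    return doc


def segment_polarity(
    probs: Sequence[float], class_values: Sequence[float] = (-1.0, 0.0, 1.0)
) -> float:
    """Expected polarity of one segment: near -1 is negative, near +1 positive."""
    return sum(p * v for p, v in zip(probs, class_values))


if __name__ == "__main__":
    # Hypothetical review with two segments of opposing polarity.
    segments = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
    doc = mil_document_prediction(segments, attention_scores=[2.0, 0.5])
    print("document distribution:", [round(p, 3) for p in doc])
    for i, seg in enumerate(segments):
        print(f"segment {i} polarity: {segment_polarity(seg):+.2f}")
```

In this example the higher attention score on the first segment makes the document-level prediction lean negative. The second segment is still recognised as positive, which is the kind of mixed-polarity signal a document-level classifier alone would hide. In a trained system, both the segment distributions and the attention scores would come from learned encoders optimised against document labels only.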