Opinion summarization of multiple reviews: data synthesis and modeling
dc.contributor.advisor
Lapata, Maria
dc.contributor.advisor
Magdy, Walid
dc.contributor.author
Amplayo, Reinald Kim
dc.date.accessioned
2022-06-28T14:45:38Z
dc.date.available
2022-06-28T14:45:38Z
dc.date.issued
2022-06-28
dc.description.abstract
The proliferation of online reviews has accelerated research on opinion mining, where
the ultimate goal is to glean information from reviews that helps users make decisions
more efficiently. While opinion mining has assumed several facets in the literature
(e.g., sentiment analysis and aspect extraction), opinion summarization, or the task of
automatically creating a textual summary of opinions found in multiple reviews, aims
to help users access content and improve their decision making. This thesis focuses
on different methods to generate opinion summaries given multiple reviews about a
target entity (e.g., a product or service). The task is challenging due to the absence of
large-scale datasets for supervised training, which is paramount to the recent success
of neural-based systems. In this thesis, we propose several methods to synthesize these
datasets, thereby making supervised training for opinion summarization feasible.
Firstly, we introduce a two-step process that creates synthetic datasets for opinion
summarization. Given a corpus of reviews, we first sample a review and pretend it
is a (pseudo-)summary. Then, we procure a list of reviews to pair with the summary.
We obtain these reviews by generating noisy versions of the summary. We propose
a summarization model which learns to denoise the input reviews and generate the
summary, motivated by how humans write opinion summaries by removing divergent
opinions from reviews. Extensive evaluation shows that our model brings substantial
improvements over unsupervised abstractive and extractive baselines.
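The two-step construction described above can be sketched roughly as follows. This is a minimal illustration only, not the thesis implementation: the function names, the token-level drop and replace noise, and the probabilities `p_drop` and `p_replace` are all assumptions introduced here, and the actual noising procedure may differ.

```python
import random

def noise_review(summary_tokens, vocab, rng, p_replace=0.2, p_drop=0.1):
    """Return a noisy copy of the pseudo-summary (hypothetical token-level noise)."""
    noisy = []
    for tok in summary_tokens:
        r = rng.random()
        if r < p_drop:
            continue                         # drop the token
        if r < p_drop + p_replace:
            noisy.append(rng.choice(vocab))  # swap in a random vocabulary token
        else:
            noisy.append(tok)                # keep the token unchanged
    return noisy

def make_synthetic_example(corpus, rng, n_inputs=8):
    """Step 1: sample a review as the pseudo-summary.
    Step 2: pair it with noisy versions that act as the input reviews."""
    summary = rng.choice(corpus)
    tokens = summary.split()
    vocab = sorted({t for review in corpus for t in review.split()})
    inputs = [" ".join(noise_review(tokens, vocab, rng)) for _ in range(n_inputs)]
    return {"reviews": inputs, "summary": summary}
```

A model trained on such pairs takes the noisy reviews as input and is asked to reconstruct the clean pseudo-summary, which is the denoising objective the abstract refers to.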
To further reflect the diversity of opinions in naturally-occurring reviews, we incorporate content planning during synthetic dataset creation. For each pseudo-summary
sampled from the corpus, we automatically induce its content plan in the form of aspect
and sentiment distributions. We then sample reviews from the corpus using Dirichlet
distributions parameterized by the content plan, controlling the variance accordingly. Experimental results show that our approach outperforms competitive models in
generating opinion summaries that capture opinion consensus.
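As a rough illustration of sampling with a Dirichlet distribution parameterized by a content plan, the sketch below draws target aspect distributions around the plan and picks the closest reviews. Everything here is an assumption made for illustration: it uses a single aspect distribution (the abstract mentions both aspect and sentiment), a hypothetical `concentration` parameter to control variance, and a nearest-neighbour selection rule that is not taken from the thesis.

```python
import random

def sample_dirichlet(mean, concentration, rng):
    """Draw from Dirichlet(concentration * mean) via normalized Gamma draws.
    A larger concentration gives draws closer to the mean (lower variance)."""
    # Small epsilon keeps every Gamma shape parameter strictly positive.
    draws = [rng.gammavariate(concentration * m + 1e-6, 1.0) for m in mean]
    total = sum(draws)
    return [d / total for d in draws]

def l1_distance(p, q):
    """Sum of absolute differences between two distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))

def sample_reviews(plan, candidates, rng, n_inputs=8, concentration=10.0):
    """Select input reviews whose aspect distributions match Dirichlet draws.

    plan: aspect distribution induced from the pseudo-summary
    candidates: list of (review_text, aspect_distribution) pairs
    """
    chosen = []
    pool = list(candidates)
    for _ in range(min(n_inputs, len(pool))):
        target = sample_dirichlet(plan, concentration, rng)
        best = min(pool, key=lambda c: l1_distance(c[1], target))
        chosen.append(best[0])
        pool.remove(best)  # sample without replacement
    return chosen
```

Raising `concentration` keeps the sampled reviews close to the pseudo-summary's plan, while lowering it admits reviews with more divergent content.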
In opinion summarization, the notion of salience in reviews largely depends on user
interest, so a generic summary may not satisfy the needs of all users, limiting
their ability to make decisions. We therefore extend opinion summarization to generating aspect-controllable summaries. Using a synthetic training dataset enriched with
aspect controllers of different granularity, we fine-tune a pre-trained language model
which allows the creation of generic and aspect-specific summaries by modifying aspect controllers during inference. Experiments show that our model achieves state-of-the-art
performance and is able to generate personalized summaries.
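One way to picture controllers of different granularity is as optional prefixes on the model input, which can be changed at inference time. The sketch below is hypothetical: the marker strings `<aspect>`, `<keywords>`, `<reviews>` and `</s>`, and the function `build_input`, are invented for illustration and are not the format used in the thesis.

```python
def build_input(reviews, aspect_keywords=None, aspect_code=None):
    """Prefix the concatenated reviews with optional aspect controllers."""
    parts = []
    if aspect_code is not None:
        parts.append(f"<aspect> {aspect_code}")                  # coarse controller
    if aspect_keywords:
        parts.append("<keywords> " + " ".join(aspect_keywords))  # fine-grained controller
    parts.append("<reviews> " + " </s> ".join(reviews))
    return " ".join(parts)
```

Under these assumptions, omitting both controllers yields the input for a generic summary, while supplying them requests an aspect-specific one from the same fine-tuned model.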
en
dc.identifier.uri
https://hdl.handle.net/1842/39230
dc.identifier.uri
http://dx.doi.org/10.7488/era/2481
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Amplayo, R. K., Angelidis, S., and Lapata, M. (2021a). Aspect-controllable opinion summarization. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), Punta Cana, Dominican Republic. Association for Computational Linguistics
en
dc.relation.hasversion
Amplayo, R. K., Angelidis, S., and Lapata, M. (2021b). Unsupervised opinion summarization with content planning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 12489–12497
en
dc.relation.hasversion
Amplayo, R. K., Kim, J., Sung, S., and Hwang, S.-w. (2018a). Cold-start aware user and product attention for sentiment classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2535–2544, Melbourne, Australia. Association for Computational Linguistics
en
dc.relation.hasversion
Amplayo, R. K. and Lapata, M. (2020). Unsupervised opinion summarization with noising and denoising. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1934–1945, Online. Association for Computational Linguistics
en
dc.relation.hasversion
Amplayo, R. K. and Lapata, M. (2021). Informative and controllable opinion summarization. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2662–2672, Online. Association for Computational Linguistics
en
dc.relation.hasversion
Amplayo, R. K., Lim, S., and Hwang, S.-w. (2018b). Entity commonsense representation for neural abstractive summarization. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 697–707, New Orleans, Louisiana. Association for Computational Linguistics
en
dc.relation.hasversion
Amplayo, R. K. and Song, M. (2017). An adaptable fine-grained sentiment analysis for summarization of multiple short online reviews. Data & Knowledge Engineering, 110:54–67
en
dc.relation.hasversion
Angelidis, S., Amplayo, R. K., Suhara, Y., Wang, X., and Lapata, M. (2021). Extractive Opinion Summarization in Quantized Transformer Spaces. Transactions of the Association for Computational Linguistics, 9:277–293
en
dc.relation.hasversion
Kim, J., Amplayo, R. K., Lee, K., Sung, S., Seo, M., and Hwang, S.-w. (2019). Categorical metadata representation for customized text classification. Transactions of the Association for Computational Linguistics, 7:201–215
en
dc.subject
opinion mining
en
dc.subject
online reviews
en
dc.subject
opinion summarization
en
dc.subject
synthetic datasets
en
dc.subject
content planning
en
dc.subject
Dirichlet distribution
en
dc.subject
aspect controllers
en
dc.title
Opinion summarization of multiple reviews: data synthesis and modeling
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name:
- Amplayo2022.pdf
- Size:
- 862.42 KB
- Format:
- Adobe Portable Document Format

