Abstractive summarization of long narratives through content selection and model scaling
dc.contributor.advisor
Keller, Frank
en
dc.contributor.advisor
Tang, Hao
en
dc.contributor.advisor
Minervini, Pasquale
en
dc.contributor.author
Saxena, Rohit
en
dc.date.accessioned
2026-01-23T15:41:52Z
en
dc.date.issued
2025-12-02
en
dc.description.abstract
Abstractive summarization of long narrative texts, such as novels and movie screenplays, presents significant challenges due to their extensive length, complex structure, and the necessity of capturing essential narrative elements accurately. While large language models (LLMs) have demonstrated remarkable progress in text summarization, their ability to process long narratives remains limited due to computational constraints and the difficulty of extracting salient elements. This thesis addresses these challenges, particularly in movie screenplays, by focusing on two key issues: identifying salient scenes crucial to the overall story and handling the computational constraints inherent in processing lengthy texts.
First, we introduce MovieSum, a large-scale dataset specifically designed for movie screenplay summarization. MovieSum consists of 2,200 movie screenplays paired with their corresponding Wikipedia plot summaries and is manually formatted to represent structural screenplay elements. This dataset is significantly larger than existing resources and includes metadata such as IMDb IDs, enabling access to additional knowledge. We use MovieSum to benchmark recent LLMs and provide a baseline for future research in narrative summarization. Additionally, we introduce the Movie Scene Saliency Dataset (MENSA), a subset of MovieSum containing human-annotated salient scenes from 100 diverse movies.
Second, we investigate the role of scene saliency and saliency-based content selection in screenplay summarization. A movie screenplay consists of numerous scenes, but only a fraction of them contribute meaningfully to the overall story. We propose a two-stage summarization framework utilizing the MENSA dataset. The first stage identifies key scenes based on their relevance to the movie’s narrative, while the second stage generates an abstractive summary using only these salient scenes. Our findings show that this approach outperforms existing state-of-the-art summarization methods, producing summaries that more accurately reflect the content of the narrative.
Finally, we address the computational limitations of transformer-based models in processing long documents, including movie screenplays. Existing models rely on truncation, which leads to information loss and inconsistencies between training and inference. To mitigate this, we propose CachED, a gradient caching technique that enables end-to-end training of encoder-decoder models on full-length documents without truncation. We apply CachED to extend BART, creating CachED-BART, which is capable of backpropagation on nearly one million tokens without additional parameter overhead. Experimental results demonstrate that CachED-BART achieves superior performance on long-form summarization tasks, including movie screenplays and books, while maintaining efficiency and scalability.
This thesis advances the field of long-form narrative summarization by introducing structured datasets, scene-aware summarization techniques, and novel training methodologies. Our results highlight the importance of selecting salient narrative elements and leveraging efficient model architectures to generate accurate and coherent summaries for complex, lengthy texts such as movie screenplays. Through these contributions, this thesis aims to enhance the efficiency and accuracy of movie script summarization, while also providing valuable insights into overcoming the computational challenges associated with long-form narrative summarization.
en
dc.identifier.uri
https://era.ed.ac.uk/handle/1842/44341
dc.identifier.uri
https://doi.org/10.7488/era/6861
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Rohit Saxena and Frank Keller. 2024. Select and Summarize: Scene Saliency for Movie Script Summarization. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 3439–3455, Mexico City, Mexico. Association for Computational Linguistics.
en
dc.relation.hasversion
Rohit Saxena and Frank Keller. 2024. MovieSum: An Abstractive Summarization Dataset for Movie Screenplays. In Findings of the Association for Computational Linguistics: ACL 2024, pages 4043–4050, Bangkok, Thailand. Association for Computational Linguistics.
en
dc.relation.hasversion
Rohit Saxena, Hao Tang, and Frank Keller. 2025. End-to-End Long Document Summarization using Gradient Caching. In Transactions of the Association for Computational Linguistics (TACL).
en
dc.relation.hasversion
Rohit Saxena, Pasquale Minervini, and Frank Keller. 2025. PosterSum: A Multimodal Benchmark for Scientific Poster Summarization. NeurIPS 2025 Workshop on Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling.
en
dc.relation.hasversion
Rohit Saxena, Aryo Pradipta Gema, and Pasquale Minervini. 2025. Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs. Reasoning and Planning for LLMs Workshop at The Thirteenth International Conference on Learning Representations.
en
dc.relation.hasversion
Rohit Saxena, S. Bhat, and N. Pedanekar. 2017. Live on TV, Alive on Twitter: Quantifying Continuous Partial Attention of Viewers during Live Television Telecasts. In 2017 IEEE International Conference on Data Mining Workshops (ICDMW), pages 1042–1049.
en
dc.subject
abstractive summarization
en
dc.subject
long narrative summaries
en
dc.subject
MovieSum
en
dc.subject
screenplay dataset
en
dc.subject
human-annotated scenes
en
dc.subject
Gradient Caching for Encoder-Decoder models
en
dc.subject
CachED
en
dc.subject
scalable model training
en
dc.subject
content selection
en
dc.title
Abstractive summarization of long narratives through content selection and model scaling
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: Saxena2025.pdf
- Size: 1.76 MB
- Format: Adobe Portable Document Format