Abstractive summarization of long narratives through content selection and model scaling
Abstract
Abstractive summarization of long narrative texts, such as novels and movie screenplays, presents significant challenges due to their extensive length, complex structure, and the necessity of capturing essential narrative elements accurately. While large language models (LLMs) have demonstrated remarkable progress in text summarization, their ability to process long narratives remains limited due to computational constraints and the difficulty of extracting salient elements. This thesis addresses these challenges, particularly in movie screenplays, by focusing on two key issues: identifying salient scenes crucial to the overall story and handling the computational constraints inherent in processing lengthy texts.
First, we introduce MovieSum, a large-scale dataset specifically designed for movie screenplay summarization. MovieSum consists of 2,200 movie screenplays paired with their corresponding Wikipedia plot summaries and is manually formatted to represent structural screenplay elements. This dataset is significantly larger than existing resources and includes metadata such as IMDb IDs, enabling access to additional knowledge. We use MovieSum to benchmark recent LLMs and provide a baseline for future research in narrative summarization. Additionally, we introduce the Movie Scene Saliency Dataset (MENSA), a subset of MovieSum containing human-annotated salient scenes from 100 diverse movies.
Second, we investigate the role of scene saliency and saliency-based content selection in screenplay summarization. A movie screenplay consists of numerous scenes, but only a fraction of them contribute meaningfully to the overall story. We propose a two-stage summarization framework utilizing the MENSA dataset. The first stage identifies key scenes based on their relevance to the movie’s narrative, while the second stage generates an abstractive summary using only these salient scenes. Our findings show that this approach outperforms existing state-of-the-art summarization methods, producing summaries that more accurately reflect the content of the narrative.
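The two-stage pipeline described above can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: `score_scene` stands in for the trained scene-saliency classifier, and `summarize` for the abstractive summarization model; both names are hypothetical.

```python
def summarize_screenplay(scenes, score_scene, summarize, k=10):
    """Two-stage sketch: (1) rank scenes by a saliency score,
    (2) abstractively summarize only the top-k salient scenes.

    `score_scene` and `summarize` are placeholders for trained models.
    """
    # Stage 1: rank scene indices by predicted saliency, highest first
    ranked = sorted(range(len(scenes)),
                    key=lambda i: score_scene(scenes[i]),
                    reverse=True)
    # Keep the top-k scenes, restored to their original narrative order
    salient = sorted(ranked[:k])
    # Stage 2: summarize from the salient scenes only
    return summarize([scenes[i] for i in salient])
```

The key design point is that stage 2 never sees non-salient scenes, so the summarizer's input stays short regardless of screenplay length.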
Finally, we address the computational limitations of transformer-based models in processing long documents, including movie screenplays. Existing models rely on truncation, which leads to information loss and inconsistencies between training and inference. To mitigate this, we propose CachED, a gradient caching technique that enables end-to-end training of encoder-decoder models on full-length documents without truncation. We apply CachED to extend BART, creating CachED-BART, which can backpropagate through nearly one million tokens without additional parameter overhead. Experimental results demonstrate that CachED-BART achieves superior performance on long-form summarization tasks, including movie screenplays and books, while maintaining efficiency and scalability.
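The core idea of chunk-level gradient caching can be sketched with PyTorch's activation checkpointing: encode the document chunk by chunk, free the intermediate encoder activations after the forward pass, and recompute them during backpropagation so the memory cost no longer grows with document length. This is a simplified sketch under assumed interfaces, not the thesis's CachED implementation; `CachedEncoder` and `chunk_size` are illustrative names.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CachedEncoder(nn.Module):
    """Sketch: encode a long sequence in fixed-size chunks, caching only
    the final hidden states and recomputing intermediate activations
    in the backward pass (chunk-granularity gradient checkpointing)."""

    def __init__(self, encoder: nn.Module, chunk_size: int):
        super().__init__()
        self.encoder = encoder
        self.chunk_size = chunk_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs = []
        for start in range(0, x.size(1), self.chunk_size):
            chunk = x[:, start:start + self.chunk_size]
            # Activations inside `encoder` are discarded after this call
            # and recomputed when gradients flow back through the chunk.
            outs.append(checkpoint(self.encoder, chunk, use_reentrant=False))
        # Concatenated hidden states for the full document; a decoder
        # could then cross-attend over all of them.
        return torch.cat(outs, dim=1)
```

Because only the cached chunk outputs (and not their intermediate activations) are held in memory, peak memory scales with the chunk size rather than the full document length, at the cost of one extra encoder forward pass per chunk during backpropagation.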
This thesis advances the field of long-form narrative summarization by introducing structured datasets, scene-aware summarization techniques, and novel training methodologies. Our results highlight the importance of selecting salient narrative elements and leveraging efficient model architectures to generate accurate and coherent summaries for complex, lengthy texts such as movie screenplays. Through these contributions, this thesis aims to enhance the efficiency and accuracy of movie script summarization, while also providing valuable insights into overcoming the computational challenges associated with long-form narrative summarization.