Neural document modeling and summarization
Document summarization is the task of automatically generating a shorter version of one or more documents while retaining the most important information. The task has received much attention in the natural language processing community due to its potential for a variety of information access applications, such as tools that digest textual content (e.g., news, social media, reviews), answer questions, or provide recommendations. Summarization approaches may process single or multiple documents, and may produce extractive or abstractive summaries. In extractive summarization, summaries are formed by copying and concatenating the most important spans (usually sentences) of the input text, whereas abstractive approaches can generate summaries using words or phrases that do not appear in the original text. A core component of any summarizer is how it represents documents and distills their information for downstream tasks (e.g., extraction or abstraction). Thanks to the popularity of neural network models and their ability to learn continuous representations, many new systems for document modeling and summarization have been proposed in recent years.

This thesis investigates neural network approaches to the document summarization problem. We develop several novel neural models, covering extractive and abstractive approaches in both single-document and multi-document settings.

We first investigate how to represent a single document with a randomly initialized neural network. In contrast to previous approaches, which ignore document structure when encoding the input, we propose a structured attention mechanism that imposes a structural bias of document-level dependency trees when modeling a document, yielding more powerful document representations (see the first sketch below). We first apply this model to document classification, and subsequently to extractive single-document summarization, using an iterative refinement process to learn more complex tree structures. Experimental results on both tasks show that the structured attention mechanism achieves competitive performance.

Very recently, pretrained language models have achieved great success on several natural language understanding tasks by training large neural models on enormous corpora with a language modeling objective. These models learn rich contextual information and, to some extent, the structure of the input text. While summarization systems could in principle also benefit from pretrained language models, there are potential obstacles to applying them to document summarization. The second part of this thesis therefore focuses on how to represent a single document with pretrained language models. Going beyond previous approaches, which learn solely from the summarization dataset, we propose a framework that uses pretrained language models as encoders for both extractive and abstractive summarization (see the second sketch below). The framework achieves state-of-the-art results on three datasets.

Finally, in the third part of this thesis, we move beyond single documents and explore neural approaches to summarizing multiple documents. We analyze why applying existing neural summarization models to this task is challenging, and develop a novel modeling framework. More concretely, we propose a ranking-based pipeline and a hierarchical neural encoder for processing multiple input documents (see the third sketch below).
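To make the structured attention idea concrete, the first sketch shows one standard way of realising it: attention weights are taken to be the marginal probabilities of edges in a non-projective dependency tree over sentences, computed in closed form with the matrix-tree theorem (Koo et al., 2007). This is a minimal, illustrative PyTorch sketch rather than the thesis implementation; the function name and the score inputs are assumptions.

    import torch

    def tree_attention_marginals(edge_scores, root_scores):
        # edge_scores: (n, n) unnormalised score for an edge from parent i to child j
        # root_scores: (n,) unnormalised score for node i being the tree root
        n = edge_scores.size(0)
        A = torch.exp(edge_scores) * (1.0 - torch.eye(n))  # edge weights, no self-loops
        r = torch.exp(root_scores)

        # Graph Laplacian: L[j, j] = sum_i A[i, j]; L[i, j] = -A[i, j] for i != j
        L = torch.diag(A.sum(dim=0)) - A
        L_bar = L.clone()
        L_bar[0] = r                       # Koo et al.: first row holds the root weights
        L_inv = torch.inverse(L_bar)

        # Closed-form marginals of each edge and of each node being the root
        not_first = 1.0 - torch.eye(n)[0]  # 0 at index 0, 1 elsewhere
        P_edge = A * (not_first.unsqueeze(1) * torch.diagonal(L_inv).unsqueeze(0)
                      - not_first.unsqueeze(0) * L_inv.t())
        P_root = r * L_inv[:, 0]
        return P_edge, P_root              # differentiable, usable as attention weights

    P_edge, P_root = tree_attention_marginals(torch.randn(4, 4), torch.randn(4))
    print(P_edge.sum() + P_root.sum())     # approx. 4.0: one parent per node

Because the marginals are differentiable in the scores, the tree inducer can be trained end-to-end with the downstream classification or summarization objective.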
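For the second part, a widely used recipe for pretrained encoders in extractive summarization is to insert a [CLS] token before every sentence and classify each sentence from the contextual vector at its [CLS] position. The second sketch follows that recipe using the HuggingFace transformers library; the class name, pooling, and classifier below are assumptions for illustration, not the thesis code.

    import torch
    import torch.nn as nn
    from transformers import BertModel, BertTokenizerFast

    class ExtractiveScorer(nn.Module):
        """Scores each sentence of a document for extraction from the
        contextual vector of a per-sentence [CLS] token (a sketch)."""
        def __init__(self, model_name="bert-base-uncased"):
            super().__init__()
            self.encoder = BertModel.from_pretrained(model_name)
            self.classifier = nn.Linear(self.encoder.config.hidden_size, 1)

        def forward(self, input_ids, attention_mask, cls_positions):
            hidden = self.encoder(input_ids=input_ids,
                                  attention_mask=attention_mask).last_hidden_state
            sent_vecs = hidden[0, cls_positions]          # one vector per sentence
            return torch.sigmoid(self.classifier(sent_vecs)).squeeze(-1)

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    sentences = ["The cat sat on the mat.", "It then fell asleep."]
    text = " ".join(f"[CLS] {s} [SEP]" for s in sentences)  # one [CLS] per sentence
    enc = tokenizer(text, return_tensors="pt", add_special_tokens=False)
    cls_positions = (enc.input_ids[0] == tokenizer.cls_token_id).nonzero().squeeze(-1)

    scores = ExtractiveScorer()(enc.input_ids, enc.attention_mask, cls_positions)
    print(scores)  # one extraction probability per sentence

Full systems typically also alternate segment embeddings between sentences and stack further inter-sentence Transformer layers on top of the [CLS] vectors; both are omitted here for brevity.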
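Finally, the hierarchical encoding idea for multiple documents can be reduced, at its simplest, to two stacked Transformer encoders: one contextualizes tokens within each document, and a second operates over pooled document vectors so that information can flow across documents. Everything in the third sketch (dimensions, mean pooling, layer counts) is an illustrative assumption; the thesis model, and the ranking pipeline that selects which inputs to encode, are more elaborate.

    import torch
    import torch.nn as nn

    class HierarchicalEncoder(nn.Module):
        # Token-level encoding within each document, then document-level
        # encoding over pooled document vectors (an illustrative sketch).
        def __init__(self, d_model=256, nhead=4):
            super().__init__()
            token_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            doc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.token_enc = nn.TransformerEncoder(token_layer, num_layers=2)
            self.doc_enc = nn.TransformerEncoder(doc_layer, num_layers=2)

        def forward(self, docs):                        # (num_docs, seq_len, d_model)
            token_states = self.token_enc(docs)         # within-document contexts
            doc_vecs = token_states.mean(dim=1)         # pool each document
            return self.doc_enc(doc_vecs.unsqueeze(0))  # cross-document contexts

    docs = torch.randn(5, 40, 256)      # 5 documents, 40 token embeddings each
    out = HierarchicalEncoder()(docs)
    print(out.shape)                    # torch.Size([1, 5, 256])

A decoder attending over both the token-level and document-level states can then generate the final summary.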
Experiments on a large-scale multi-document summarization dataset show that our system achieves promising performance.