Edinburgh Research Archive

Data-to-text generation with neural planning

dc.contributor.advisor
Lapata, Maria
dc.contributor.advisor
Lopez, Adam
dc.contributor.author
Puduppully, Ratish Surendran
dc.date.accessioned
2022-04-12T12:55:42Z
dc.date.available
2022-04-12T12:55:42Z
dc.date.issued
2022-04-11
dc.description.abstract
In this thesis, we consider the task of data-to-text generation, which takes non-linguistic structures as input and produces textual output. The inputs can take the form of database tables, spreadsheets, charts, and so on. The main application of data-to-text generation is to present information in a textual format, making it accessible to a layperson who may otherwise find it difficult to interpret numerical figures. The task can also automate routine document generation jobs, thus improving human efficiency. We focus on generating long-form text, i.e., documents with multiple paragraphs. Recent approaches to data-to-text generation have adopted the very successful encoder-decoder architecture or its variants. These models generate fluent (but often imprecise) text, yet perform quite poorly at selecting appropriate content and ordering it coherently. This thesis focuses on overcoming these issues by integrating content planning with neural models. We hypothesize that data-to-text generation will benefit from explicit planning, which manifests itself in (a) micro planning, (b) latent entity planning, and (c) macro planning. Throughout this thesis, we assume that the input to our generator is tables (with records) in the sports domain, and that the output is summaries describing what happened in the game (e.g., who won/lost, ..., scored, etc.). We first describe our work on integrating fine-grained or micro plans with data-to-text generation. As part of this, we generate a micro plan highlighting which records should be mentioned and in which order, and then generate the document while taking the micro plan into account. We then show how data-to-text generation can benefit from higher-level latent entity planning. Here, we make use of entity-specific representations which are dynamically updated. The text is generated conditioned on entity representations and the records corresponding to the entities by using hierarchical attention at each time step.
We then combine planning with the high-level organization of entities, events, and their interactions. Such coarse-grained macro plans are learnt from data and given as input to the generator. Finally, we present work on making macro plans latent while incrementally generating a document paragraph by paragraph. We infer latent plans sequentially with a structured variational model while interleaving the steps of planning and generation. Text is generated by conditioning on previous variational decisions and previously generated text. Overall, our results show that planning makes data-to-text generation more interpretable, improves the factuality and coherence of the generated documents, and reduces redundancy in the output document.
en
dc.identifier.uri
https://hdl.handle.net/1842/38869
dc.identifier.uri
http://dx.doi.org/10.7488/era/2123
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Puduppully, R., Dong, L., and Lapata, M. (2019). Data-to-text generation with content selection and planning. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, Hawaii.
en
dc.relation.hasversion
Puduppully, R., Dong, L., and Lapata, M. (2019). Data-to-text generation with entity modeling. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2023–2035, Florence, Italy. Association for Computational Linguistics.
en
dc.relation.hasversion
Puduppully, R., Fu, Y., and Lapata, M. (2022). Data-to-text generation with variational sequential planning. Transactions of the Association for Computational Linguistics (arXiv:2202.13756).
en
dc.relation.hasversion
Puduppully, R. and Lapata, M. (2021). Data-to-text generation with macro planning. Transactions of the Association for Computational Linguistics (arXiv:2102.02723).
en
dc.subject
data-to-text generation
en
dc.subject
long-form text
en
dc.subject
latent entity planning
en
dc.subject
redundancy reduction
en
dc.title
Data-to-text generation with neural planning
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Puduppully2022.pdf
Size:
2.37 MB
Format:
Adobe Portable Document Format