Show simple item record

dc.contributor.advisor  Lapata, Maria
dc.contributor.advisor  Lopez, Adam
dc.contributor.author  Puduppully, Ratish Surendran
dc.date.accessioned  2022-04-12T12:55:42Z
dc.date.available  2022-04-12T12:55:42Z
dc.date.issued  2022-04-11
dc.identifier.uri  https://hdl.handle.net/1842/38869
dc.identifier.uri  http://dx.doi.org/10.7488/era/2123
dc.description.abstract  In this thesis, we consider the task of data-to-text generation, which takes non-linguistic structures as input and produces textual output. The inputs can take the form of database tables, spreadsheets, charts, and so on. The main application of data-to-text generation is to present information in a textual format, making it accessible to a layperson who might otherwise struggle to interpret numerical figures. The task can also automate routine document generation jobs, thus improving human efficiency. We focus on generating long-form text, i.e., documents with multiple paragraphs. Recent approaches to data-to-text generation have adopted the very successful encoder-decoder architecture or its variants. These models generate fluent (but often imprecise) text and perform quite poorly at selecting appropriate content and ordering it coherently. This thesis focuses on overcoming these issues by integrating content planning with neural models. We hypothesize that data-to-text generation will benefit from explicit planning, which manifests itself in (a) micro planning, (b) latent entity planning, and (c) macro planning. Throughout this thesis, we assume the input to our generator is tables (with records) in the sports domain, and the output is summaries describing what happened in the game (e.g., who won/lost, ..., scored, etc.). We first describe our work on integrating fine-grained or micro plans with data-to-text generation. As part of this, we generate a micro plan highlighting which records should be mentioned and in which order, and then generate the document while taking the micro plan into account. We then show how data-to-text generation can benefit from higher-level latent entity planning. Here, we make use of entity-specific representations which are dynamically updated. The text is generated conditioned on entity representations and the records corresponding to the entities by using hierarchical attention at each time step. We then combine planning with the high-level organization of entities, events, and their interactions. Such coarse-grained macro plans are learnt from data and given as input to the generator. Finally, we present work on making macro plans latent while incrementally generating a document paragraph by paragraph. We infer latent plans sequentially with a structured variational model while interleaving the steps of planning and generation. Text is generated by conditioning on previous variational decisions and previously generated text. Overall, our results show that planning makes data-to-text generation more interpretable, improves the factuality and coherence of the generated documents, and reduces redundancy in the output document.  en
dc.language.iso  en  en
dc.publisher  The University of Edinburgh  en
dc.relation.hasversion  Puduppully, R., Dong, L., and Lapata, M. (2019). Data-to-text generation with content selection and planning. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, Hawaii.  en
dc.relation.hasversion  Puduppully, R., Dong, L., and Lapata, M. (2019). Data-to-text generation with entity modeling. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2023–2035, Florence, Italy. Association for Computational Linguistics.  en
dc.relation.hasversion  Puduppully, R., Fu, Y., and Lapata, M. (2022). Data-to-text generation with variational sequential planning. Transactions of the Association for Computational Linguistics, abs/2202.13756.  en
dc.relation.hasversion  Puduppully, R. and Lapata, M. (2021). Data-to-text generation with macro planning. Transactions of the Association for Computational Linguistics, abs/2102.02723.  en
dc.subject  data-to-text generation  en
dc.subject  long-form text  en
dc.subject  latent entity planning  en
dc.subject  redundancy reduction  en
dc.title  Data-to-text generation with neural planning  en
dc.type  Thesis or Dissertation  en
dc.type.qualificationlevel  Doctoral  en
dc.type.qualificationname  PhD Doctor of Philosophy  en

