dc.contributor.advisor | Lemon, Oliver | en |
dc.contributor.advisor | Lapata, Maria | en |
dc.contributor.advisor | Sutton, Charles | en |
dc.contributor.author | Konstas, Ioannis | en |
dc.date.accessioned | 2014-06-03T14:45:26Z | |
dc.date.available | 2014-06-03T14:45:26Z | |
dc.date.issued | 2014-06-27 | |
dc.identifier.uri | http://hdl.handle.net/1842/8926 | |
dc.description.abstract | Much of the data found on the World Wide Web is in numeric, tabular, or other non-textual
format (e.g., weather forecast tables, stock market charts, live sensor feeds), and
thus inaccessible to non-experts or laypersons. However, most conventional search engines
and natural language processing tools (e.g., summarisers) can only handle textual
input. As a result, data in non-textual form remains largely inaccessible. Concept-to-
text generation refers to the task of automatically producing textual output from
non-linguistic input, and holds promise for rendering non-linguistic data widely accessible.
Several successful generation systems have been produced in the past twenty
years. They mostly rely on human-crafted rules or expert-driven grammars, implement
a pipeline architecture, and usually operate in a single domain.
In this thesis, we present several novel statistical models that take as input a set
of database records and generate a description of them in natural language text. Our
key idea is to combine the processes of structuring a document (document planning),
deciding what to say (content selection), and choosing the specific words and
syntactic constructs specifying how to say it (lexicalisation and surface realisation)
in a single, joint model. Rather than breaking up the generation process into a sequence
of local decisions, we define a probabilistic context-free grammar that globally
describes the inherent structure of the input (a corpus of database records and
text describing some of them). This joint representation allows individual processes
(i.e., document planning, content selection, and surface realisation) to communicate
and influence each other naturally.
We recast generation as the task of finding the best derivation tree given a set of input
database records and our grammar, and describe several decoding algorithms for this
framework that allow the grammar to be intersected with additional information capturing
fluency and syntactic well-formedness constraints. We implement our generators using
the hypergraph framework. Unlike traditional systems, we learn all the necessary
document, structural and linguistic knowledge from unannotated data. Additionally,
we explore a discriminative reranking approach on the hypergraph representation of
our model that incorporates more refined content selection features. Central to our approach
is the idea of porting our models to various domains; we experiment on four
widely different domains, namely sportscasting, weather forecast generation, booking
flights, and troubleshooting guides. The performance of our systems is competitive with,
and often superior to, state-of-the-art systems that use domain-specific constraints,
explicit feature engineering, or labelled data. | en |
dc.contributor.sponsor | Engineering and Physical Sciences Research Council (EPSRC) | en |
dc.language.iso | en | |
dc.publisher | The University of Edinburgh | en |
dc.relation.hasversion | Konstas, I. and Lapata, M. (2012). Concept-to-text generation via discriminative reranking. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 369–378, Jeju Island, Korea. | en |
dc.relation.hasversion | Konstas, I. and Lapata, M. (2012). Unsupervised concept-to-text generation with hypergraphs. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 752–761, Montréal, Canada. | en |
dc.relation.hasversion | Konstas, I. and Lapata, M. (2013). A global model for concept-to-text generation. Journal of Artificial Intelligence Research, 48:305–346. | en |
dc.relation.hasversion | Konstas, I. and Lapata, M. (2013). Inducing document plans for concept-to-text generation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1503–1514, Seattle, Washington, USA. | en |
dc.subject | natural language generation | en |
dc.subject | natural language processing | en |
dc.title | Joint models for concept-to-text generation | en |
dc.type | Thesis or Dissertation | en |
dc.type.qualificationlevel | Doctoral | en |
dc.type.qualificationname | PhD Doctor of Philosophy | en |