Natural Language Generation for the Semantic Web: Unsupervised template extraction
Item status: Restricted Access
I propose an architecture for a Natural Language Generation system that automatically learns sentence templates, together with statistical document planning, from parallel RDF data and text. To this end, I design, build and test a proof-of-concept system (“LOD-DEF”) trained on un-annotated text from the Simple English Wikipedia and RDF triples from DBpedia, with the communicative goal of generating short descriptions of entities in an RDF ontology. Following previous work, I also implement a baseline triple-to-text generation system, and I conduct a human evaluation of the LOD-DEF system against both the baseline and human-generated output. LOD-DEF significantly outperforms the baseline on two of the three measures: non-redundancy, and structure and coherence.
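The general idea of learning a sentence template from parallel data and text can be illustrated with a minimal sketch. This is not the LOD-DEF implementation: it assumes that property values appear verbatim in the sentence, it uses simple string matching in place of any learned alignment, and it omits document planning entirely. The function names, property names and example entities are hypothetical and chosen only for illustration.

```python
import re

def extract_template(sentence, properties):
    """Turn a sentence into a template by replacing every property value
    that occurs verbatim in it with a named slot such as {birthPlace}."""
    template = sentence
    # Replace longer values first so a short value cannot break up a longer one.
    for predicate, value in sorted(properties.items(), key=lambda kv: -len(kv[1])):
        template = template.replace(value, "{" + predicate + "}")
    return template

def fill_template(template, properties):
    """Fill a template for a new entity; return None if a slot has no value."""
    needed = re.findall(r"\{(\w+)\}", template)
    if any(slot not in properties for slot in needed):
        return None
    return template.format(**properties)

# Hypothetical training pair: a flattened set of triples for one entity
# (predicate -> literal value) and a sentence describing that entity.
ada = {
    "name": "Ada Lovelace",
    "occupation": "mathematician",
    "birthPlace": "London",
}
sentence = "Ada Lovelace was a mathematician born in London."

template = extract_template(sentence, ada)

# Reuse the learned template for a different entity with the same properties.
hopper = {
    "name": "Grace Hopper",
    "occupation": "computer scientist",
    "birthPlace": "New York City",
}
description = fill_template(template, hopper)
print(template)
print(description)
```

Returning None when a slot cannot be filled lets a caller skip templates that do not fit the available triples for an entity, rather than emitting a sentence with a gap.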