Edinburgh Research Archive

Explicit context representations for conversational language understanding

Authors

Jain, Parag

Abstract

Natural language interfaces enable users to interact with automated systems using human languages such as English. Users often pose complex or incomplete questions as they explore and refine their needs through conversational queries, for example: "Where did Christopher Bishop earn his PhD?" -- "Edinburgh" -- "Where is it located?". Effectively responding to such queries requires the system to understand conversational language: resolving references, handling topic extensions and topic shifts, and managing long-term dependencies, all situated within the specific context of the ongoing exchange. In this thesis, we are interested in building systems capable of conversational language understanding, which is fundamental to tasks such as question answering, task completion (e.g., hotel booking, calendar management), and semantic parsing. Such a system requires the integration of both discourse and external contexts. The discourse context includes earlier interactions, such as previous queries and responses. The external context is task-dependent but equally critical; for instance, a ticket-booking system needs access to the latest flight schedules, while a question-answering system might require knowledge from external sources such as Wikipedia or Wikidata. As the interaction grows longer, maintaining and exploiting relevant context information becomes crucial to effectively addressing user queries. This thesis argues that maintaining explicit representations of context, whether external or discourse, is helpful for tasks requiring long-range interactions. We show that dynamic representations are both computationally efficient in representing context and effective across various conversational tasks.

First, we look at the problem of modeling discourse context for long-range interactions and propose to represent discourse information using a bounded external memory.
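To make the bounded-memory idea concrete, here is a minimal sketch. The thesis learns its memory end-to-end; the class below (a hypothetical name, not the thesis implementation) captures only the boundedness: at most a fixed number of past turns are retained, so per-turn cost stays constant however long the conversation runs, and a scoring function stands in for learned attention over memory slots.

```python
from collections import deque

class BoundedDiscourseMemory:
    """Illustrative fixed-capacity memory over past discourse turns."""

    def __init__(self, capacity=5):
        # deque with maxlen evicts the oldest turn once capacity is reached
        self.slots = deque(maxlen=capacity)

    def write(self, turn_repr):
        """Store a representation of the latest user turn."""
        self.slots.append(turn_repr)

    def read(self, query, score):
        """Return stored turns ranked by relevance to the current query."""
        return sorted(self.slots, key=lambda s: score(query, s), reverse=True)

# Toy usage: word-overlap scoring as a stand-in for learned attention.
overlap = lambda q, s: len(set(q.split()) & set(s.split()))
mem = BoundedDiscourseMemory(capacity=2)
for turn in ["where did bishop earn his phd",
             "edinburgh",
             "where is it located"]:
    mem.write(turn)
print(len(mem.slots))  # 2 -- bounded regardless of conversation length
```

The point of the sketch is the interface, not the scoring: reads and writes touch a constant number of slots, which is what keeps long-range interactions tractable.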
Our memory is learned end-to-end and is interpretable. On the task of conversational semantic parsing over relational databases, we show that our method maintains performance over long-range interactions and handles several discourse-related phenomena, such as topic shifts and referring expressions.

Second, we look at the problem of modeling external context in the form of a knowledge graph. We create SPICE, a semantic parsing dataset for conversational question answering over the Wikidata knowledge graph (KG). Wikidata contains millions of entities and thousands of relations, making it infeasible to encode the KG in full. To tackle this, we propose dynamic context graphs, which represent information about the user utterance and its context through a dynamically generated subgraph whose number of nodes varies for each utterance. We further show that the graph structure itself is crucial: merely linearizing the subgraph as a textual string is neither efficient nor performant. These findings establish that structured representations of context are beneficial.

Finally, we validate our hypothesis on tasks other than semantic parsing with large pretrained language models, focusing on conversational question answering over heterogeneous sources. We employ a graph-structured representation to unify information from different sources, such as text, knowledge graphs, tables, and infoboxes. Our approach maintains a conversational memory to track and update past evidence, influencing the graph's structure and representation as the conversation evolves. We show that the graph enhances the language model's ability to reason over multiple modalities, while the memory module provides robustness against noise and retrieval errors.
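The dynamic-subgraph idea can be sketched as follows. This is not the SPICE pipeline itself; the function name and the toy KG are illustrative assumptions. Rather than encoding all of Wikidata, we expand a small subgraph around the entities mentioned in the current utterance (including entities resolved from context), so the number of nodes naturally varies per utterance.

```python
def expand_subgraph(kg, seed_entities, hops=1):
    """Collect all triples within `hops` of the seed entities.

    kg: list of (head, relation, tail) triples.
    Returns a set of triples; its size depends on which entities the
    utterance mentions and how connected they are in the KG.
    """
    nodes, triples = set(seed_entities), set()
    for _ in range(hops):
        frontier = set()
        for h, r, t in kg:
            if h in nodes or t in nodes:
                triples.add((h, r, t))
                frontier.update((h, t))
        nodes |= frontier
    return triples

# Hypothetical miniature KG.
kg = [
    ("C. Bishop", "educated_at", "Univ. of Edinburgh"),
    ("Univ. of Edinburgh", "located_in", "Edinburgh"),
    ("Edinburgh", "country", "Scotland"),
    ("G. Hinton", "educated_at", "Univ. of Edinburgh"),
]
# Turn 1: "Where did Christopher Bishop earn his PhD?"
sub1 = expand_subgraph(kg, {"C. Bishop"}, hops=1)
# Turn 3: "Where is it located?" -- context resolves "it" to the university
sub2 = expand_subgraph(kg, {"Univ. of Edinburgh"}, hops=1)
```

Note how the two turns yield subgraphs of different sizes: the coreference-resolved second question pulls in a denser neighborhood than the first. Keeping the result as triples, rather than flattening it to a string, preserves the structure that the parser exploits.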
