Syntax-mediated semantic parsing
Reddy Goli, Venkata Sivakumar
Querying a database to retrieve an answer, telling a robot to perform an action, or teaching a computer to play a game are tasks requiring communication with machines in a language interpretable by them. Semantic parsing is the task of converting human language to a machine interpretable language. While human languages are sequential in nature with latent structures, machine interpretable languages are formal with explicit structures. The computational linguistics community have created several treebanks to understand the formal syntactic structures of human languages. In this thesis, we use these to obtain formal meaning representations of languages, and learn computational models to convert these meaning representations to the target machine representation. Our goal is to evaluate if existing treebank syntactic representations are useful for semantic parsing. Existing semantic parsing methods mainly learn domain-specific grammars which can parse human languages to machine representation directly. We deviate from this trend and make use of general-purpose syntactic grammar to help in semantic parsing. We use two syntactic representations: Combinatory Categorial Grammar (CCG) and dependency syntax. CCG has a well established theory on deriving meaning representations from its syntactic derivations. But there are no CCG treebanks for many languages since these are difficult to annotate. In contrast, dependencies are easy to annotate and have many treebanks. However, dependencies do not have a well established theory for deriving meaning representations. In this thesis, we propose novel theories for deriving meaning representations from dependencies. Our evaluation task is question answering on a knowledge base. Given a question, our goal is to answer it on the knowledge base by converting the question to an executable query. We use Freebase, the knowledge source behind Google’s search engine, as our knowledge base. Freebase contains millions of real world facts represented in a graphical format. Inspired from the Freebase structure, we formulate semantic parsing as a graph matching problem, i.e., given a natural language sentence, we convert it into a graph structure from the meaning representation obtained from syntax, and find the subgraph of Freebase that best matches the natural language graph. Our experiments on Free917, WebQuestions and GraphQuestions semantic parsing datasets conclude that general-purpose syntax is more useful for semantic parsing than induced task-specific syntax and syntax-agnostic representations.