Foundation for synthesising programming language semantics
Item Status
Embargo End Date
Date
Authors
Bartha, Sándor
Abstract
Programming or scripting languages used in real-world systems are seldom designed
with a formal semantics in mind from the outset. Therefore, the first step for developing well-founded analysis tools for these systems is to reverse-engineer a formal
semantics. This can take months or years of effort.
Could we automate this process, at least partially? Though desirable, automatically reverse-engineering semantics rules from an implementation is very challenging,
as found by Krishnamurthi, Lerner and Elberty. They propose automatically learning
desugaring translation rules, mapping the language whose semantics we seek to a simplified, core version, whose semantics are much easier to write. The present thesis
contains an analysis of their challenge, as well as the first steps towards a solution.
Scaling methods with the size of the language is very difficult due to state space
explosion, so this thesis proposes an incremental approach to learning the translation
rules. I present a formalisation that both clarifies the informal description of the challenge by Krishnamurthi et al, and re-formulates the problem, shifting the focus to the
conditions for incremental learning. The central definition of the new formalisation is
the desugaring extension problem, i.e. extending a set of established translation rules
by synthesising new ones.
In a synthesis algorithm, the choice of search space is important and non-trivial,
as it needs to strike a good balance between expressiveness and efficiency. The rest
of the thesis focuses on defining search spaces for translation rules via typing rules.
Two prerequisites are required for comparing search spaces. The first is a series of
benchmarks, a set of source and target languages equipped with intended translation
rules between them. The second is an enumerative synthesis algorithm for efficiently
enumerating typed programs. I show how algebraic enumeration techniques can be applied to enumerating well-typed translation rules, and discuss the properties expected
from a type system for ensuring that typed programs be efficiently enumerable.
The thesis presents and empirically evaluates two search spaces. A baseline search
space yields the first practical solution to the challenge. The second search space is
based on a natural heuristic for translation rules, limiting the usage of variables so that
they are used exactly once. I present a linear type system designed to efficiently enumerate translation rules, where this heuristic is enforced. Through informal analysis
and empirical comparison to the baseline, I then show that using linear types can speed
up the synthesis of translation rules by an order of magnitude.
This item appears in the following Collection(s)

