Program analysis of probabilistic programs
View/ Open
Date
25/05/2022Author
Gorinova, Maria I.
Metadata
Abstract
Probabilistic programming is a growing area that strives to make statistical analysis more accessible, by separating probabilistic modelling from probabilistic inference. In practice this decoupling is difficult. The performance of inference methods is sensitive to both the underlying model and the observed data. Different inference techniques are applicable to different classes of models, have different advantages and shortcomings, and require different optimisation and diagnostics techniques to ensure robustness and reliability.
No single inference algorithm can be used as a probabilistic programming back-end that
is simultaneously reliable, efficient, black-box, and general. Probabilistic programming
languages often choose a single algorithm to apply to a given problem, thus inheriting
its limitations. While substantial work has been done both to formalise probabilistic
programming and to improve efficiency of inference, there has been little work that makes use of the available program structure, by formally analysing it, to better utilise the underlying inference algorithm. My thesis is that it is possible to improve probabilistic
programming using program analysis, and I present three novel techniques (both static
and dynamic), which analyse a probabilistic program and adapt it to make inference more efficient, sometimes in a way that would have been tedious or impossible to do by hand.
Part I of the thesis focuses on static analysis and gives the fi rst formal treatment of the
popular probabilistic programming language Stan. While efficient, Stan constrains the
space of programs expressible in the language. Programs must be written according to
Stan's block syntax, which reduces compositionality. In addition, Stan does not support
the explicit use of discrete parameters. Part I introduces the probabilistic programming
language SlicStan: a compositional, self-optimising version of Stan, which supports both
discrete and continuous parameters. SlicStan uses information
ow analysis and type inference to capture conditional independence relationships in the program and transform it for inference in Stan. The result can be seen as a hybrid inference algorithm, where different parameters are inferred according to different inference algorithms for efficiency.
Part II shows an example of dynamic analysis. The performance of inference algorithms
can be dramatically affected by the parameterisation used to express a model. It is difficult to know in advance what parameterisation is suitable, as it depends on the properties of the observed data. This part demonstrates that reparameterisation can be automated by combining effect handlers in the probabilistic programming language Edward2, with
variational inference preprocessing that searches over a space of possible parameterisations.