Program analysis of probabilistic programs
Gorinova, Maria I.
Probabilistic programming is a growing area that strives to make statistical analysis more accessible, by separating probabilistic modelling from probabilistic inference. In practice this decoupling is difficult. The performance of inference methods is sensitive to both the underlying model and the observed data. Different inference techniques are applicable to different classes of models, have different advantages and shortcomings, and require different optimisation and diagnostics techniques to ensure robustness and reliability. No single inference algorithm can be used as a probabilistic programming back-end that is simultaneously reliable, efficient, black-box, and general. Probabilistic programming languages often choose a single algorithm to apply to a given problem, thus inheriting its limitations. While substantial work has been done both to formalise probabilistic programming and to improve efficiency of inference, there has been little work that makes use of the available program structure, by formally analysing it, to better utilise the underlying inference algorithm. My thesis is that it is possible to improve probabilistic programming using program analysis, and I present three novel techniques (both static and dynamic), which analyse a probabilistic program and adapt it to make inference more efficient, sometimes in a way that would have been tedious or impossible to do by hand. Part I of the thesis focuses on static analysis and gives the fi rst formal treatment of the popular probabilistic programming language Stan. While efficient, Stan constrains the space of programs expressible in the language. Programs must be written according to Stan's block syntax, which reduces compositionality. In addition, Stan does not support the explicit use of discrete parameters. Part I introduces the probabilistic programming language SlicStan: a compositional, self-optimising version of Stan, which supports both discrete and continuous parameters. SlicStan uses information ow analysis and type inference to capture conditional independence relationships in the program and transform it for inference in Stan. The result can be seen as a hybrid inference algorithm, where different parameters are inferred according to different inference algorithms for efficiency. Part II shows an example of dynamic analysis. The performance of inference algorithms can be dramatically affected by the parameterisation used to express a model. It is difficult to know in advance what parameterisation is suitable, as it depends on the properties of the observed data. This part demonstrates that reparameterisation can be automated by combining effect handlers in the probabilistic programming language Edward2, with variational inference preprocessing that searches over a space of possible parameterisations.