Robust and efficient inference and learning algorithms for generative models
Generative modelling is a popular paradigm in machine learning, owing both to its natural ability to describe uncertainty in data and models and to its many applications, including data compression (Ho et al., 2020), missing data imputation (Valera et al., 2018), synthetic data generation (Lin et al., 2020), representation learning (Kingma and Welling, 2014), robust classification (Li et al., 2019b), and more. For generative models, the task of finding the distribution of unobserved variables conditioned on observed ones is referred to as inference, and the task of finding the optimal model, whose distribution is closest to the data distribution under some discrepancy measure, is called learning. In practice, existing learning and inference methods can fall short in robustness and efficiency. A method that is more robust to its hyper-parameters or to different types of data can be more easily adapted to various real-world applications, and how well a method scales with the size and dimensionality of the data determines the scale at which it can be applied. This thesis presents four pieces of my original work that improve these properties in generative models.

First, I introduce two novel Bayesian inference algorithms. One is coupled multinomial Hamiltonian Monte Carlo (HMC; Xu et al., 2021a). It builds on Heng and Jacob (2019), a recent work in unbiased Markov chain Monte Carlo (MCMC) (Jacob et al., 2019b) that has been found to be sensitive to hyper-parameters and less efficient than standard, biased MCMC. These issues are resolved by constructing couplings for the widely used multinomial HMC, which leads to a statistically more efficient and more robust method. The other is roulette-based variational expectation (RAVE; Xu et al., 2019), which applies amortised inference to Bayesian non-parametric models, a model family in which the number of parameters is allowed to grow without bound as the data become more complex.
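Assuming that "roulette" here refers to Russian roulette sampling, a standard way to obtain unbiased estimates of infinite sums, the minimal Python sketch below shows the generic estimator behind the idea of avoiding a fixed truncation. It evaluates a random, finite number of terms and reweights each one by its survival probability. The function name `russian_roulette_sum` and the geometric stopping rule with continuation probability `q` are illustrative assumptions, not the RAVE implementation.

```python
import random


def russian_roulette_sum(term, q=0.8, rng=random):
    """Unbiased single-sample estimate of the infinite sum sum_{k>=1} term(k).

    Hypothetical helper for illustration. After evaluating term k we continue
    to term k+1 with probability q, so the number of evaluated terms K satisfies
    P(K >= k) = q**(k - 1). Dividing each evaluated term by that survival
    probability makes the estimate unbiased, while only a finite, random
    number of terms is ever computed (no fixed truncation level).
    """
    total = 0.0
    k = 1
    survival = 1.0  # P(K >= k) for the current k
    while True:
        total += term(k) / survival
        if rng.random() >= q:  # stop with probability 1 - q
            return total
        k += 1
        survival *= q


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy series: sum_{k>=1} 0.5**k = 1.
    n = 100_000
    est = sum(russian_roulette_sum(lambda k: 0.5 ** k, rng=rng) for _ in range(n)) / n
    # Truncating the series after 3 terms would instead give the biased value 0.875.
    print(f"roulette estimate: {est:.3f}")
```

The estimate has finite variance only if the terms decay fast enough relative to `q` (here `0.5**k` against `q = 0.8`). In a non-parametric model the series would instead run over the model's unboundedly many components; the toy series only stands in for that.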
Unlike previous sampling-based methods, which are slow, and variational inference methods, which rely on truncation, RAVE combines the advantages of both to achieve flexible inference that is also computationally efficient.

Second, I introduce two novel learning methods. One is generative ratio matching (Srivastava et al., 2019), a learning algorithm that makes kernel-based deep generative models applicable to high-dimensional data. Its key innovation is to learn a projection of the data into a lower-dimensional space that preserves the density ratio between the data and model distributions, so that learning can be carried out in that lower-dimensional space, where kernel methods are effective. The other is Bayesian symbolic physics, which combines Bayesian inference and symbolic regression in the context of naïve physics, the study of how humans understand and learn physics. Unlike classic generative models, whose generative process has a predefined structure, or deep generative models, whose generative process is represented by data-hungry neural networks, Bayesian-symbolic generative processes are defined by functions over a hypothesis space specified by a context-free grammar. This formulation allows domain knowledge to be incorporated into learning, which greatly improves sample efficiency. For all four pieces of work, I provide theoretical analyses and/or empirical results showing that the algorithmic advances improve the robustness and efficiency of generative models.

Lastly, I summarise my contributions to free and open-source software for generative modelling. These include a set of Julia packages to which I contributed and which are currently used by the Turing probabilistic programming language (Ge et al., 2018). These packages, which are highly reusable components for building probabilistic programming languages, together form a probabilistic programming ecosystem in Julia.
One important package, developed primarily by me, is AdvancedHMC.jl (Xu et al., 2020), which provides robust and efficient implementations of HMC methods and has been adopted as the HMC backend of Turing. Importantly, its design provides an intuitive abstraction for constructing HMC samplers in the same way as they are defined mathematically. The aim of these open-source packages is to make generative modelling techniques more accessible to domain experts from various backgrounds and to make related research more reproducible, thereby helping to advance the field.
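To illustrate what constructing an HMC sampler "in the same way as it is defined mathematically" can look like, here is a minimal pure-Python sketch under our own assumptions. It assembles a sampler from a metric (kinetic energy), a Hamiltonian, a leapfrog integrator, and a fixed-length trajectory that picks the next state by multinomial sampling over all trajectory points with probability proportional to exp(-H). All class and function names are hypothetical and are not AdvancedHMC.jl's Julia API. The static trajectory is a simplification; adaptive (NUTS-style) trajectories and step-size adaptation are omitted.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vec = List[float]


@dataclass
class UnitEuclideanMetric:
    """Identity mass matrix: momentum r ~ N(0, I), kinetic energy r.r / 2."""
    dim: int

    def sample_momentum(self, rng: random.Random) -> Vec:
        return [rng.gauss(0.0, 1.0) for _ in range(self.dim)]

    def kinetic(self, r: Vec) -> float:
        return 0.5 * sum(x * x for x in r)


@dataclass
class Hamiltonian:
    """H(theta, r) = -log pi(theta) + K(r), built from a metric and a target."""
    metric: UnitEuclideanMetric
    logdensity: Callable[[Vec], float]
    grad_logdensity: Callable[[Vec], Vec]

    def energy(self, theta: Vec, r: Vec) -> float:
        return -self.logdensity(theta) + self.metric.kinetic(r)


@dataclass
class Leapfrog:
    """Leapfrog integrator; with a unit metric, d(theta)/dt = r."""
    step_size: float

    def step(self, h: Hamiltonian, theta: Vec, r: Vec) -> Tuple[Vec, Vec]:
        eps = self.step_size
        r = [x + 0.5 * eps * g for x, g in zip(r, h.grad_logdensity(theta))]
        theta = [t + eps * x for t, x in zip(theta, r)]
        r = [x + 0.5 * eps * g for x, g in zip(r, h.grad_logdensity(theta))]
        return theta, r


@dataclass
class MultinomialStaticTrajectory:
    """Fixed-length trajectory; the next state is drawn from all trajectory
    points with probability proportional to exp(-H) (multinomial sampling)."""
    integrator: Leapfrog
    n_steps: int

    def transition(self, h: Hamiltonian, theta: Vec, rng: random.Random) -> Vec:
        r = h.metric.sample_momentum(rng)
        # Place the initial point uniformly at random within the trajectory,
        # which keeps the trajectory-selection step symmetric (reversible).
        n_fwd = rng.randint(0, self.n_steps)
        points = [(theta, r)]
        t, p = theta, r
        for _ in range(n_fwd):
            t, p = self.integrator.step(h, t, p)
            points.append((t, p))
        # Integrate backwards in time by flipping the momentum, then flip it back.
        t, p = theta, [-x for x in r]
        for _ in range(self.n_steps - n_fwd):
            t, p = self.integrator.step(h, t, p)
            points.append((t, [-x for x in p]))
        energies = [h.energy(t, p) for t, p in points]
        e_min = min(energies)  # subtract the minimum for numerical stability
        weights = [math.exp(e_min - e) for e in energies]
        return rng.choices(points, weights=weights, k=1)[0][0]


def sample(h: Hamiltonian, kernel: MultinomialStaticTrajectory,
           theta0: Vec, n_samples: int, seed: int = 0) -> List[Vec]:
    """Run the Markov chain and return the positions it visits."""
    rng = random.Random(seed)
    theta, draws = theta0, []
    for _ in range(n_samples):
        theta = kernel.transition(h, theta, rng)
        draws.append(theta)
    return draws


if __name__ == "__main__":
    # Usage: a 2-D standard normal target, assembled from the pieces above.
    h = Hamiltonian(UnitEuclideanMetric(2),
                    lambda th: -0.5 * sum(x * x for x in th),
                    lambda th: [-x for x in th])
    kernel = MultinomialStaticTrajectory(Leapfrog(step_size=0.2), n_steps=10)
    draws = sample(h, kernel, [3.0, -3.0], n_samples=2000)[500:]
    print(sum(d[0] for d in draws) / len(draws))
```

Keeping each mathematical ingredient in its own component means, for example, that swapping the metric or the integrator does not require touching the trajectory or the sampling loop.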