Accurate and reliable probabilistic modeling with high-dimensional data
Machine learning studies algorithms for learning from data. Probabilistic modeling and reasoning define a principled framework for machine learning, where probability theory is used to represent and manipulate knowledge. In this thesis we focus on two fundamental tasks in probabilistic machine learning: probabilistic prediction and density estimation. We study reliability of probabilistic predictive models, propose flexible models for density estimation, and propose a novel training regime for densities with low-dimensional structure. Neural networks demonstrate state-of-the-art performance in many different prediction tasks. At the same time, modern neural networks trained by maximum likelihood have poorly calibrated predictive uncertainties and suffer from adversarial examples. We hypothesize that careful probabilistic treatment of neural networks would make them better calibrated and more robust. However, Bayesian neural networks have to rely on uninformative priors and crude approximations, which makes it difficult to test this hypothesis. In this thesis we take a step back and study adversarial robustness of a simple, linear model, demonstrating that it no longer suffers from calibration errors on adversarial points when the approximate inference method is accurate and the prior is chosen carefully. Classic density estimation methods do not scale to complex, high-dimensional data like natural images. Normalizing flows model the target density as an invertible transformation of a simple base density, and demonstrate good results in high-dimensional density estimation tasks. State-of-the-art normalizing flow architectures rely on parametrizations of univariate invertible functions. Simple additive/affine parametrizations are often used, stacking many layers to express complex transformations. In this thesis we propose novel parametrizations based on cubic and rational-quadratic splines. The proposed flows demonstrate improved parameter-efficiency and advance state-of-the-art on several density estimation benchmarks. The manifold hypothesis says that the data are likely to lie on a lower-dimensional manifold. This assumption is built into many machine learning models, but using it with density models like normalizing flows is difficult: the standard likelihood-based training objective becomes ill-defined. Injective normalizing flows can be implemented, but their training objective is no longer tractable, requiring approximations or heuristic alternatives. In this thesis we propose a novel training objective that uses nested dropout to align the latent space of a normalizing flow, allowing us to extract a sequence of manifold densities from the trained model. Our experiments demonstrate that the manifolds fit by the method match the data well.