Advances in scalable learning and sampling of unnormalised models
We study probabilistic models that are known incompletely, up to an intractable normalising constant. To reap the full benefit of such models, two tasks must be solved: learning and sampling. These two tasks have been subject to decades of research, and yet significant challenges still persist. Traditional approaches often suffer from poor scalability with respect to dimensionality and model-complexity, generally rendering them inapplicable to models parameterised by deep neural networks. In this thesis, we contribute a new set of methods for addressing this scalability problem. We first explore the problem of learning unnormalised models. Our investigation begins with a well-known learning principle, Noise-contrastive Estimation, whose underlying mechanism is that of density-ratio estimation. By examining why existing density-ratio estimators scale poorly, we identify a new framework, telescoping density-ratio estimation (TRE), that can learn ratios between highly dissimilar densities in high-dimensional spaces. Our experiments demonstrate that TRE not only yields substantial improvements for the learning of deep unnormalised models, but can do the same for a broader set of tasks including mutual information estimation and representation learning. Subsequently, we explore the problem of sampling unnormalised models. A large literature on Markov chain Monte Carlo (MCMC) can be leveraged here, and in continuous domains, gradient-based samplers such as Metropolis-adjusted Langevin algorithm (MALA) and Hamiltonian Monte Carlo are excellent options. However, there has been substantially less progress in MCMC for discrete domains. To advance this subfield, we introduce several discrete Metropolis-Hastings samplers that are conceptually inspired by MALA, and demonstrate their strong empirical performance across a range of challenging sampling tasks.