Advances in scalable learning and sampling of unnormalised models
Date: 10/07/2023
Author: Rhodes, Benjamin
Abstract
We study probabilistic models that are known incompletely, up to an intractable normalising constant. To reap the full benefit of such models, two
tasks must be solved: learning and sampling. These two tasks have been
subject to decades of research, and yet significant challenges still persist.
Traditional approaches often suffer from poor scalability with respect to
dimensionality and model complexity, generally rendering them inapplicable to models parameterised by deep neural networks. In this thesis, we
contribute a new set of methods for addressing this scalability problem.
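Concretely, such a model can be written in the standard form (generic notation, not specific to this thesis)

\[
p_\theta(x) = \frac{\tilde{p}_\theta(x)}{Z(\theta)}, \qquad Z(\theta) = \int \tilde{p}_\theta(x)\,\mathrm{d}x,
\]

where the unnormalised density \(\tilde{p}_\theta\) can be evaluated pointwise but the normalising constant \(Z(\theta)\) is intractable, which is what obstructs both maximum-likelihood learning and direct sampling.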
We first explore the problem of learning unnormalised models. Our investigation begins with a well-known learning principle, Noise-contrastive
Estimation, whose underlying mechanism is that of density-ratio estimation.
By examining why existing density-ratio estimators scale poorly, we identify a new framework, telescoping density-ratio estimation (TRE), which can
learn ratios between highly dissimilar densities in high-dimensional spaces.
Our experiments demonstrate that TRE yields substantial improvements not only for the learning of deep unnormalised models, but also for a broader set of tasks, including mutual information estimation and representation learning.
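To sketch the telescoping idea in illustrative notation (the thesis's own formulation may differ in detail): rather than estimating a single ratio between two highly dissimilar densities p and q directly, one introduces a chain of intermediate densities p = p_0, p_1, ..., p_m = q and exploits the telescoping product

\[
\frac{p(x)}{q(x)} = \prod_{k=1}^{m} \frac{p_{k-1}(x)}{p_k(x)},
\]

so that each factor is a ratio between two nearby densities, which is far easier to estimate by classification than the original ratio.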
Subsequently, we explore the problem of sampling unnormalised models.
A large literature on Markov chain Monte Carlo (MCMC) can be leveraged here, and in continuous domains, gradient-based samplers such as the Metropolis-adjusted Langevin algorithm (MALA) and Hamiltonian Monte
Carlo are excellent options. However, there has been substantially less
progress in MCMC for discrete domains. To advance this subfield, we introduce several discrete Metropolis-Hastings samplers that are conceptually
inspired by MALA, and demonstrate their strong empirical performance
across a range of challenging sampling tasks.
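For context, the continuous-space MALA proposal that motivates these samplers takes the standard form

\[
x' = x + \frac{\epsilon^2}{2}\,\nabla_x \log p(x) + \epsilon\,\xi, \qquad \xi \sim \mathcal{N}(0, I),
\]

with step size \(\epsilon\) and an accompanying Metropolis-Hastings accept/reject step; the discrete samplers introduced in the thesis take their conceptual inspiration from this gradient-informed proposal.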