Stochastic dynamics and partitioned algorithms for model parameterization in deep learning
dc.contributor.advisor
Leimkuhler, Benedict
dc.contributor.advisor
Malham, Simon
dc.contributor.advisor
Paulin, Daniel
dc.contributor.author
Vlaar, Tiffany Joyce
dc.date.accessioned
2022-06-16T14:09:22Z
dc.date.available
2022-06-16T14:09:22Z
dc.date.issued
2022-06-16
dc.description.abstract
In this thesis, we study model parameterization for deep learning applications. Part of the mathematical foundation for this work lies in stochastic differential equations and their constrained counterparts. We study their role in deep learning, their properties, and their discretization. On the deep learning theory side, we discuss questions around generalization error, optimization, the structure of neural network loss landscapes, and existing metrics of neural network training. Rather than aiming to exceed state-of-the-art results on benchmark datasets, our work aims to tease out underlying properties of neural network optimization and to use those findings to obtain enhanced generalization performance. Our optimization schemes often draw inspiration from molecular dynamics and statistical physics and pave the way towards training robust and generalizable neural networks on datasets that arise in the physical sciences.
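The Langevin-dynamics viewpoint mentioned above can be illustrated with a minimal sketch. This is not the thesis's algorithm: the BAO-type splitting, the step size h, friction gamma, and temperature tau below are illustrative assumptions, and the quadratic "loss" stands in for a real network objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def langevin_step(theta, p, grad, h=0.01, gamma=1.0, tau=1e-4):
    # "B" step: kick the momentum with the (stochastic) loss gradient.
    p = p - h * grad(theta)
    # "A" step: drift the parameters along the momentum.
    theta = theta + h * p
    # "O" step: Ornstein-Uhlenbeck friction plus temperature-scaled noise.
    c = np.exp(-gamma * h)
    p = c * p + np.sqrt((1.0 - c**2) * tau) * rng.standard_normal(p.shape)
    return theta, p

# Toy quadratic loss L(theta) = 0.5 * ||theta||^2, so grad(theta) = theta.
theta, p = np.ones(3), np.zeros(3)
for _ in range(2000):
    theta, p = langevin_step(theta, p, lambda t: t)
```

At low temperature the noise term is small, so the iterates settle near the minimizer while still fluctuating, which is the exploration/robustness trade-off the abstract refers to.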
The contributions of this thesis are as follows: (1) We illustrate that embedding the loss gradient in a second-order Langevin dynamics framework and using low temperatures leads to more exploration and increased robustness and, in combination with partitioned integrators, can enhance the generalization performance of neural networks on certain classification tasks. (2) We provide a general framework for training deep neural networks with constrained stochastic differential equations. Constraints provide direct control of the parameter space, which allows us to study their effect on generalization directly. We provide a statistical guarantee on the convergence of the training, along with detailed implementation schemes for specific constraints (magnitude-based constraints and orthogonality of the weight matrix) and extensive testing. (3) We illustrate the presence of latent multiple time scales in deep learning applications and introduce multirate techniques for neural network training. We analyze the convergence properties of our multirate scheme and compare it with vanilla stochastic gradient descent. As the main application, we show that a multirate approach can train deep neural networks for transfer learning in half the time, without losing generalization performance. (4) We re-evaluate existing deep learning metrics. In particular, we study the loss along the linear path between the initial and final parameters of a network as a measure of the loss landscape. We show that caution is needed when using linear interpolation to make broader claims about the shape of the landscape and the success of optimization. We also find that certain neural network layers are more sensitive to the choice of initialization and optimizer hyperparameters, and we use these observations to design custom optimization schemes.
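The linear-interpolation probe discussed in contribution (4) can be sketched as follows. The function name and the quadratic stand-in loss are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def interpolation_losses(loss, theta_init, theta_final, n_points=11):
    # Evaluate the loss at evenly spaced points along the straight line
    # theta(alpha) = (1 - alpha) * theta_init + alpha * theta_final.
    alphas = np.linspace(0.0, 1.0, n_points)
    values = np.array([loss((1.0 - a) * theta_init + a * theta_final)
                       for a in alphas])
    return alphas, values

# Illustrative quadratic stand-in for a network loss: L(t) = 0.5 * ||t||^2.
quadratic = lambda t: 0.5 * float(t @ t)
alphas, losses = interpolation_losses(quadratic,
                                      np.array([2.0, 0.0]), np.zeros(2))
```

For this toy loss the profile decreases monotonically along the path; as the abstract cautions, such a one-dimensional slice alone does not justify broader claims about the shape of the full landscape.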
en
dc.identifier.uri
https://hdl.handle.net/1842/39125
dc.identifier.uri
http://dx.doi.org/10.7488/era/2376
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.title
Stochastic dynamics and partitioned algorithms for model parameterization in deep learning
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: VlaarTJ_2022.pdf
- Size: 15.67 MB
- Format: Adobe Portable Document Format