Computation and data efficient techniques for training computer vision methods
Item status: Restricted Access
Embargo end date: 11/07/2024
Kocyigit, Mustafa Taha
The increased use of data and computation has been the main driver of progress in deep learning: improving performance, solving previously unsolvable problems and unlocking new capabilities. At the same time, the costs of training and of obtaining annotations have been increasing exponentially for large-scale models, to the point where, in certain circumstances, it is not even feasible for many research organizations to reproduce groundbreaking results in the field, let alone improve on them. This thesis focuses on problems that arise from scaling the computational and data requirements for training computer vision algorithms, and proposes efficient training techniques in three scenarios: first, where data is available but labels are very limited; second, where data is abundant but no labels are available; and finally, where both labels and data are abundant but computation is limited. We provide model-agnostic and efficient solutions for these urgent bottlenecks. Obtaining labeled data is one of the most expensive parts of training deep learning methods, which are notoriously data hungry. We identify that batch normalization statistics calculated from limited labeled data do not reflect the true data distribution and prevent deep learning models from being trained effectively. We propose a semi-supervised learning method in which batch normalization calculations are augmented with unlabeled examples to alleviate this problem and train deep learning models more accurately. Recently, self-supervised learning has been used to learn augmentation-invariant representations from unlabeled data. These representations have been shown to transfer effectively to tasks where only limited labeled data is available. This eliminates the data bottleneck while exacerbating the computational bottleneck, since these methods are much more computationally hungry than their supervised alternatives.
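As a rough illustration of the batch normalization idea mentioned above, the following minimal sketch computes normalization statistics over labeled and unlabeled examples together and then normalizes the labeled batch with them. The abstract does not specify how the thesis combines the two sets, so the pooling rule and the function names `pooled_batch_stats` and `normalize` are assumptions made purely for illustration, not the author's method.

```python
from statistics import fmean


def pooled_batch_stats(labeled, unlabeled):
    """Per-feature mean and biased variance over labeled + unlabeled rows.

    Assumption: statistics are simply pooled over both sets; the thesis
    may combine them differently.
    """
    rows = labeled + unlabeled
    n_features = len(rows[0])
    means, variances = [], []
    for j in range(n_features):
        column = [row[j] for row in rows]
        mu = fmean(column)
        means.append(mu)
        variances.append(fmean([(x - mu) ** 2 for x in column]))
    return means, variances


def normalize(batch, means, variances, eps=1e-5):
    """Standardize each feature of `batch` with the supplied statistics."""
    return [
        [(x - m) / (v + eps) ** 0.5 for x, m, v in zip(row, means, variances)]
        for row in batch
    ]


# Two labeled and two unlabeled one-dimensional examples (toy data).
labeled = [[0.0], [2.0]]
unlabeled = [[4.0], [6.0]]
means, variances = pooled_batch_stats(labeled, unlabeled)
normalized_labeled = normalize(labeled, means, variances)
```

In this toy case the labeled examples alone have a mean of 1.0, while the pooled mean is 3.0, which shows how a small labeled batch can give statistics that differ from those of the wider data.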
To accelerate training for these methods, we introduce three strategies: a progressive augmentation and resolution schedule, a fast hard-augmentation mining scheme and a matching accelerated learning rate schedule. We show that our training strategies reduce the cost of training for many self-supervised methods while maintaining the same accuracy as standard training. In problems where annotation cost is low and large-scale labeled data is available, more sophisticated model architectures such as Vision Transformers can be used, which require longer and more computationally demanding training. We exploit the intuition that not all samples carry a similar amount of information, and introduce a fast online importance sampling technique that allows faster training. We show that we can accelerate supervised training on many datasets and backbone architectures while reaching the same level of accuracy as standard training.
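The importance sampling idea described above can be sketched in general terms as follows. This is a generic loss-based scheme, not the thesis's technique: the abstract does not say which importance criterion is used, so the use of a smoothed per-sample loss as the score, the class name `LossBasedSampler` and all of its parameters are hypothetical.

```python
import random


class LossBasedSampler:
    """Draws training indices with probability proportional to a running
    per-sample loss score (an assumed importance criterion)."""

    def __init__(self, num_samples, smoothing=0.9, floor=1e-3, seed=0):
        self.scores = [1.0] * num_samples  # start from uniform importance
        self.smoothing = smoothing
        self.floor = floor  # keeps every sample reachable
        self.rng = random.Random(seed)

    def probabilities(self):
        total = sum(self.scores)
        return [s / total for s in self.scores]

    def sample(self, batch_size):
        """Return sampled indices and weights 1 / (N * p_i).

        Multiplying each sampled loss by its weight keeps the average
        loss an unbiased estimate of the uniform average.
        """
        probs = self.probabilities()
        n = len(probs)
        indices = self.rng.choices(range(n), weights=probs, k=batch_size)
        weights = [1.0 / (n * probs[i]) for i in indices]
        return indices, weights

    def update(self, indices, losses):
        """Blend freshly observed losses into the running scores."""
        for i, loss in zip(indices, losses):
            blended = self.smoothing * self.scores[i] + (1 - self.smoothing) * loss
            self.scores[i] = max(blended, self.floor)


sampler = LossBasedSampler(num_samples=4)
sampler.update([0], [10.0])  # sample 0 was observed with a high loss
indices, weights = sampler.sample(batch_size=8)
```

The scores are updated only from losses that the training loop has already computed, which is what makes a scheme of this kind "online": no extra forward passes are needed to estimate importance.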
Showing items related by title, author, creator and subject.
Blake, Andrew (The University of Edinburgh, 1984) This thesis is concerned with problems of using computers to interpret scenes from television camera pictures. In particular, it tackles the problem of interpreting the picture in terms of lines and curves, rather like ...
Hinton, Geoffrey E. (The University of Edinburgh, 1977) It is argued that a visual system, especially one which handles imperfect data, needs a way of selecting the best consistent combination from among the many interrelated, locally plausible hypotheses about how parts or ...
Romaszko, Lukasz (The University of Edinburgh, 2020-06-25) The goal of scene understanding is to interpret images, so as to infer the objects present in a scene, their poses and fine-grained details. This thesis focuses on methods that can provide a much more detailed ...