Data-efficient neural network training with dataset condensation
The state of the art in many data driven fields including computer vision and natural language processing typically relies on training larger models on bigger data. It is reported by OpenAI that the computational cost to achieve the state of the art doubles every 3.4 months in the deep learning era. In contrast, the GPU computation power doubles every 21.4 months, which is significantly slower. Thus, advancing deep learning performance by consuming more hardware resources is not sustainable. How to reduce the training cost while preserving the generalization performance is a long standing goal in machine learning. This thesis investigates a largely under-explored while promising solution - dataset condensation which aims to condense a large training set into a small set of informative synthetic samples for training deep models and achieve close performance to models trained on the original dataset. In this thesis, we investigate how to condense image datasets for classification tasks. We propose three methods for image dataset condensation. Our methods can be applied to condense other kinds of datasets for different learning tasks, such as text data, graph data and medical images, and we discuss it in Section 6.1. First, we propose a principled method that formulates the goal of learning a small synthetic set as a gradient matching problem with respect to the gradients of deep neural network weights that are trained on the original and synthetic data. A new gradient/weight matching loss is designed for robust matching of different neural architectures. We evaluate its performance in several image classification benchmarks and explore the usage of our method in continual learning and neural architecture search. In the second work, we propose to further improve the data-efficiency of training neural networks with synthetic data by enabling effective data augmentation. Specifically, we propose Differentiable Siamese Augmentation and learn better synthetic data that can be used more effectively with data augmentation and thus achieve better performance when training networks with data augmentation. Experiments verify that the proposed method obtains substantial gains over the state of the art. While training deep models on the small set of condensed images can be extremely fast, their synthesis remains computationally expensive due to the complex bi-level optimization. Finally, we propose a simple yet effective method that synthesizes condensed images by matching feature distributions of the synthetic and original training images when being embedded by randomly sampled deep networks. Thanks to its efficiency, we apply our method to more realistic and larger datasets with sophisticated neural architectures and obtain a significant performance boost. In summary, this manuscript presents several important contributions that improve data efficiency of training deep neural networks by condensing large datasets into significantly smaller synthetic ones. The innovations focus on principled methods based on gradient matching, higher data-efficiency with differentiable Siamese augmentation, and extremely simple and fast distribution matching without bilevel optimization. The proposed methods are evaluated on popular image classification datasets, namely MNIST, FashionMNIST, SVHN, CIFAR10/100 and TinyImageNet. The code is available at https://github.com/VICO-UoE/DatasetCondensation.