Edinburgh Research Archive

Geometry for deep representation learning

Authors

Khan, Mohammad Asif

Abstract

In recent years, deep representation learning has achieved remarkable success in discovering meaningful low-dimensional features from high-dimensional data. In datasets containing face images, these features can capture underlying factors of variation, such as age, eye colour, and hairstyle. We can employ learned representations for solving tasks such as face detection. By capturing these factors of variation, the representations aim to build a model of the real world, reflecting its inherent regularities. However, current approaches still face challenges when it comes to discovering complex regularities of the world in a data-efficient way, resulting in a lack of interpretability and robustness, and in limited generalisation. It is crucial to recognise that real-world data spaces often exhibit regularities characterised by various symmetries, which need appropriate modelling assumptions. Consider an image of an apple; we know that its transformation under a translation operator will not change its identity as an apple. Such properties that do not change under a broad family of transformations are known as invariants. As Felix Klein put it, “Geometry is a study of invariants” (Klein, 1872). In this thesis, we utilise geometry as a fundamental principle to account for relevant properties when learning a representation space. Specifically, we propose novel methodologies to address three main challenges in deep representation learning: learning disentangled latent factors for image sequences, investigating the robustness of deep latent factor models to adversarial perturbations, and learning representations that account for hierarchical dependencies in heterophilic graphs.
The first project focuses on learning to disentangle content and motion information into separate latent components for image sequences. Here, content refers to information shared across all frames, for example, the identity of an object undergoing the dynamics, and motion refers to information expressed in a given sequence frame. The temporal structure in image sequences traces a path in a higher-dimensional data space that takes the form of a 1-dimensional manifold. A key challenge in learning representations from this data is designing a latent dynamical model that accounts for the temporal structure of image sequences. In this work, we utilise symplectic geometry in latent space for modelling the dynamics of various motions; this structure in latent space associates a motion with a constant energy term that captures the manifold of the dynamics of sequences. For a set of dynamical actions, we associate each action with a unique subspace that reflects the energy preservation of the respective dynamical action. Our results demonstrate that we can disentangle factors of variation, facilitating tasks such as controlled generation and motion transfer.
The second contribution proposes a robustness analysis of an oft-used representation learning framework, namely variational autoencoders (VAEs). It is vital that VAEs are built to be reliable, primarily because of their real-world applications, such as latent space control in robotics or, in the medical domain, the design of novel molecules by exploring the latent space. We examine the latent space from a geometric standpoint and establish a connection between the vulnerability of VAEs to adversarial perturbations and the structure of the latent space. Our findings show that the learned latent manifold has a high curvature with regions of low or zero density, making VAEs susceptible to adversarial attacks. We propose quantitative scores for measuring robustness and a simple training mechanism for enhancing it.
Lastly, we target the challenge of representation learning for data on graph domains with a heterophily property. In heterophilic graphs, nodes that are not in each other's immediate vicinity may share the same label due to their similar local connectivity structure. For example, in an academic network, two researchers in different countries can exhibit similar local connectivity due to the nature of their profession. We use diffusion geometry to explicitly model hierarchical dependencies in a graph in the form of augmentations. We then use these augmentations in a contrastive setup for learning representations of nodes in a graph. These representations can facilitate various downstream tasks, including graph classification, link prediction, and community detection. Our results showcase the effectiveness of augmentations in allowing the encoder to capture hierarchical dependencies, demonstrated by improved performance on several benchmark datasets.
In summary, through three core contributions, this thesis shows the importance of incorporating geometry-based inductive biases into deep representation learning models to develop efficient and reliable applications.
