Geometry for deep representation learning
Authors
Khan, Mohammad Asif
Abstract
Deep representation learning has achieved remarkable success in recent years in discovering meaningful low-dimensional
features from high-dimensional data. In datasets containing face
images, these features can capture underlying factors of variation, such as age, eye colour, and
hairstyle. We can employ learned representations for solving tasks such as face detection. By
capturing these factors of variation, the representations aim to build a model of the real world,
reflecting its inherent regularities. However, current approaches still struggle to discover
complex regularities of the world in a data-efficient way, resulting in limited
interpretability, robustness, and generalisation.
It is crucial to recognise that real-world data spaces often exhibit regularities characterised by various
symmetries, which require appropriate modelling assumptions. Consider an image of an
apple; we know its transformation under a translation operator will not change its identity as an
apple. Such properties that do not change under a broad family of transformations are known as
invariants.
“Geometry is a study of invariants” – Felix Klein (Klein, 1872).
In this thesis, we utilise geometry as a fundamental principle to account for relevant properties
in learning representation space. Specifically, we propose novel methodologies to address three
main challenges in deep representation learning: learning disentangled latent factors for image
sequences, investigating the robustness of deep latent factor models to adversarial perturbations,
and learning representations that account for hierarchical dependencies in heterophilic graphs.
The first project focuses on learning to disentangle content and motion information into
separate latent components for image sequences. Here, content refers to information shared
across all frames, for example, the identity of an object undergoing the dynamics, and motion
refers to information expressed in a given sequence frame. The temporal structure in image
sequences traces a path in a higher-dimensional data space that takes the form of a 1-dimensional
manifold. A key challenge in learning representations from this data is designing a latent
dynamical model that accounts for the temporal structure of image sequences. In this work,
we utilise symplectic geometry in latent space for modelling the dynamics of various motions;
this structure in latent space associates a motion with a constant energy term that captures the
manifold of the dynamics of sequences. For a set of dynamical actions, we associate each
with a unique subspace that reflects the energy preservation of the respective action.
Our results demonstrate that we can disentangle factors of variation, facilitating tasks such as
controlled generation and motion transfer.
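
As a rough illustration of what an energy-preserving latent dynamical model means, the sketch below integrates a toy two-dimensional latent state (q, p) with a symplectic (leapfrog) scheme. The harmonic-oscillator Hamiltonian, the step size, and all function names are assumptions chosen for illustration only; the abstract does not specify the form of the thesis's latent model.

```python
def hamiltonian(q, p, omega=1.0):
    """Energy of a toy harmonic oscillator: H = (p^2 + omega^2 * q^2) / 2."""
    return 0.5 * (p * p + omega * omega * q * q)


def leapfrog_step(q, p, dt, omega=1.0):
    """One leapfrog (Stormer-Verlet) step, a standard symplectic integrator."""
    p_half = p - 0.5 * dt * omega * omega * q          # half kick: dp/dt = -dH/dq
    q_new = q + dt * p_half                            # drift:     dq/dt =  dH/dp
    p_new = p_half - 0.5 * dt * omega * omega * q_new  # second half kick
    return q_new, p_new


def rollout(q0, p0, steps, dt=0.05, omega=1.0):
    """Unroll the latent state for a number of steps and return the trajectory."""
    trajectory = [(q0, p0)]
    q, p = q0, p0
    for _ in range(steps):
        q, p = leapfrog_step(q, p, dt, omega)
        trajectory.append((q, p))
    return trajectory


# Hypothetical initial latent state; the energy should stay nearly constant.
trajectory = rollout(1.0, 0.0, steps=1000)
energies = [hamiltonian(q, p) for q, p in trajectory]
energy_drift = max(energies) - min(energies)
```

In this toy setting the trajectory stays close to a single level set of the energy, which is the sense in which a motion can be associated with a constant energy term.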
The second contribution proposes a robustness analysis of an oft-used representation learning
framework, namely variational autoencoders (VAEs). It is vital that VAEs are built to be reliable,
particularly for real-world applications such as latent space control in robotics or the design of
novel molecules in the medical domain by exploring the latent space. We examine latent space
from a geometric standpoint and establish a connection between the vulnerability of VAEs to
adversarial perturbations and the structure of the latent space. Our findings show that the learned
latent manifold has high curvature and regions of low or zero density, making VAEs susceptible
to adversarial attacks. We propose quantitative scores for measuring robustness and a simple
training mechanism for enhancing it.
Lastly, we target the challenge of representation learning for graph-structured data that exhibits
heterophily. In heterophilic graphs, nodes that are not in each other's immediate vicinity may share
the same label due to their similar local connectivity structure. For example, in an academic
network, two researchers in different countries can exhibit similar local connectivity due to
the nature of their profession. We use diffusion geometry to explicitly model hierarchical
dependencies in a graph in the form of augmentations. We then use these augmentations in
a contrastive setup for learning representations of nodes in a graph. These representations
can facilitate various downstream tasks, including graph classification, link prediction, and
community detection. Our results showcase the effectiveness of augmentations in allowing
the encoder to capture hierarchical dependencies, demonstrated by improved performance on
several benchmark datasets.
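
One generic way to picture diffusion-based views of a graph at several scales is through powers of the random-walk transition matrix. The sketch below is a minimal illustration under that assumption; the helper names, the path graph, and the choice of scales are hypothetical, and the abstract does not state how the thesis actually constructs its augmentations.

```python
def transition_matrix(adj):
    """Row-normalise an adjacency matrix into a random-walk matrix P = D^-1 A."""
    P = []
    for row in adj:
        degree = sum(row)
        P.append([a / degree if degree else 0.0 for a in row])
    return P


def matmul(A, B):
    """Plain dense matrix product for small lists of lists."""
    inner, cols = len(B), len(B[0])
    return [[sum(A[i][r] * B[r][j] for r in range(inner)) for j in range(cols)]
            for i in range(len(A))]


def diffusion_views(adj, scales):
    """Return {t: P^t} for each diffusion time t in scales (coarser as t grows)."""
    n = len(adj)
    P = transition_matrix(adj)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    views, t = {}, 0
    for target in sorted(scales):
        while t < target:
            power = matmul(power, P)
            t += 1
        views[target] = [row[:] for row in power]
    return views


# Hypothetical example: a path graph 0-1-2-3-4.
adj = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
views = diffusion_views(adj, scales=[1, 2, 4])
```

At scale 1 a node only sees its direct neighbours, while larger scales assign weight to more distant nodes, so each view exposes dependencies at a different range. Such views could then serve as the paired inputs of a contrastive objective.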
In summary, through three core contributions, this thesis shows the importance of incorporating
geometry-based inductive biases into deep representation learning models to develop
efficient and reliable applications.