Advanced spectral algorithms for static and dynamic graph clustering
Item Status
Embargo End Date
Date
Authors
Abstract
Graph clustering is a fundamental task in data analysis and machine learning, with a wide variety of applications across scientific and industrial settings.
However, the increasing scale and dynamic nature of modern datasets present significant computational hurdles for traditional graph clustering algorithms.
This thesis addresses these challenges by using spectral techniques to develop novel, efficient, and dynamic algorithms designed to learn cluster structure in both static and evolving graphs.
Firstly, we investigate hierarchical clustering (HC) -- the recursive partitioning of a dataset into increasingly smaller clusters using a binary tree representation -- specifically with respect to Dasgupta's cost function. While computing an optimal HC tree is NP-hard in general, we exploit the `well-clustered' property common in practical graphs. For this setting, we introduce two novel algorithms that construct constant-factor approximate HC trees in nearly-linear time in the input size of the graph, representing a significant advance over the previous state-of-the-art. They are the first for well-clustered graphs to simultaneously achieve near-optimal approximation guarantees and running time.
Secondly, we consider clustering dynamic graphs subject to vertex and edge updates, where the underlying cluster structure may change. We introduce a randomised dynamic algorithm, based on spectral clustering, that maintains an approximate clustering. It achieves optimal constant amortised update time and sublinear query time for retrieving cluster assignments. This is the first dynamic spectral clustering algorithm with these simultaneous guarantees.
Thirdly, we study the dynamic clustering of a set of data points. Clustering algorithms for data points often begin by constructing a similarity graph. However, these graphs are typically dense, requiring quadratic space and computation, rendering them intractable for large datasets or dynamic settings. We introduce an algorithm that dynamically maintains a sparse approximation of the fully connected similarity graph while preserving its cluster structure, with sublinear update time. A key building block for designing this algorithm is a novel dynamic data structure for the classical kernel density estimation problem.
Finally, all theoretical results are extensively evaluated on both synthetic and real-world datasets, demonstrating that the newly developed algorithms are practical, effective, and applicable to real-world data.
This item appears in the following Collection(s)

