Boosting graph computation with generic methods: partitioning and incrementalization
Abstract
In this thesis we develop a package of generic methods for boosting the velocity of
graph computations, regarding partitioning and incrementalization. The former is to
deal with the challenges of volume and velocity over big data, and make computations
scalable. The latter is to speed up computations for dynamic graph analytics. On the
one hand, existing partitioners handle graphs in an “one-size-fits-all” way and do not
consider the applications running on the top. This results in sub-optimal performances
for distributed computations. On the other hand, however, current incremental graph
algorithms are ad hoc designed, which require a lot of efforts even for domain experts.
Worse still, most of these algorithms lack provable guarantees for their efficiency. To
this end, this thesis presents a series of generic yet effective approaches for boosting
the efficiency and scalability of graph computation.
Firstly, for graph partitioning, we propose an application-driven hybrid partitioning
strategy that takes the top-level application into consideration. Given a graph algorithm
A, the strategy learns a cost model for A as polynomial regression. We develop partitioners that given the learned cost model, refine an edge-cut or vertex-cut partition to a
hybrid partition and reduce the parallel cost of A. Moreover, we extend the partitioners
to handle multiple cost models of mixed workloads in one batch.
Secondly, we study generic guidelines for incrementalizing graph algorithms. We
identify a class of incrementalizable algorithms abstracted in a fixpoint model. We
show how to deduce an incremental algorithm A∆ from such an algorithm A. Moreover, A∆ can be made bounded relative to A, i.e., its cost is determined by the size
of changes to graphs and changes to the affected area that is necessarily checked by
batch algorithm A. We provide generic conditions under which a deduced algorithm
A∆ warrants to be correct and relatively bounded, by adopting the same logic and data
structures of A, at most using timestamps as an additional auxiliary structure.
Finally, we go beyond the incrementalization of generic graph algorithms, and
focus on graph partitioners where the algorithms are heuristic and exact results are
not required. We propose to incrementalize widely-used graph partitioners A into
heuristically-bounded incremental algorithms A∆. Given graph G, updates ∆G to G
and a partition A(G) of G by A, A∆ computes changes ∆O to A(G) such that (1) applying ∆O to A(G) produces a new partition of the updated graph although it may not
be exactly the one derived by A, (2) it retains the same bounds on balance and cut sizes
as A, and (3) ∆O is decided by ∆G alone. We show that we can deduce A∆ from both
vertex-cut and edge-cut partitioners A, retaining their bounds
This item appears in the following Collection(s)

