Graph pattern matching on social network analysis
Item Status
Embargo End Date
Date
Authors
Abstract
Graph pattern matching is fundamental to social network analysis. Its effectiveness
for identifying social communities and social positions, making recommendations and
so on has been repeatedly demonstrated. However, the social network analysis raises
new challenges to graph pattern matching. As real-life social graphs are typically
large, it is often prohibitively expensive to conduct graph pattern matching over such
large graphs, e.g., NP-complete for subgraph isomorphism, cubic time for bounded
simulation, and quadratic time for simulation. These hinder the applicability of graph
pattern matching on social network analysis. In response to these challenges, the thesis
presents a series of effective techniques for querying large, dynamic, and distributively
stored social networks.
First of all, we propose a notion of query preserving graph compression, to compress
large social graphs relative to a class Q of queries. We then develop both batch
and incremental compression strategies for two commonly used pattern queries. Via
both theoretical analysis and experimental studies, we show that (1) using compressed
graphs Gr benefits graph pattern matching dramatically; and (2) the computation of Gr
as well as its maintenance can be processed efficiently.
Secondly, we investigate the distributed graph pattern matching problem, and explore
parallel computation for graph pattern matching. We show that our techniques
possess following performance guarantees: (1) each site is visited only once; (2) the total
network traffic is independent of the size of G; and (3) the response time is decided
by the size of largest fragment of G rather than the size of entire G. Furthermore, we
show how these distributed algorithms can be implemented in the MapReduce framework.
Thirdly, we study the problem of answering graph pattern matching using views
since view based techniques have proven an effective technique for speeding up query
evaluation. We propose a notion of pattern containment to characterise graph pattern
matching using views, and introduce efficient algorithms to answer graph pattern
matching using views. Moreover, we identify three problems related to graph pattern
containment, and provide efficient algorithms for containment checking (approximation
when the problem is intractable).
Fourthly, we revise graph pattern matching by supporting a designated output node,
which we treat as “query focus”. We then introduce algorithms for computing the top-k
relevant matches w.r.t. the output node for both acyclic and cyclic pattern graphs, respectively,
with early termination property. Furthermore, we investigate the diversified
top-k matching problem, and develop an approximation algorithm with performance
guarantee and a heuristic algorithm with early termination property.
Finally, we introduce an expert search system, called ExpFinder, for large and dynamic
social networks. ExpFinder identifies top-k experts in social networks by graph
pattern matching, and copes with the sheer size of real-life social networks by integrating
incremental graph pattern matching, query preserving compression and top-k
matching computation. In particular, we also introduce bounded (resp. unbounded)
incremental algorithms to maintain the weighted landmark vectors which are used for
incremental maintenance for cached results.
This item appears in the following Collection(s)

