Querying graphs on large-scale data
Item statusRestricted Access
Embargo end date31/07/2022
This doctoral thesis will present the results of my work into querying graphs on large-scale data, from both the data perspective and query perspective. We first propose a scheme to reduce large graphs into small ones. It contracts obsolete parts, stars, cliques and paths into supernodes. We then build a hierarchical scheme to further reduce the graph, under limited resources. For both the contraction scheme and the hierarchy, we show that it is generic and lossless. We show that the same contracted graph is able to support multiple query classes at the same time, no matter whether their queries are label-based or not, local or non-local. Moreover, existing algorithms for these queries can be readily adapted to compute exact answers by using the synopses when possible, and decontracting the supernodes only when necessary. Using real-life graphs, we experimentally verify the efficiency and effectiveness of our contraction schemes. Meanwhile, we propose an extension of graph patterns, referred to as conditional graph patterns (CGPs). The CGPs allows one to express several conventional queries in a conditioned one, and annotate similar patterns to compute answers for all patterns in a single enumeration. In a CGP, one can specify a simple condition on each edge such that the edge exists if and only if the condition is satisfied. We show that CGPs allow us to catch missing links, increase the expressivity of graph functional dependencies, and provide a succinct representation of graph patterns. We studies their consistency, matching, incremental matching and containment problems and settles down their complexity bounds. We develop algorithms for matching, incremental matching and parallel matching of CGPs, and for (incremental, parallel) multi-CGP matching and optimization. We empiricallyverify the efficiency and effectiveness of our algorithms on real-life and synthetic graphs.