Edinburgh Research Archive logo

Edinburgh Research Archive

University of Edinburgh homecrest
View Item 
  •   ERA Home
  • Informatics, School of
  • Informatics thesis and dissertation collection
  • View Item
  •   ERA Home
  • Informatics, School of
  • Informatics thesis and dissertation collection
  • View Item
  • Login
JavaScript is disabled for your browser. Some features of this site may not work without it.

Tackling the veracity and variety of big data

View/Open
JinR_2022.pdf (2.097Mb)
Date
21/06/2022
Author
Jin, Ruochun
Metadata
Show full item record
Abstract
This thesis tackles the veracity and variety challenges of big data, especially focusing on graphs and relational data. We start with proposing a class of graph association rules (GARs) to specify regularities between entities in graphs, which capture both missing links and inconsistencies. A GAR is a combination of a graph pattern and a dependency; it may take as predicates machine learning classifiers for link prediction. We formalize association deduction with GARs in terms of the chase, and prove its Church-Rosser property. We show that the satisfiability, implication and association deduction problems for GARs are coNP-complete, NP-complete and NP-complete, respectively. The incremental deduction problem is DP-complete for GARs. In addition, we provide parallel algorithms for association deduction and incremental deduction. We next develop a parallel algorithm to discover GARs, which applies an applicationdriven strategy to cut back rules and data that are irrelevant to users’ interest, by training a machine learning model to identify data pertaining to a given application. Moreover, we introduce a sampling method to reduce a big graph G to a set H of small sample graphs. Given expected support and recall bounds, this method is able to deduce samples in H and mine rules from H to satisfy the bounds in the entire G. Then we propose a class of temporal association rules (TACOs) for event prediction in temporal graphs. TACOs are defined on temporal graphs in terms of change patterns and (temporal) conditions, and may carry machine learning predicates for temporal event prediction. We settle the complexity of reasoning about TACOs, including their satisfiability, implication and prediction problems. We develop a system that discovers TACOs by iteratively training a rule creator based on generative models in a creatorcritic framework, and predicts events by applying the discovered TACOs in parallel. Finally, we propose an approach to querying relations D and graphs G taken together in SQL. The key idea is that if a tuple t in D and a vertex v in G are determined to refer to the same real-world entity, then we join t and v, correlate their information and complement tuple t with additional attributes of v from graphs. We show how to do this in SQL extended with only syntactic sugar, for both static joins when t is a tuple in D and dynamic joins when t comes from intermediate results of sub-queries on D. To support the semantic joins, we propose an attribute extraction scheme based on Kmeans clustering, to identify and fetch graph properties that are linked to v via paths. Moreover, we develop a scheme to extract a relation schema for entities in graphs, and a heuristic join method based on the schema to strike a balance between the complexity and accuracy of dynamic joins.
URI
https://hdl.handle.net/1842/39161

http://dx.doi.org/10.7488/era/2412
Collections
  • Informatics thesis and dissertation collection

Library & University Collections HomeUniversity of Edinburgh Information Services Home
Privacy & Cookies | Takedown Policy | Accessibility | Contact
Privacy & Cookies
Takedown Policy
Accessibility
Contact
feed RSS Feeds

RSS Feed not available for this page

 

 

All of ERACommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsPublication TypeSponsorSupervisorsThis CollectionBy Issue DateAuthorsTitlesSubjectsPublication TypeSponsorSupervisors
LoginRegister

Library & University Collections HomeUniversity of Edinburgh Information Services Home
Privacy & Cookies | Takedown Policy | Accessibility | Contact
Privacy & Cookies
Takedown Policy
Accessibility
Contact
feed RSS Feeds

RSS Feed not available for this page