Unifying heterogeneous data management: from philosophy foundation to system implementation
dc.contributor.advisor
Cao, Yang
dc.contributor.advisor
Fan, Wenfei
dc.contributor.author
Fu, Wenzhi
dc.date.accessioned
2025-10-01T13:36:18Z
dc.date.available
2025-10-01T13:36:18Z
dc.date.issued
2025-10-01
dc.description.abstract
This thesis aims to unify heterogeneous data management with a revised relational model for uniformly querying and updating data in different formats without performance degradation. To address this well-recognized and crucial challenge in extracting value from the variety of big data, the discussion begins with a philosophical reflection, whose necessity is often overlooked, on this variety itself: the existence of numerous incompatible data models. We clarify that the various data models are all essentially cognitions of the task of database management, artificially developed under different considerations but all with the identical goal of making raw data manageable in the same physical world, rather than Cartesian mirror-images of distinct objective realities. Ignoring this, as is typical, neglects the identity behind the opposites among the structures and operations these models impose, thereby undermining the ideal of developing a general method for the task of database management.
To revive this long-neglected ideal, we identify a pathway to achieving it by investigating the distinctiveness of the relational model that has enabled it to dominate the field of database management for decades. We show that the key characteristics of the relational model directly address the model-independent challenges inherent in this task, thereby enabling systems developed accordingly to perform better. These characteristics are thus indispensable for any data model to succeed in practice, which renders such models also “relational”. This is not meant to establish the relational model as a “codex”, but rather to shed light on the principle of evolving it to keep up with the continuously progressing task of database management, so that the overall landscape of this task, amidst endlessly emerging challenges, can still be addressed by a single model rather than a combination of heterogeneous ones.
In light of this, as a preliminary step toward unified database management, we provide an overarching framework for uniformly adopting different standpoints in accomplishing this task without combining heterogeneous models. Specifically, around a proposed RG (Relational Generative) model, we develop components that make the options within the following aspects configurable: connections can be represented either explicitly or implicitly; consistency can be annotated on different data items and operations, across distinct isolation levels; and data analysis can be programmed in both the declarative and imperative paradigms. Practitioners would then no longer need to turn to an alternative, purpose-built model or system to adopt a particular method for managing their data, at least within these aspects, thereby paving the way toward the unification of heterogeneous data management.
More technically, in the first part, we introduce the RG model, an extension of the relational model with logical-level pointers that connect different tuples, to represent graphs in relations while preserving their topology. By incorporating graph exploration via logical-level pointers as an operator, we generalize the relational query evaluation workflow to provide an enlarged, unified plan space for graph-relation hybrid queries. The optimal query plan can then be generated and executed seamlessly on a typical relational database system without the need for a customized execution engine, retaining the battle-hardened optimizations embedded in relational databases while enabling hybrid queries over graphs and relations.
In the second part, still concentrating on the analytical task, we identify redundant computations in query evaluation and introduce a recursive execution paradigm to eliminate them. Namely, we show that different subqueries might be automorphisms of each other, causing redundant computations that can only be eliminated by sharing results among subqueries. Such sharing introduces recursive data passing, which breaks the separate execution of each join, even if those queries can still be described without recursion. We theoretically prove that such a recursive execution paradigm can help query evaluation reach a complexity lower bound in string pattern matching that was once unachievable.
In the third part, we concentrate on transactional tasks. Building upon the approach above for unifying relations and graphs, we develop a way of running transactions across relations and graphs. We further propose a fine-grained isolation level that provides data-driven isolation guarantees according to user-defined annotations in each transaction, reflecting the heterogeneity of the data. This allows data items touched by a single transaction to be treated separately and differently, protecting only the critical logic with relatively high isolation while avoiding unnecessary isolation guarantees, thereby improving transaction throughput without impairing the correctness of application logic during concurrent execution.
Finally, as a practical and accessible implementation of the proposed mechanisms, WhiteDB has been developed to assess the benefits they offer when applied to real-world data. It allows the same piece of data to be viewed both as “data model” and as “object”, and to be accessed using both the declarative and the imperative programming paradigms. Such versatility enables it to function both as a C++ library with a variety of frequently used functionalities and as an embedded database, thereby supporting integrated multi-paradigm data analysis.
en
dc.identifier.uri
https://hdl.handle.net/1842/44016
dc.identifier.uri
http://dx.doi.org/10.7488/era/6544
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.rights.embargodate
2026-10-01
en
dc.subject
analytical query evaluation
en
dc.subject
transactional query processing
en
dc.subject
recursive query evaluation
en
dc.subject
heterogeneous data management
en
dc.subject
database system
en
dc.title
Unifying heterogeneous data management: from philosophy foundation to system implementation
en
dc.title.alternative
On unifying heterogeneous data management: from philosophy foundation to system implementation
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
dcterms.accessRights
RESTRICTED ACCESS
en
Files
Original bundle
- Name:
- Fu2025.pdf
- Size:
- 19.13 MB
- Format:
- Adobe Portable Document Format

