BESS: bounded evaluation SQL systems
Authors
Yuan, Tengfei
Abstract
Querying big relations can be prohibitively costly, even with the power of parallel processing on clusters of thousands of machines. This resource burden motivates an alternative way to answer queries over big datasets, one that is affordable for small businesses and lightweight software. The theory of bounded evaluation can reduce queries on big data to computations on small data. It advocates an unconventional query evaluation paradigm under an access schema A, a combination of cardinality constraints and associated indices. For this theory to be of practical use, a database system with bounded evaluation capability is needed.
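The idea of answering a query under an access schema can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class `AccessConstraint`, the relations `friend` and `person`, and the bounds are hypothetical, chosen only to show how a cardinality constraint paired with an index lets a plan touch a number of tuples that does not depend on the size of the relations.

```python
from collections import defaultdict


class AccessConstraint:
    """Hypothetical constraint R(X -> Y, N): every X-value has at most N
    distinct Y-values, and an index retrieves them without scanning R."""

    def __init__(self, rows, x_attrs, y_attrs, bound):
        self.bound = bound
        self.fetched = 0  # number of Y-tuples retrieved so far
        self.index = defaultdict(set)
        for row in rows:
            key = tuple(row[a] for a in x_attrs)
            self.index[key].add(tuple(row[a] for a in y_attrs))
        # Check that the data actually satisfies the cardinality bound.
        for key, ys in self.index.items():
            if len(ys) > bound:
                raise ValueError(f"{key} has {len(ys)} values, bound is {bound}")

    def fetch(self, key):
        """Return the Y-values for one X-value; at most `bound` tuples."""
        ys = self.index.get(tuple(key), set())
        self.fetched += len(ys)
        return ys


def cities_of_friends(friend_ac, person_ac, pid):
    """Bounded plan: fetch the friends of `pid`, then the city of each friend."""
    cities = set()
    for (fid,) in friend_ac.fetch((pid,)):
        for (city,) in person_ac.fetch((fid,)):
            cities.add(city)
    return cities


# Toy data: a few relevant rows plus many rows the plan never touches.
friend_rows = [{"pid": "p0", "fid": "p1"}, {"pid": "p0", "fid": "p2"}]
person_rows = [{"pid": "p1", "city": "Edinburgh"}, {"pid": "p2", "city": "Beijing"}]
for i in range(1000):
    friend_rows.append({"pid": f"q{i}", "fid": f"q{i + 1}"})
    person_rows.append({"pid": f"q{i}", "city": "Elsewhere"})

friend_ac = AccessConstraint(friend_rows, ["pid"], ["fid"], bound=5000)
person_ac = AccessConstraint(person_rows, ["pid"], ["city"], bound=1)
result = cities_of_friends(friend_ac, person_ac, "p0")
```

In this sketch the plan retrieves two tuples from each index, however many unrelated rows the relations contain; the worst case is determined by the bounds in the constraints rather than by the data size.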
Firstly, the thesis proposes the design and framework of a bounded evaluation system named BESS. The framework consists of two workflows: online query processing and offline access schema management. The thesis discusses the features that distinguish it from the framework paradigm and from conventional database systems. It also demonstrates the feasibility of the framework with two implementation variants, and discusses the challenges and advantages of each approach by comparing their workflows.
Secondly, the thesis presents a prototype named BEAS, targeting traditional DBMSs. For this prototype, the thesis proposes algorithms for online query processing, from checking bounded evaluability to generating bounded query plans. Moreover, the thesis extends bounded evaluation to RAaggr (RA with aggregation) queries under bag semantics, making the prototype more practical in real-life settings.
Thirdly, the thesis presents another prototype, named Zidian, built on KV stores to speed up SQL query evaluation over NoSQL. As its foundation, the thesis proposes a block-as-a-value (BaaV) data model and a corresponding relational algebra. Beyond bounded evaluation, the thesis studies scan-free data access. The thesis also studies the query evaluation process and verifies that Zidian substantially reduces data access and communication cost. Moreover, Zidian can be plugged into existing SQL-over-NoSQL systems while retaining horizontal scalability.
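The contrast between storing one tuple per key and storing a block of tuples per key can be sketched as follows. This is an illustrative reading of the block-as-a-value idea, not Zidian's actual storage layout or API: the `orders` relation, the choice of `cust` as the block key, and all function names are assumptions, and plain Python dictionaries stand in for a KV store.

```python
from collections import defaultdict

# Hypothetical relation orders(order_id, cust, item).
orders = [
    (1, "alice", "pen"),
    (2, "bob", "ink"),
    (3, "alice", "pad"),
    (4, "carol", "pen"),
]

# Tuple-as-a-value layout: one KV pair per tuple, keyed by the primary key.
taav = {oid: (cust, item) for oid, cust, item in orders}


def items_taav(store, cust):
    """Selecting on a non-key attribute has to scan every KV pair."""
    touched = 0
    items = []
    for _oid, (c, item) in store.items():
        touched += 1
        if c == cust:
            items.append(item)
    return sorted(items), touched


def build_baav(rows):
    """Block-as-a-value layout: group the tuples by `cust` so that each key
    maps to a block holding all (order_id, item) pairs for that customer."""
    store = defaultdict(list)
    for oid, cust, item in rows:
        store[cust].append((oid, item))
    return dict(store)


def items_baav(store, cust):
    """A single get returns the whole block; no scan is needed."""
    block = store.get(cust, [])
    return sorted(item for _oid, item in block), len(block)


baav = build_baav(orders)
taav_result = items_taav(taav, "alice")
baav_result = items_baav(baav, "alice")
```

Both functions return the same items for a given customer, but the tuple-per-key version touches every stored pair, while the keyed-block version touches only the tuples in one block.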
Finally, to make the framework more practical, the thesis studies the problem of schema discovery for BESS. Casting it as a schema selection problem for the BaaV model, the thesis presents a full treatment of generating schemas for parametric SQL queries over keyed blocks on KV stores. The thesis develops and verifies a framework and criteria for the schema selection problem, considering both storage and computation. It also presents practical selection algorithms that guarantee certain optimality under practical conditions.

