Edinburgh Research Archive

Extending SQL with marked nulls: from design to implementation and application in computing certain answers

Item Status

RESTRICTED ACCESS

Embargo End Date

2027-02-11

Abstract

Marked nulls, a theoretical framework for handling missing values in incomplete databases, have been studied extensively in the literature since the 1980s. However, their practical application has remained unexplored until now. This research introduces marked nulls through marked data types, which encode marked nulls alongside constants and SQL nulls. We explore two possible encodings of marked nulls, and we define the semantics and behaviour of casts, comparisons, operations, and aggregations for marked data types. Based on these concepts, we develop two prototype implementations as PostgreSQL extensions: one written in standard SQL for cross-platform compatibility, and the other written in C, targeting PostgreSQL only, for optimized performance. Comprehensive benchmarking evaluates both prototypes in terms of space usage and query performance. Performance overhead is analysed at multiple levels, including individual functions, join strategies, the Join Order Benchmark, and TPC benchmarks for practical workloads. Results show that the standard SQL implementation, designed for broad compatibility, faces significant performance challenges in both space and time, and that it suffers from incompatibility issues stemming from inconsistent implementations of SQL across commercial database systems. In contrast, the C implementation leverages PostgreSQL’s extensibility to deliver satisfactory performance, incurring at worst a 19.1% increase in disk space usage and a 9.2% geometric mean query performance overhead on the TPC-H benchmark. With a functional marked null implementation in place, we further explore its application to the problem of computing certain answers. Building on a recent study that provides a translation of queries with a correctness guarantee, we simplify this translation using marked nulls. The correctness guarantee is maintained and query performance is improved.
Overall, this work demonstrates the feasibility of marked nulls and illustrates a concrete example of their practical use. It bridges the gap between theory and practice, paving the way for further research and adoption of marked nulls in real-world database systems.
