dc.contributor.advisor | Libkin, Leonid | |
dc.contributor.advisor | Guagliardo, Paolo | |
dc.contributor.author | Toussaint, Etienne | |
dc.date.accessioned | 2023-05-24T14:23:58Z | |
dc.date.available | 2023-05-24T14:23:58Z | |
dc.date.issued | 2023-05-24 | |
dc.identifier.uri | https://hdl.handle.net/1842/40613 | |
dc.identifier.uri | http://dx.doi.org/10.7488/era/3378 | |
dc.description.abstract | Incomplete and uncertain information is ubiquitous in database management applications. However, the techniques developed specifically to handle incomplete data
remain insufficient. Even the evaluation of SQL queries on databases containing NULL
values remains a challenge after 40 years. There is no consensus on what an answer
to a query on an incomplete database should be, and the existing notions often have
limited applicability.
One of the most prevalent techniques in the literature is based on finding answers
that are certainly true, independently of how missing values are interpreted. However,
this notion has yielded several conflicting formal definitions of certain answers. Building
on the observation that incomplete data can be enriched with additional knowledge, we
designed a notion that unifies and explains the different definitions of certain answers.
Moreover, the notion of knowledge-preserving certain answers provides the first
well-founded definition of certain answers for the relational bag data model and for value-inventing queries, addressing key limitations of previous approaches. However,
it does not guarantee the relevance of the answers it captures.
To understand what would be relevant answers to queries on incomplete databases,
we designed and conducted a survey on the everyday usage of NULL values among
database users. One of the findings from this socio-technical study is that even when
users agree on the possible interpretation of NULL values, they may not agree on
what a satisfactory query answer is. Therefore, to be relevant, query evaluation on
incomplete databases must account for users’ tasks and preferences.
We model users’ preferences and tasks with the notion of regret. The regret function
captures the task-dependent loss a user incurs when treating one database as the
ground truth instead of another. Using this notion, we designed the first framework
that assigns query answers a score accounting for their associated risk. This framework
lets us define risk-minimizing answers to queries on incomplete databases. We
show that for some regret functions, regret-minimizing answers coincide with certain
answers. Moreover, as the notion is more flexible, it can capture more nuanced answers
and a wider range of interpretations of incompleteness.
Another way to improve the relevance of an answer is to explain its provenance.
We propose to partition the incompleteness into sources and to measure the contribution of each source to the risk of an answer. As a first milestone, we study several models
to predict how the risk evolves when a source of incompleteness is cleaned. We
implemented the framework, and it shows promising results on relational databases
and queries with aggregate and grouping operations. In particular, the model allows us
to infer the risk reduction obtained by cleaning an attribute. Finally, using a
game-theoretic approach, the model can explain answers in terms of the contribution
of each attribute to the risk. | en |
dc.language.iso | en | en |
dc.publisher | The University of Edinburgh | en |
dc.relation.hasversion | Console, M., Guagliardo, P., Libkin, L., & Toussaint, E. (2020). Coping with incomplete data: Recent advances. In Proceedings of the 39th ACM Symposium on Principles of Database Systems, PODS 2020 (pp. 33–47). ACM. | en |
dc.relation.hasversion | Toussaint, E., Guagliardo, P., & Libkin, L. (2020). Knowledge-preserving certain answers for SQL-like queries. In KR 2020: 17th International Conference on Principles of Knowledge Representation and Reasoning (pp. 758–767). | en |
dc.relation.hasversion | Toussaint, E., Guagliardo, P., Libkin, L., & Sequeda, J. (2022). Troubles with nulls, views from the users. Proceedings of the VLDB Endowment, 15(11), 2613–2625. | en |
dc.subject | data science | en |
dc.subject | SQL | en |
dc.subject | database | en |
dc.subject | certainty | en |
dc.subject | incompleteness | en |
dc.subject | relational databases | en |
dc.subject | explanations | en |
dc.title | Toward relevant answers to queries on incomplete databases | en |
dc.type | Thesis or Dissertation | en |
dc.type.qualificationlevel | Doctoral | en |
dc.type.qualificationname | PhD Doctor of Philosophy | en |