dc.contributor.advisor: Libkin, Leonid
dc.contributor.advisor: Guagliardo, Paolo
dc.contributor.author: Toussaint, Etienne
dc.date.accessioned: 2023-05-24T14:23:58Z
dc.date.available: 2023-05-24T14:23:58Z
dc.date.issued: 2023-05-24
dc.identifier.uri: https://hdl.handle.net/1842/40613
dc.identifier.uri: http://dx.doi.org/10.7488/era/3378
dc.description.abstract: Incomplete and uncertain information is ubiquitous in database management applications. However, the techniques specifically developed to handle incomplete data are not sufficient. Even the evaluation of SQL queries on databases containing NULL values remains a challenge after 40 years. There is no consensus on what an answer to a query on an incomplete database should be, and the existing notions often have limited applicability. One of the most prevalent techniques in the literature is based on finding answers that are certainly true, independently of how missing values are interpreted. However, this notion has yielded several conflicting formal definitions of certain answers. Based on the fact that incomplete data can be enriched by some additional knowledge, we designed a notion able to unify and explain the different definitions of certain answers. Moreover, the knowledge-preserving certain answers notion provides the first well-founded definition of certain answers for the relational bag data model and value-inventing queries, addressing some key limitations of previous approaches. However, it does not provide any guarantee about the relevancy of the answers it captures. To understand what would be relevant answers to queries on incomplete databases, we designed and conducted a survey on the everyday usage of NULL values among database users. One of the findings from this socio-technical study is that even when users agree on the possible interpretation of NULL values, they may not agree on what a satisfactory query answer is. Therefore, to be relevant, query evaluation on incomplete databases must account for users' tasks and preferences. We model users' preferences and tasks with the notion of regret. The regret function captures the task-dependent loss a user endures when they treat one database as ground truth instead of another.
Thanks to this notion, we designed the first framework able to provide a score accounting for the risk associated with query answers. It allows us to define the risk-minimizing answers to queries on incomplete databases. We show that for some regret functions, regret-minimizing answers coincide with certain answers. Moreover, as the notion is more agile, it can capture more nuanced answers and more interpretations of incompleteness. A different approach to improving the relevancy of an answer is to explain its provenance. We propose to partition the incompleteness into sources and measure their respective contributions to the risk of an answer. As a first milestone, we study several models to predict the evolution of the risk when we clean a source of incompleteness. We implemented the framework, and it exhibits promising results on relational databases and queries with aggregate and grouping operations. Indeed, the model allows us to infer the risk reduction obtained by cleaning an attribute. Finally, by considering a game-theoretical approach, the model can provide an explanation for answers based on the contribution of each attribute to the risk. [en]
dc.language.iso: en [en]
dc.publisher: The University of Edinburgh [en]
dc.relation.hasversion: Console, M., Guagliardo, P., Libkin, L., & Toussaint, E. (2020). Coping with incomplete data: Recent advances. In Proceedings of the 39th ACM Symposium on Principles of Database Systems, PODS 2020 (pp. 33–47). ACM. [en]
dc.relation.hasversion: Toussaint, E., Guagliardo, P., & Libkin, L. (2020). Knowledge-preserving certain answers for SQL-like queries. In KR 2020: 17th International Conference on Principles of Knowledge Representation and Reasoning (pp. 758–767). [en]
dc.relation.hasversion: Toussaint, E., Guagliardo, P., Libkin, L., & Sequeda, J. (2022). Troubles with nulls, views from the users. Proceedings of the VLDB Endowment, 15(11), 2613–2625. [en]
dc.subject: data science [en]
dc.subject: SQL [en]
dc.subject: database [en]
dc.subject: certainty [en]
dc.subject: incompleteness [en]
dc.subject: relational databases [en]
dc.subject: explanations [en]
dc.title: Toward relevant answers to queries on incomplete databases [en]
dc.type: Thesis or Dissertation [en]
dc.type.qualificationlevel: Doctoral [en]
dc.type.qualificationname: PhD Doctor of Philosophy [en]

