Querying graphs with data
Graph data is becoming more and more pervasive. Indeed, services such as Social Networks or the Semantic Web can no longer rely on the traditional relational model, as its structure is somewhat too rigid for the applications they have in mind. For this reason we have seen a continuous shift towards more non-standard models. First it was the semi-structured data in the 1990s and XML in 2000s, but even such models seem to be too restrictive for new applications that require navigational properties naturally modelled by graphs. Social networks fit into the graph model by their very design: users are nodes and their connections are specified by graph edges. The W3C committee, on the other hand, describes RDF, the model underlying the Semantic Web, by using graphs. The situation is quite similar with crime detection networks and tracking workflow provenance, namely they all have graphs inbuilt into their definition. With pervasiveness of graph data the important question of querying and maintaining it has emerged as one of the main priorities, both in theoretical and applied sense. Currently there seem to be two approaches to handling such data. On the one hand, to extract the actual data, practitioners use traditional relational languages that completely disregard various navigational patterns connecting the data. What makes this data interesting in modern applications, however, is precisely its ability to compactly represent intricate topological properties that envelop the data. To overcome this issue several languages that allow querying graph topology have been proposed and extensively studied. The problem with these languages is that they concentrate on navigation only, thus disregarding the data that is actually stored in the database. What we propose in this thesis is the ability to do both. Namely, we will study how query languages can be designed to allow specifying not only how the data is connected, but also how data changes along paths and patterns connecting it. To this end we will develop several query languages and show how adding different data manipulation capabilities and different navigational features affects the complexity of main reasoning tasks. The story here is somewhat similar to the early success of the relational data model, where theoretical considerations led to a better understanding of what makes certain tasks more challenging than others. Here we aim for languages that are both efficient and capable of expressing a wide variety of queries of interest to several groups of practitioners. To do so we will analyse how different requirements affect the language at hand and at the end provide a good base of primitives whose inclusion into a language should be considered, based on the applications one has in mind. Namely, we consider how adding a specific operation, mechanism, or capability to the language affects practical tasks that such an addition plans to tackle. In the end we arrive at several languages, all of them with their pros and cons, giving us a good overview of how specific capabilities of the language affect the design goals, thus providing a sound basis for practitioners to choose from, based on their requirements.