Edinburgh Research Archive

Analysing system behaviour by automatic benchmarking of system-level provenance

dc.contributor.advisor
Cheney, James
en
dc.contributor.advisor
Bhatotia, Pramod
en
dc.contributor.author
Chan, Sheung Chi
en
dc.contributor.sponsor
other
en
dc.date.accessioned
2020-06-19T11:04:56Z
dc.date.available
2020-06-19T11:04:56Z
dc.date.issued
2020-06-25
dc.description.abstract
Provenance is a term that originates in the art world. It provides a chain of information about a piece of art from its creation to its current status, recording the historical information relating to the piece, including storage locations, ownership, and purchase prices. The term has a very similar definition in data processing and computer science, where it denotes the lineage of data and is used either to support reproducibility or to trace activities happening at runtime for various purposes. As in art, provenance in computer science and data processing describes how a piece of data was created, passed around, modified, and brought to its current state. It also records who is responsible for particular activities, along with other related information. It acts as metadata on the components of a computing environment. Because the concept of provenance is to record all information related to some data, the size of the provenance is generally proportional to the amount of data processing that took place. It therefore tends to be a large data set that is hard to analyse. Moreover, not all of the information collected is useful for every purpose. For example, if we just want to trace all previous owners of a file, then the storage location information can be ignored. To capture useful information without having to handle a large volume of data, researchers and developers have built different provenance recording tools, each of which records only the information needed by particular applications, using different means and mechanisms throughout the system. This yields a lighter set of information for analysis, but it also results in non-standard provenance information, and general users may not have a clear view of which tools are better suited to which purposes. 
For example, if we want to determine, for security analysis, whether certain action sequences have been performed in a process and who is accountable for them, we have no way of knowing which tools can be trusted to provide the correct information. The tools are also hard to compare because few common standards exist. With this need in mind, this thesis concentrates on providing an automated system, ProvMark, to benchmark the tools. This helps to show the strengths and weaknesses of their provenance results in different scenarios. It also allows tool developers to verify their tools and allows end users to compare the tools on an equal footing and choose one suitable for their purpose. Overall, benchmarking the expressiveness of the tools across different scenarios indicates which provenance tool is the right choice for a specific use.
en
dc.identifier.uri
https://hdl.handle.net/1842/37155
dc.identifier.uri
http://dx.doi.org/10.7488/era/456
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Chan, S. C., Gehani, A., Cheney, J., Sohan, R., & Irshad, H. (2017). Expressiveness benchmarking for system-level provenance. In Proceedings of the 9th USENIX Workshop on the Theory and Practice of Provenance. USENIX, 2017.
en
dc.relation.hasversion
Chan, S. C., Cheney, J., Bhatotia, P., Pasquier, T., Gehani, A., Irshad, H., Carata, L., & Seltzer, M. (2019). ProvMark: A Provenance Expressiveness Benchmarking System. In Proceedings of the 20th International Middleware Conference. ACM, 2019.
en
dc.relation.hasversion
Chan, S. C. (2019). Analysing system behaviour by automatic benchmarking of system-level provenance. In Proceedings of the 20th International Middleware Doctoral Symposium. ACM, 2019.
en
dc.relation.hasversion
Chan, S. C., & Cheney, J. (2020). Flexible graph matching and graph edit distance using answer set programming. In International Symposium on Practical Aspects of Declarative Languages. Springer, Cham, 2020.
en
dc.relation.hasversion
Chan, S. C., Cheney, J., Gehani, A., & Irshad, H. (2020). Integrity checking and abnormality detection of provenance records. In Proceedings of the 12th USENIX Workshop on the Theory and Practice of Provenance. USENIX, 2020.
en
dc.subject
provenance
en
dc.subject
metadata
en
dc.subject
ProvMark
en
dc.subject
provenance tools
en
dc.title
Analysing system behaviour by automatic benchmarking of system-level provenance
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Chan2020.pdf
Size:
2.65 MB
Format:
Adobe Portable Document Format