Active provenance for data intensive research
dc.contributor.advisor
Atkinson, Malcolm
en
dc.contributor.advisor
Cheney, James
en
dc.contributor.advisor
Filgueira Vicente, Rosa
en
dc.contributor.author
Spinuso, Alessandro
en
dc.contributor.sponsor
other
en
dc.date.accessioned
2018-10-30T10:19:01Z
dc.date.available
2018-10-30T10:19:01Z
dc.date.issued
2018-11-29
dc.description.abstract
The role of provenance information in data-intensive research is a significant topic of
discussion among technical experts and scientists. Typical use cases addressing traceability,
versioning and reproducibility of the research findings are extended with more
interactive scenarios in support, for instance, of computational steering and results
management. In this thesis we investigate the impact that lineage records can have on
the early phases of the analysis, for instance when performed through near-real-time systems
and Virtual Research Environments (VREs) tailored to the requirements of a specific
community. By positioning provenance at the centre of the computational research
cycle, we highlight the importance of having mechanisms on the data scientists’ side
that, by integrating with the abstractions offered by the processing technologies, such
as scientific workflows and data-intensive tools, facilitate the experts’ contribution to
the lineage at runtime. Ultimately, by encouraging tuning and use of provenance for
rapid feedback, the thesis aims at improving the synergy between different user groups
to increase productivity and understanding of their processes.
We present a model of provenance, called S-PROV, that uses and further extends
PROV and ProvONE. The relationships and properties characterising the workflow’s
abstractions and their concrete executions are re-elaborated to include aspects related
to delegation, distribution and steering of stateful streaming operators. The model is
supported by the Active framework for tuneable and actionable lineage, which ensures the
user’s engagement by fostering rapid exploitation. Here, concepts such as provenance
types, configuration and explicit state management allow users to capture complex
provenance scenarios and activate selective controls based on domain and user-defined
metadata. We outline how the traces are recorded in a new comprehensive system,
called S-ProvFlow, enabling different classes of consumers to explore the provenance
data with services and tools for monitoring, in-depth validation and comprehensive
visual analytics. The work of this thesis is discussed in the context of an existing
computational framework and the experience gained in implementing provenance-aware
tools for seismology and climate VREs. It will continue to evolve through
newly funded projects, thereby providing generic and user-centred solutions for data-intensive
research.
en
dc.identifier.uri
http://hdl.handle.net/1842/33181
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
A. Spinuso, J. Cheney, and M. Atkinson. Provenance for seismological processing pipelines in a distributed streaming workflow. Proceedings of the Joint EDBT/ICDT 2013 Workshops, 2013.
en
dc.relation.hasversion
S. Gesing, M. Atkinson, R. Filgueira, I. Taylor, A. Jones, V. Stankovski, C. S. Liew, A. Spinuso, G. Terstyanszky, and P. Kacsuk. Workflows in a dashboard: a new generation of usability. In Proceedings of the 9th Workshop on Workflows in Support of Large-Scale Science, pages 82-93. IEEE Press, 2014.
en
dc.relation.hasversion
R. Filgueira, A. Krause, M. Atkinson, I. Klampanos, A. Spinuso, and S. Sanchez-Exposito. dispel4py: An agile framework for data-intensive escience. In 11th IEEE International Conference on e-Science, pages 454-464. IEEE, 2015.
en
dc.relation.hasversion
M. Atkinson, M. Carpené, E. Casarotti, S. Claus, R. Filgueira, A. Frank, M. Galea, T. Garth, A. Gemünd, H. Igel, I. Klampanos, A. Krause, L. Krischer, S. H. Leong, F. Magnoni, J. Matser, A. Michelini, A. Rietbrock, H. Schwichtenberg, A. Spinuso, and J. P. Vilotte. VERCE delivers a productive e-science environment for seismology research. In 2015 IEEE 11th International Conference on e-Science, pages 224-236, Aug 2015.
en
dc.relation.hasversion
T. Garth, A. Rietbrock, S. Hicks, A. Fuenzalida Velasco, E. Casarotti, and A. Spinuso. Full waveform modelling using the VERCE platform - application to aftershock seismicity in the Chile subduction zone. In EGU General Assembly Conference Abstracts, volume 17, 2015.
en
dc.relation.hasversion
A. Spinuso, R. Filgueira, M. Atkinson, and A. Gemünd. Visualisation methods for large provenance collections in data-intensive collaborative platforms. In EGU General Assembly Conference Abstracts, volume 18, 2016.
en
dc.relation.hasversion
T. Kiss, P. Kacsuk, R. Lovas, A. Balaskó, A. Spinuso, M. Atkinson, D. D’Agostino, E. Danovaro, and M. Schiffers. WS-PGRADE/gUSE in European projects. In Kacsuk [162], pages 235-254.
en
dc.relation.hasversion
A. Mihajlovski, A. Spinuso, M. Plieger, and W. Som de Cerff. Enabling data-driven provenance in NetCDF, via OGC WPS operations. Climate analysis services use case. In AGU Fall Meeting Abstracts, 2016.
en
dc.subject
informatics
en
dc.subject
data science
en
dc.subject
reproducibility
en
dc.subject
provenance
en
dc.subject
workflows
en
dc.subject
e-science
en
dc.title
Active provenance for data intensive research
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: Spinuso2018.pdf
- Size: 38.49 MB
- Format: Adobe Portable Document Format