DCC Briefing Paper: Curating e-Science Data
Date
24/08/2006
Author
Pennock, Maureen
Abstract
The term 'e-Science' commonly refers to large-scale scientific collaborations carried out over the 'Grid', a technical architecture and infrastructure for co-ordinated and distributed sharing of data, resources and communications. e-Science methodologies increase capacity and capabilities, and are rapidly transforming not just science, but also medicine, engineering, and business; their effect is thus near-global. Typical data types include observational data, large-scale experimental data, and data from simulation, modelling, and design.
Curation of data collected and developed during these investigations is vital for post-analysis verification of results, further experimentation and cumulative analysis. Yet despite its importance, usually only a very small percentage of outputs is properly managed and curated for re-use. Failure to curate properly means that investments are not maximised and research cannot be validated or reliably extended; it may even result in data loss and incorrect interpretation. Vigorous curation practices should be implemented to address these risks, ensure data provenance and integrity, and enable reliable re-use. The scale and importance of the research mean that a holistic and interoperable approach to curating research outputs is required. Ultimately, this is an issue that can only be addressed on a collaborative scale, like the Grid itself, and it requires input from all stakeholders across the entire data life-cycle.