
dc.contributor.advisor: Atkinson, Malcolm
dc.contributor.advisor: van Hemert, Jano
dc.contributor.advisor: Cole, Murray
dc.contributor.advisor: Rodriguez Gonzalez, David
dc.contributor.advisor: Carpenter, Trevor
dc.contributor.advisor: Wardlaw, Joanna
dc.contributor.author: Liew, Chee Sun
dc.date.accessioned: 2013-09-09T13:59:52Z
dc.date.available: 2013-09-09T13:59:52Z
dc.date.issued: 2012-11-29
dc.identifier.uri: http://hdl.handle.net/1842/7718
dc.description.abstract: The emergence of data-intensive science as the fourth science paradigm has posed a data deluge challenge for enacting scientific work-flows. The scientific community is facing an imminent flood of data from the next generation of experiments and simulations, besides dealing with the heterogeneity and complexity of data, applications and execution environments. New scientific work-flows involve execution on distributed and heterogeneous computing resources across organisational and geographical boundaries, processing gigabytes of live data streams and petabytes of archived and simulation data, in various formats and from multiple sources. Managing the enactment of such work-flows not only requires larger storage space and faster machines, but the capability to support scalability and diversity of the users, applications, data, computing resources and the enactment technologies. We argue that the enactment process can be made efficient using optimisation techniques in an appropriate architecture. This architecture should support the creation of diversified applications and their enactment on diversified execution environments, with a standard interface, i.e. a work-flow language. The work-flow language should be both human readable and suitable for communication between the enactment environments. The data-streaming model central to this architecture provides a scalable approach to large-scale data exploitation. Data-flow between computational elements in the scientific work-flow is implemented as streams. To cope with the exploratory nature of scientific work-flows, the architecture should support fast work-flow prototyping, and the re-use of work-flows and work-flow components. Above all, the enactment process should be easily repeated and automated. In this thesis, we present a candidate data-intensive architecture that includes an intermediate work-flow language, named DISPEL.
We create a new fine-grained measurement framework to capture performance-related data during enactments, and design a performance database to organise them systematically. We propose a new enactment strategy to demonstrate that optimisation of data-streaming work-flows can be automated by exploiting performance data gathered during previous enactments. [en_US]
dc.language.iso: en [en_US]
dc.publisher: The University of Edinburgh [en_US]
dc.relation.hasversion: Chee Sun Liew, Amrey Krause, and David Snelling. Dispel enactment, in Malcolm P. Atkinson et al., The DATA Bonanza - Improving Knowledge Discovery for Science, Engineering and Business, Wiley, 2012. [en_US]
dc.relation.hasversion: Malcolm P. Atkinson, Chee Sun Liew, Michelle Galea, Paul Martin, Amrey Krause, Adrian Mouat, Oscar Corcho, and David Snelling. Data-intensive architecture for scientific knowledge discovery, Distributed and Parallel Databases, 30 (5), 2012, pp. 307-324. [en_US]
dc.relation.hasversion: Chee Sun Liew, Malcolm P. Atkinson, Radoslaw Ostrowski, Murray Cole, Jano I. van Hemert, and Liangxiu Han. Performance database: capturing data for optimizing distributed streaming workflows, Philosophical Transactions of the Royal Society A, 369 (1949), 2011, pp. 3268-3284. [en_US]
dc.relation.hasversion: Liangxiu Han, Chee Sun Liew, Malcolm P. Atkinson, and Jano I. van Hemert. A generic parallel processing model for facilitating data mining and integration, Parallel Computing, 37 (3), 2011, pp. 157-171. [en_US]
dc.relation.hasversion: Gagarine Yaikhom, Chee Sun Liew, Liangxiu Han, Jano van Hemert, Malcolm Atkinson, and Amy Krause. Federated enactment of workflow patterns, in Euro-Par 2010 - Parallel Processing, P. D'Ambra, M. Guarracino, and D. Talia, eds., vol. 6271 of Lecture Notes in Computer Science, Springer Berlin / Heidelberg, 2010, pp. 317-328. [en_US]
dc.relation.hasversion: Chee Sun Liew, Malcolm P. Atkinson, Jano I. van Hemert, and Liangxiu Han. Towards optimising distributed data streaming graphs using parallel streams, in HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, S. Hariri and K. Keahey, eds., New York, NY, USA, 2010, ACM, pp. 725-736. [en_US]
dc.relation.hasversion: Malcolm P. Atkinson, Jano I. van Hemert, Liangxiu Han, Ally Hume, and Chee Sun Liew. A distributed architecture for data mining and integration, in DADC '09: Proceedings of the second international workshop on Data-aware distributed computing, ACM, 2009, pp. 11-20. [en_US]
dc.subject: optimisation [en_US]
dc.subject: enactment [en_US]
dc.subject: workflow [en_US]
dc.subject: data-intensive [en_US]
dc.title: Optimisation of the enactment of fine-grained distributed data-intensive work flows [en_US]
dc.type: Thesis or Dissertation [en_US]
dc.type.qualificationlevel: Doctoral [en_US]
dc.type.qualificationname: PhD Doctor of Philosophy [en_US]

