Edinburgh Research Archive

Profile-driven parallelisation of sequential programs

dc.contributor.advisor
Franke, Bjorn
en
dc.contributor.author
Tournavitis, Georgios
en
dc.date.accessioned
2011-09-07T13:19:58Z
dc.date.available
2011-09-07T13:19:58Z
dc.date.issued
2011-06-30
dc.description.abstract
Traditional parallelism detection in compilers is performed by means of static analysis and more specifically data and control dependence analysis. The information that is available at compile time, however, is inherently limited and therefore restricts the parallelisation opportunities. Furthermore, applications written in C – which represent the majority of today’s scientific, embedded and system software – utilise many low-level features and an intricate programming style that forces the compiler to make even more conservative assumptions. Despite the numerous proposals to handle this uncertainty at compile time using speculative optimisation and parallelisation, the software industry still lacks any pragmatic approach that extracts coarse-grain parallelism to exploit the multiple processing units of modern commodity hardware. This thesis introduces a novel approach for extracting and exploiting multiple forms of coarse-grain parallelism from sequential applications written in C. We utilise profiling information to overcome the limitations of static data and control-flow analysis, enabling more aggressive parallelisation. Profiling is performed using an instrumentation scheme operating at the Intermediate Representation (IR) level of the compiler. In contrast to existing approaches that depend on low-level binary tools and debugging information, IR-profiling provides precise and direct correlation of profiling information back to the IR structures of the compiler. Additionally, our approach is orthogonal to existing automatic parallelisation approaches and additional fine-grain parallelism may be exploited. We demonstrate the applicability and versatility of the proposed methodology using two studies that target different forms of parallelism. First, we focus on the exploitation of loop-level parallelism that is abundant in many scientific and embedded applications.
We evaluate our parallelisation strategy against the NAS and SPEC FP benchmarks and two different multi-core platforms (a shared-memory Intel Xeon SMP and a heterogeneous distributed-memory IBM Cell blade). Empirical evaluation shows that our approach not only yields significant improvements when compared with state-of-the-art parallelising compilers, but comes close to and sometimes exceeds the performance of manually parallelised codes. On average, our methodology achieves 96% of the performance of the hand-tuned parallel benchmarks on the Intel Xeon platform, and a significant speedup for the Cell platform. The second study addresses the problem of partially sequential loops, typically found in implementations of multimedia codecs. We develop a more powerful whole-program representation based on the Program Dependence Graph (PDG) that supports profiling, partitioning and code generation for pipeline parallelism. In addition, we demonstrate how this enhances conventional pipeline parallelisation by incorporating support for multi-level loops and pipeline stage replication in a uniform and automatic way. Experimental results using a set of complex multimedia and stream processing benchmarks confirm the effectiveness of the proposed methodology, which yields speedups of up to 4.7 on an eight-core Intel Xeon machine.
en
dc.identifier.uri
http://hdl.handle.net/1842/5287
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Semi-Automatic Extraction and Exploitation of Hierarchical Pipeline Parallelism Using Profiling Information. Georgios Tournavitis and Björn Franke. In PACT ’10: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, pages 377–388, Vienna, Austria, 2010. ACM.
en
dc.relation.hasversion
Towards a Holistic Approach to Auto-parallelisation: Integrating Profile-driven Parallelism Detection and Machine-Learning Based Mapping. Georgios Tournavitis, Zheng Wang, Björn Franke and Michael F.P. O’Boyle. In PLDI ’09: Proceedings of the 2009 ACM SIGPLAN conference on Programming Language Design and Implementation, pages 177–187, Dublin, Ireland, 2009. ACM.
en
dc.relation.hasversion
Towards Automatic Profile-Driven Parallelisation of Embedded Multimedia Applications. Georgios Tournavitis and Björn Franke. In MULTIPROG-2009: Proceedings of the Second Workshop on Programmability Issues for Multi-Core Computers, pages 53–64, Paphos, Cyprus, 2009.
en
dc.subject
compiler
en
dc.subject
multi-core
en
dc.subject
parallelisation
en
dc.subject
pipeline
en
dc.title
Profile-driven parallelisation of sequential programs
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Tournavitis2011.pdf
Size:
1.73 MB
Format:
Adobe Portable Document Format

This item appears in the following Collection(s)