Automated detection of structured coarse-grained parallelism in sequential legacy applications

Edler Von Koch, Tobias Joseph Kastulus

Automated detection of structured coarse-grained parallelism in sequential legacy applications

Simple item page

dc.contributor.advisor

Franke, Bjoern

en

dc.contributor.advisor

Garcia, Frankie

en

dc.contributor.advisor

Singer, Jeremy

en

dc.contributor.author

Edler Von Koch, Tobias Joseph Kastulus

en

dc.contributor.sponsor

other

en

dc.date.accessioned

2015-02-26T15:04:38Z

dc.date.available

2015-02-26T15:04:38Z

dc.date.issued

2014-11-27

dc.description.abstract

The efficient execution of sequential legacy applications on modern, parallel computer architectures is one of today’s most pressing problems. Automatic parallelization has been investigated as a potential solution for several decades but its success generally remains restricted to small niches of regular, array-based applications. This thesis investigates two techniques that have the potential to overcome these limitations. Beginning at the lowest level of abstraction, the binary executable, it presents a study of the limits of Dynamic Binary Parallelization (Dbp), a recently proposed technique that takes advantage of an underlying multicore host to transparently parallelize a sequential binary executable. While still in its infancy, Dbp has received broad interest within the research community. This thesis seeks to gain an understanding of the factors contributing to the limits of Dbp and the costs and overheads of its implementation. An extensive evaluation using a parameterizable Dbp system targeting a Cmp with light-weight architectural Tls support is presented. The results show that there is room for a significant reduction of up to 54% in the number of instructions on the critical paths of legacy Spec Cpu2006 benchmarks, but that it is much harder to translate these savings into actual performance improvements, with a realistic hardware-supported implementation achieving a speedup of 1.09 on average. While automatically parallelizing compilers have traditionally focused on data parallelism, additional parallelism exists in a plethora of other shapes such as task farms, divide & conquer, map/reduce and many more. These algorithmic skeletons, i.e. high-level abstractions for commonly used patterns of parallel computation, differ substantially from data parallel loops. Unfortunately, algorithmic skeletons are largely informal programming abstractions and are lacking a formal characterization in terms of established compiler concepts. This thesis develops compiler-friendly characterizations of popular algorithmic skeletons using a novel notion of commutativity based on liveness. A hybrid static/dynamic analysis framework for the context-sensitive detection of skeletons in legacy code that overcomes limitations of static analysis by complementing it with profiling information is described. A proof-of-concept implementation of this framework in the Llvm compiler infrastructure is evaluated against Spec Cpu2006 benchmarks for the detection of a typical skeleton. The results illustrate that skeletons are often context-sensitive in nature. Like the two approaches presented in this thesis, many dynamic parallelization techniques exploit the fact that some statically detected data and control flow dependences do not manifest themselves in every possible program execution (may-dependences) but occur only infrequently, e.g. for some corner cases, or not at all for any legal program input. While the effectiveness of dynamic parallelization techniques critically depends on the absence of such dependences, not much is known about their nature. This thesis presents an empirical analysis and characterization of the variability of both data dependences and control flow across program runs. The cBench benchmark suite is run with 100 randomly chosen input data sets to generate whole-program control and data flow graphs (Cdfgs) for each run, which are then compared to obtain a measure of the variance in the observed control and data flow. The results show that, on average, the cumulative profile information gathered with at least 55, and up to 100, different input data sets is needed to achieve full coverage of the data flow observed across all runs. For control flow, the figure stands at 46 and 100 data sets, respectively. This suggests that profile-guided parallelization needs to be applied with utmost care, as misclassification of sequential loops as parallel was observed even when up to 94 input data sets are used.

en

dc.identifier.uri

http://hdl.handle.net/1842/9976

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

Tobias J.K. Edler von Koch, Björn Franke, Pranav Bhandarkar, and Anshuman Dasgupta. “Exploiting Function Similarity for Code Size Reduction.” In: Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES’14), Edinburgh, UK, June 2014.

en

dc.relation.hasversion

Tobias J.K. Edler von Koch and Björn Franke. “Variability of Data Dependences and Control Flow.” In: Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS’14), Monterey, CA, March 2014

en

dc.relation.hasversion

Tobias J.K. Edler von Koch and Björn Franke. “Limits of Region-Based Dynamic Binary Parallelization.” In: Proceedings of the 9th Annual ACM SIGPLAN/ SIGOPS International Conference on Virtual Execution Environments (VEE’13), Houston, TX, March 2013

en

dc.relation.hasversion

Oscar Almer, Igor Böhm, Tobias J.K. Edler von Koch, Björn Franke, Stephen Kyle, Volker Seeker, Christopher Thompson, and Nigel Topham. “Scalable Multi-Core Simulation Using Parallel Dynamic Binary Translation.” In: Proceedings of the 11th IEEE International Conference on Embedded Computer Systems: Architectures, Modelling, and Simulation (SAMOS’11), Samos, Greece, July 2011

en

dc.relation.hasversion

Igor Böhm, Tobias J.K. Edler von Koch, Stephen Kyle, Björn Franke and Nigel Topham. “Generalized Just-In-Time Trace Compilation using a Parallel Task Farm in a Dynamic Binary Translator.” In: Proceedings of the ACM SIGPLAN 2011 Conference on Programming Language Design and Implementation (PLDI’11), San Jose, CA, June 2011

en

dc.relation.hasversion

Tobias J.K. Edler von Koch, Igor Böhm, and Björn Franke. “Integrated Instruction Selection and Register Allocation for Compact Code Generation Exploiting Freeform Mixing of 16- and 32-bit Instructions.” In: Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO’10), Toronto, Canada, 2010.

en

dc.subject

compilers

en

dc.subject

automatic parallelization

en

dc.subject

multicore

en

dc.subject

skeletons

en

dc.subject

programming

en

dc.title

Automated detection of structured coarse-grained parallelism in sequential legacy applications

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Now showing 1 - 1 of 1

Name:: Edler Von Koch2014.pdf
Size:: 2.8 MB
Format:: Adobe Portable Document Format

Download

This item appears in the following Collection(s)

Informatics thesis and dissertation collection