Cooperative auto-tuning of parallel skeletons
Collins, Alexander James
Improving program performance through the use of multiple homogeneous processing elements, or cores, is common-place. However, these architectures increase the complexity required at the software level. Existing work is focused on optimising programs that run in isolation on these systems, but ignores the fact that, in reality, these systems run multiple parallel programs concurrently with programs competing for system resources. In order to improve performance in this shared environment, cooperative tuning of multiple, concurrently running parallel programs is required. Moreover, the set of programs running on the system – the system workload – is dynamic and rapidly changing. This makes cooperative tuning a challenge, as it must react rapidly to changes in the system workload. This thesis explores the scope for performance improvement from cooperatively tuning skeleton parallel programs, and techniques that can be used to cooperatively auto-tune parallel programs. Parallel skeletons provide a clear separation between algorithm description and implementation, and provide tuning knobs that the system can use to make high-level changes to a programs implementation. This work is in three parts: (i) how many threads should be allocated to each program running on the system, (ii) on which cores should a programs threads be executed and (iii) what values should be chosen for high-level parameters of the parallel skeletons. We demonstrate that significant performance improvements are available in each of these areas, compared to the current state-of-the-art.
The following license files are associated with this item: