Edinburgh Research Archive

Alleviating the software parallelization task

Authors

Maramzin, Aleksandr

Abstract

Despite decades of research into parallelizing compiler technology, software parallelization remains a largely manual task that is complex, time-consuming, and error-prone. An embarrassingly parallel problem can be hidden behind a serial algorithm, thoughtless software design, or poorly chosen lower-level constructs, such as data structures. To map a parallel problem elegantly and effectively onto the exact hardware, a programmer must possess expert-level knowledge in fields ranging from software design and algorithmic patterns down to automatic vectorization and cache coherence. In this thesis, we do not strive to find a "silver bullet" that solves the problem of automatic parallelization. Neither do we expect an average programmer to be an expert. Instead, we acknowledge the role a programmer plays in the parallelization process and equip the programmer with an assistant solution. Our solution alleviates the task and makes parallelism more accessible to an average programmer. The assistant solution consists of a tool and a library aimed at different stages of software parallelization. The tool targets finer granularity levels. Program loops are often the richest source of parallelism and account for the largest portion of the running time. The tool identifies those loops that are both worthwhile and feasible to parallelize. For each loop, the tool combines its potential contribution to speedup with an estimated probability of its successful parallelization. This probability is predicted using a machine learning model, which has been trained and tested on 1415 labelled loops, achieving a prediction accuracy greater than 90%. We present a methodology that makes better use of expert time by guiding experts directly towards those loops where the largest performance gains can be expected, while keeping analysis and transformation effort to a minimum. We have evaluated our parallelization assistant against sequential C applications from the SNU NAS benchmark suite.
We show that our novel methodology achieves parallel performance levels comparable to those of expert programmers while requiring less expert time. On average, our assistant reduces the number of lines of code that have to be inspected manually before reaching expert-level parallel speedup by 20%. The library implements the novel idea of computational frameworks, which are higher-level entities that embody both data structures and algorithms. The use of computational frameworks as parallel software design primitives alleviates the process of parallel software development for a wide class of applications. We prototyped the library on the Olden benchmark suite. The parallel library version consistently outperforms the sequential version, reaching 5-6x speedups on the major benchmarks.
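
The loop ranking described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two quantities are combined, so the product used here is an assumption, as are the names Loop, score, and rank_loops and all of the sample numbers.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    name: str                # hypothetical loop identifier
    runtime_fraction: float  # share of total running time spent in the loop (0..1)
    p_parallel: float        # estimated probability of successful parallelization (0..1)


def score(loop: Loop) -> float:
    # Assumed combination rule (not taken from the thesis):
    # expected payoff = runtime share * probability of success.
    return loop.runtime_fraction * loop.p_parallel


def rank_loops(loops):
    # Loops with the highest expected payoff come first, so a programmer
    # would inspect them before the rest.
    return sorted(loops, key=score, reverse=True)


# Illustrative, made-up loops; the probabilities stand in for model output.
loops = [
    Loop("init", 0.05, 0.95),
    Loop("solve", 0.60, 0.40),
    Loop("update", 0.30, 0.90),
]
ranking = [loop.name for loop in rank_loops(loops)]
print(ranking)
```

Under this assumed rule, a loop that dominates the running time but is unlikely to parallelize can rank below a smaller loop that is very likely to succeed.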
