Edinburgh Research Archive

Automatic performance optimisation of parallel programs for GPUs via rewrite rules

dc.contributor.advisor
Dubach, Christophe
en
dc.contributor.advisor
Steuwer, Michel
en
dc.contributor.author
Remmelg, Toomas
en
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.date.accessioned
2019-12-20T11:58:39Z
dc.date.available
2019-12-20T11:58:39Z
dc.date.issued
2019-12-11
dc.description.abstract
Graphics Processing Units (GPUs) are now commonplace in computing systems and are the most successful parallel accelerators. Their performance is orders of magnitude higher than that of traditional Central Processing Units (CPUs), making them attractive for many application domains with high computational demands. However, achieving their full performance potential is extremely hard, even for experienced programmers, as it requires specialised software tailored for specific devices and written in low-level languages such as OpenCL. Differences in device characteristics between manufacturers and even hardware generations often lead to large performance variations when different optimisations are applied. This inevitably leads to code that is not performance portable across different hardware. This thesis demonstrates that achieving performance portability is possible using LIFT, a functional data-parallel language which allows programs to be expressed at a high level in a hardware-agnostic way. The LIFT compiler is empowered to automatically explore the optimisation space using a set of well-defined rewrite rules to transform programs seamlessly between different high-level algorithmic forms before translating them to a low-level OpenCL-specific form. The first contribution of this thesis is the development of techniques to compile functional LIFT programs, which have optimisations explicitly encoded, into efficient imperative OpenCL code. Producing efficient code is non-trivial as many performance-sensitive details such as memory allocation, array accesses or synchronisation are not explicitly represented in the functional LIFT language. The thesis shows that the newly developed techniques are essential for achieving performance on par with manually optimised code for GPU programs with the exact same complex optimisations applied.
The second contribution of this thesis is the presentation of techniques that enable the LIFT compiler to perform complex optimisations that usually require tens to hundreds of individual rule applications by grouping them as macro-rules that cut through the optimisation space. Using matrix multiplication as an example, starting from a single high-level program, the compiler automatically generates highly optimised and specialised implementations for desktop and mobile GPUs with very different architectures, achieving performance portability. The final contribution of this thesis is the demonstration of how low-level and GPU-specific features are extracted directly from the high-level functional LIFT program, enabling the construction of a statistical performance model that makes accurate predictions about the performance of differently optimised program variants. This performance model is then used to drastically reduce the time taken by the optimisation space exploration by ranking the different variants based on their predicted performance. Overall, this thesis demonstrates that performance portability is achievable using LIFT.
en
dc.identifier.uri
https://hdl.handle.net/1842/36663
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Toomas Remmelg, Thibaut Lutz, Michel Steuwer, and Christophe Dubach. Performance Portable GPU Code Generation for Matrix Multiplication. In Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit (GPGPU ’16).
en
dc.relation.hasversion
Michel Steuwer, Toomas Remmelg, and Christophe Dubach. Matrix Multiplication Beyond Auto-Tuning: Rewrite-based GPU Code Generation. In Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES ’16).
en
dc.relation.hasversion
Michel Steuwer, Toomas Remmelg, and Christophe Dubach. LIFT: A Functional Data-Parallel IR for High-Performance GPU Code Generation. In Proceedings of the 2017 International Symposium on Code Generation and Optimization (CGO ’17).
en
dc.subject
Graphics Processing Units
en
dc.subject
GPUs
en
dc.subject
GPU programming
en
dc.subject
evolving architectures
en
dc.subject
LIFT
en
dc.subject
performance portability
en
dc.title
Automatic performance optimisation of parallel programs for GPUs via rewrite rules
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Remmelg2019.pdf
Size:
3.29 MB
Format:
Adobe Portable Document Format