Automatic performance optimisation of parallel programs for GPUs via rewrite rules
dc.contributor.advisor
Dubach, Christophe
en
dc.contributor.advisor
Steuwer, Michel
en
dc.contributor.author
Remmelg, Toomas
en
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.date.accessioned
2019-12-20T11:58:39Z
dc.date.available
2019-12-20T11:58:39Z
dc.date.issued
2019-12-11
dc.description.abstract
Graphics Processing Units (GPUs) are now commonplace in computing systems and are the
most successful parallel accelerators. Their performance is orders of magnitude higher than that of
traditional Central Processing Units (CPUs), making them attractive for many application domains
with high computational demands. However, achieving their full performance potential
is extremely hard, even for experienced programmers, as it requires specialised software tailored
for specific devices written in low-level languages such as OpenCL. Differences in device
characteristics between manufacturers and even hardware generations often lead to large performance
variations when different optimisations are applied. This inevitably leads to code that
is not performance portable across different hardware.
This thesis demonstrates that achieving performance portability is possible using LIFT, a
functional data-parallel language which allows programs to be expressed at a high level in a
hardware-agnostic way. The LIFT compiler is empowered to automatically explore the optimisation
space using a set of well-defined rewrite rules to transform programs seamlessly between
different high-level algorithmic forms before translating them to a low-level OpenCL-specific
form.
The first contribution of this thesis is the development of techniques to compile functional
LIFT programs that have optimisations explicitly encoded into efficient imperative OpenCL
code. Producing efficient code is non-trivial as many performance-sensitive details such as
memory allocation, array accesses or synchronisation are not explicitly represented in the functional
LIFT language. The thesis shows that the newly developed techniques are essential for
achieving performance on par with manually optimised code for GPU programs with the exact
same complex optimisations applied.
The second contribution of this thesis is the presentation of techniques that enable the
LIFT compiler to perform complex optimisations that usually require from tens to hundreds of
individual rule applications by grouping them as macro-rules that cut through the optimisation
space. Using matrix multiplication as an example, starting from a single high-level program
the compiler automatically generates highly optimised and specialised implementations for
desktop and mobile GPUs with very different architectures, achieving performance portability.
The final contribution of this thesis is the demonstration of how low-level and GPU-specific
features are extracted directly from the high-level functional LIFT program, enabling building
a statistical performance model that makes accurate predictions about the performance of differently
optimised program variants. This performance model is then used to drastically reduce
the time taken by the optimisation space exploration by ranking the different variants based
on their predicted performance.
Overall, this thesis demonstrates that performance portability is achievable using LIFT.
en
dc.identifier.uri
https://hdl.handle.net/1842/36663
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Toomas Remmelg, Thibaut Lutz, Michel Steuwer, and Christophe Dubach. Performance Portable GPU Code Generation for Matrix Multiplication. In Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit (GPGPU ’16).
en
dc.relation.hasversion
Michel Steuwer, Toomas Remmelg, and Christophe Dubach. Matrix Multiplication Beyond Auto-Tuning: Rewrite-based GPU Code Generation. In Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES ’16).
en
dc.relation.hasversion
Michel Steuwer, Toomas Remmelg, and Christophe Dubach. LIFT: A Functional Data-Parallel IR for High-Performance GPU Code Generation. In Proceedings of the 2017 International Symposium on Code Generation and Optimization (CGO ’17).
en
dc.subject
Graphics Processing Units
en
dc.subject
GPUs
en
dc.subject
GPU programming
en
dc.subject
evolving architectures
en
dc.subject
LIFT
en
dc.subject
performance portability
en
dc.title
Automatic performance optimisation of parallel programs for GPUs via rewrite rules
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name:
- Remmelg2019.pdf
- Size:
- 3.29 MB
- Format:
- Adobe Portable Document Format