Edinburgh Research Archive

Guided rewriting and constraint satisfaction for parallel GPU code generation

dc.contributor.advisor
O'Boyle, Michael
dc.contributor.advisor
Dubach, Christophe
dc.contributor.advisor
Heafield, Kenneth
dc.contributor.author
Mogers, Naums
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.date.accessioned
2023-07-28T12:15:34Z
dc.date.available
2023-07-28T12:15:34Z
dc.date.issued
2023-07-28
dc.description.abstract
Graphics Processing Units (GPUs) are notoriously hard to optimise for manually due to their scheduling and memory hierarchies. What is needed are good automatic code generators and optimisers for such parallel hardware. Functional approaches such as Accelerate, Futhark and LIFT leverage a high-level algorithmic Intermediate Representation (IR) to expose parallelism and abstract the implementation details away from the user. However, producing efficient code for a given accelerator remains challenging. Existing code generators depend either on user input to choose a subset of hard-coded optimisations or on automated exploration of the implementation search space. The former suffers from a lack of extensibility, while the latter is too costly due to the size of the search space. A hybrid approach is needed, where a space of valid implementations is built automatically and explored with the aid of human expertise. This thesis presents a solution combining user-guided rewriting and automatically generated constraints to produce high-performance code. The first contribution is an automatic tuning technique to find a balance between performance and memory consumption. Leveraging its functional patterns, the LIFT compiler is empowered to infer tuning constraints and limit the search to valid tuning combinations only. Next, the thesis reframes parallelisation as a constraint satisfaction problem. Parallelisation constraints are extracted automatically from the input expression, and a solver is used to identify valid rewritings. The constraints truncate the search space to valid parallel mappings only by capturing the scheduling restrictions of the GPU in the context of a given program. A synchronisation barrier insertion technique is proposed to prevent data races and improve the efficiency of the generated parallel mappings.
The final contribution of this thesis is the guided rewriting method, where the user encodes a design space of structural transformations using high-level IR nodes called rewrite points. These strongly typed pragmas express macro rewrites and expose design choices as explorable parameters. The thesis proposes a small set of reusable rewrite points to achieve tiling, cache locality, data reuse and memory optimisation. A comparison with the vendor-provided handwritten kernels of the ARM Compute Library and with the TVM code generator demonstrates the effectiveness of this thesis' contributions. With convolution as a use case, LIFT-generated direct and GEMM-based convolution implementations are shown to perform on par with the state-of-the-art solutions on a mobile GPU. Overall, this thesis demonstrates that a functional IR lends itself well to user-guided and automatic rewriting for high-performance code generation.
en
dc.identifier.uri
https://hdl.handle.net/1842/40832
dc.identifier.uri
http://dx.doi.org/10.7488/era/3587
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Naums Mogers, Valentin Radu, Lu Li, Jack Turner, Michael O’Boyle, and Christophe Dubach. “Automatic generation of specialized direct convolutions for mobile GPUs”. In: Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit. 2020, pages 41–50. DOI: 10.1145/3366428.3380771.
en
dc.relation.hasversion
Naums Mogers, Lu Li, Valentin Radu, and Christophe Dubach. “Mapping parallelism in a functional IR through constraint satisfaction: a case study on convolution for mobile GPUs”. In: Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction. 2022, pages 218–230.
en
dc.subject
compiler optimization
en
dc.subject
convolutional neural networks
en
dc.subject
GPU computing
en
dc.subject
parallel programming
en
dc.subject
functional programming
en
dc.title
Guided rewriting and constraint satisfaction for parallel GPU code generation
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Mogers2023.pdf
Size:
9.25 MB
Format:
Adobe Portable Document Format