Edinburgh Research Archive

Zero-cost abstractions for irregular data shapes in a high-performance parallel language

dc.contributor.advisor
Smith, Lorna
dc.contributor.advisor
Franke, Bjoern
dc.contributor.advisor
dubach, Christophe
dc.contributor.author
Pizzuti, Federico
dc.date.accessioned
2023-02-15T19:29:34Z
dc.date.available
2023-02-15T19:29:34Z
dc.date.issued
2023-02-15
dc.description.abstract
Modern parallel accelerators offer an unprecedented degree of performance, and are used pervasively in important application domains, such as High Performance Computing and Machine Learning. However, these accelerators are challenging to program. The primary option is using platform-specific low-level languages (eg. OpenCL, CUDA), but this requires a significant level of expertise to be used effectively. The other option is to rely on domain-specific libraries such as OpenBLAS and cuSparse. However, they cater to common use cases, and are sub-optimal for less popular architectures or applications. Data-Parallel Functional Languages such as Lift, Accelerate and Futhark facilitate the programming of parallel accelerators, offering a solution to these issues. They have extensive support for multi-dimensional array programming and excel in expressing parallel programs over regularly-shaped data structures. Raising the level of abstraction allows users to express the application more naturally, while the constrained nature of array operators enables the generation of high-performance code. However, many applications require irregularly shaped data, such as sparse or otherwise non-rectangular matrices. Expressing these structures in terms of dense multidimensional array is challenging. This thesis introduces the minimal number of constructs necessary to express such structures. Rather than add a completely new programming paradigm, it extends the array programming model, allowing for maximal composability and uniformity between the new and existing abstractions. The central language extension is the position-dependent arrays: heterogeneous multi-dimensional arrays whose element’s type is statically dependent on its position within the data structure. The construct is then progressively generalised, allowing the dependence to be not only static but also dynamic, enabling programs using sparse matrices. These abstractions are zero-cost, which is key to producing fast code. This is achieved by using dependent types to enforce complex invariants at compile time. As programs using sparse matrices often need a fast prefix-sum, the thesis also includes a novel implementation of work-efficient parallel scan for data-parallel languages. A comparison with state-of-the-art alternatives shows the effectiveness of these contributions. The compiler generates programs that are competitive or outright outperform those found in libraries such as OpenBlas, cuBlas and cuSparse. Examples include an average 1.2x speedup for sparse matrix-vector multiplication vs cuSparse’s implementation on a NVidia GTX 1070 GPU, and a 1.5x speedup for prefix sum vs a competitive handwritten implementation.
en
dc.identifier.uri
https://hdl.handle.net/1842/39856
dc.identifier.uri
http://dx.doi.org/10.7488/era/3104
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Federico Pizzuti, Michel Steuwer, and Christophe Dubach. 2021. Generating high performance code for irregular data structures using dependent types. In Proceedings of the 9th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing (FHPNC 2021). Association for Computing Machinery, New York, NY, USA, 37–49. https://doi.org/10.1145/3471873.3472977
en
dc.relation.hasversion
Federico Pizzuti, Michel Steuwer, and Christophe Dubach. 2020. Generating fast sparse matrix vector multiplication from a high level generic functional IR. In Proceedings of the 29th International Conference on Compiler Construction (CC 2020). Association for Computing Machinery, New York, NY, USA, 85–95. https://doi.org/10.1145/3377555.3377896
en
dc.relation.hasversion
Federico Pizzuti, Michel Steuwer, and Christophe Dubach. 2021. Generating high performance code for irregular data structures using dependent types. In Proceedings of the 9th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing (FHPNC 2021). Association for Computing Machinery, New York, NY, USA, 37–49. https://doi.org/10.1145/3471873.3472977
en
dc.relation.hasversion
izzuti, F., Steuwer, M., Dubach, C. (2022). Generating Work Efficient Scan Implementations for GPUs the Functional Way. In: Cano, J., Trinder, P. (eds) Euro-Par 2022: Parallel Processing. Euro-Par 2022. Lecture Notes in Computer Science, vol 13440. Springer, Cham. https://doi.org/10.1007/978-3-031-12597-3_21
en
dc.subject
Parallel Programming
en
dc.subject
Functional Programming
en
dc.subject
Sparse Data Structures
en
dc.subject
Sparse Data Structures
en
dc.subject
Sparse Data Structures
en
dc.title
Zero-cost abstractions for irregular data shapes in a high-performance parallel language
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Now showing 1 - 1 of 1
Name:
PizzutiF_2023.pdf
Size:
1.24 MB
Format:
Adobe Portable Document Format
Description:

This item appears in the following Collection(s)