
dc.contributor.advisor: Weiland, Michele
dc.contributor.advisor: Franke, Bjoern
dc.contributor.advisor: Smith, Lorna
dc.contributor.author: Zarins, Justs
dc.date.accessioned: 2021-12-13T14:14:13Z
dc.date.available: 2021-12-13T14:14:13Z
dc.date.issued: 2021-11-30
dc.identifier.uri: https://hdl.handle.net/1842/38342
dc.identifier.uri: http://dx.doi.org/10.7488/era/1607
dc.description.abstract [en]: Massively parallel supercomputers are susceptible to variable performance due to factors such as differences in chip manufacturing, heat management and network congestion. As a result, the same code with the same input can have a different execution time from run to run. Synchronisation under these circumstances is a key challenge that prevents applications from scaling to large problems and machines. Asynchronous algorithms offer a partial solution. In these algorithms fast processes are not forced to synchronise with slower ones. Instead, they continue computing updates, and moving towards the solution, using the latest data available to them, which may have become stale (i.e. the data is a number of iterations out of date compared to the most recent version). While this allows for high computational efficiency, the convergence rate of asynchronous algorithms tends to be lower than synchronous algorithms due to the use of stale values. A large degree of performance variability can eliminate the performance advantage of asynchronous algorithms or even cause the results to diverge. To address this problem, we use the unique properties of asynchronous algorithms to develop a load balancing strategy for iterative convergent asynchronous algorithms in both shared and distributed memory. The proposed approach – Progressive Load Balancing (PLB) – aims to balance progress levels over time, rather than attempting to equalise iteration rates across parallel workers. This approach attenuates noise without sacrificing performance, resulting in a significant reduction in progress imbalance and improving time to solution. The developed method is evaluated in a variety of scenarios using the asynchronous Jacobi algorithm. In shared memory, we show that it can essentially eliminate the negative effects of a single core in a node slowed down by 19%. Work stealing, an alternative load balancing approach, is shown to be ineffective. In distributed memory, the method reduces the impact of up to 8 slow nodes out of 15, each slowed down by 40%, resulting in 1.03×–1.10× reduction in time to solution and 1.11×–2.89× reduction in runtime variability. Furthermore, we successfully apply the method in a scenario with real faulty components running 75% slower than normal. Broader applicability of progressive load balancing is established by emulating its application to asynchronous stochastic gradient descent where it is found to improve both training time and the learned model’s accuracy. Overall, this thesis demonstrates that enhancing asynchronous algorithms with PLB is an effective method for tackling performance variability in supercomputers.
dc.contributor.sponsor [en]: Engineering and Physical Sciences Research Council (EPSRC)
dc.language.iso [en]: en
dc.publisher [en]: The University of Edinburgh
dc.relation.hasversion [en]: Justs Zarins and Michele Weiland. Progressive load balancing of asynchronous algorithms. In Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms, IA3’17, pages 5:1–5:9. ACM, 2017.
dc.relation.hasversion [en]: Justs Zarins and Michele Weiland. Progressive load balancing in distributed memory. In Proceedings of the International Conference on Parallel Computing, PARCO 2019, volume 36 of Advances in Parallel Computing, pages 127–136. IOS Press, 2019.
dc.subject [en]: asynchronous applications
dc.subject [en]: asynchronous computation
dc.subject [en]: progressive load balancing
dc.subject [en]: variable speed mitigation
dc.subject [en]: efficiency
dc.subject [en]: Jacobi algorithm
dc.title [en]: Progressive load balancing of asynchronous algorithms
dc.type [en]: Thesis or Dissertation
dc.type.qualificationlevel [en]: Doctoral
dc.type.qualificationname [en]: PhD Doctor of Philosophy


