Progressive load balancing of asynchronous algorithms
dc.contributor.advisor
Weiland, Michele
dc.contributor.advisor
Franke, Bjoern
dc.contributor.advisor
Smith, Lorna
dc.contributor.author
Zarins, Justs
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.date.accessioned
2021-12-13T14:14:13Z
dc.date.available
2021-12-13T14:14:13Z
dc.date.issued
2021-11-30
dc.description.abstract
Massively parallel supercomputers are susceptible to variable performance due to factors such as differences in chip manufacturing, heat management and network congestion. As a result, the same code with the same input can have a different execution time from run to run. Synchronisation under these circumstances is a key challenge that prevents applications from scaling to large problems and machines.
Asynchronous algorithms offer a partial solution. In these algorithms, fast processes are not forced to synchronise with slower ones. Instead, they continue computing updates and moving towards the solution using the latest data available to them, which may have become stale (i.e. the data is a number of iterations out of date compared to the most recent version). While this allows for high computational efficiency, the convergence rate of asynchronous algorithms tends to be lower than that of synchronous algorithms due to the use of stale values. A large degree of performance variability can eliminate the performance advantage of asynchronous algorithms or even cause the results to diverge.
To address this problem, we use the unique properties of asynchronous algorithms to develop a load balancing strategy for iterative convergent asynchronous algorithms in both shared and distributed memory. The proposed approach – Progressive Load Balancing (PLB) – aims to balance progress levels over time, rather than attempting to equalise iteration rates across parallel workers. This approach attenuates noise without sacrificing performance, resulting in a significant reduction in progress imbalance and improving time to solution.
The developed method is evaluated in a variety of scenarios using the asynchronous Jacobi algorithm. In shared memory, we show that it can essentially eliminate the negative effects of a single core in a node slowed down by 19%. Work stealing, an alternative load balancing approach, is shown to be ineffective. In distributed memory, the method reduces the impact of up to 8 slow nodes out of 15, each slowed down by 40%, resulting in a 1.03×–1.10× reduction in time to solution and a 1.11×–2.89× reduction in runtime variability. Furthermore, we successfully apply the method in a scenario with real faulty components running 75% slower than normal. Broader applicability of progressive load balancing is established by emulating its application to asynchronous stochastic gradient descent, where it is found to improve both training time and the learned model’s accuracy.
Overall, this thesis demonstrates that enhancing asynchronous algorithms with PLB is an effective method for tackling performance variability in supercomputers.
en
dc.identifier.uri
https://hdl.handle.net/1842/38342
dc.identifier.uri
http://dx.doi.org/10.7488/era/1607
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Justs Zarins and Michele Weiland. Progressive load balancing of asynchronous algorithms. In Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms, IA3’17, pages 5:1–5:9. ACM, 2017.
en
dc.relation.hasversion
Justs Zarins and Michele Weiland. Progressive load balancing in distributed memory. In Proceedings of the International Conference on Parallel Computing, PARCO 2019, volume 36 of Advances in Parallel Computing, pages 127–136. IOS Press, 2019.
en
dc.subject
asynchronous applications
en
dc.subject
asynchronous computation
en
dc.subject
progressive load balancing
en
dc.subject
variable speed mitigation
en
dc.subject
efficiency
en
dc.subject
Jacobi algorithm
en
dc.title
Progressive load balancing of asynchronous algorithms
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: Zarins2021.pdf
- Size: 2.3 MB
- Format: Adobe Portable Document Format

