dc.contributor.advisor | Viglas, Stratis | |
dc.contributor.advisor | Grot, Boris | |
dc.contributor.author | Pugh, Matthew | |
dc.date.accessioned | 2022-06-14T15:51:03Z | |
dc.date.available | 2022-06-14T15:51:03Z | |
dc.date.issued | 2022-06-14 | |
dc.identifier.uri | https://hdl.handle.net/1842/39102 | |
dc.identifier.uri | http://dx.doi.org/10.7488/era/2353 | |
dc.description.abstract | Key-value stores are ubiquitous at all layers of the computational stack; offering
constant average lookup, insertion, and deletion time. This dissertation looks
at how we can exploit skew in the workloads to improve the temporal locality
afforded in these systems, thereby resulting in an amortised throughput gain.
Using Hopscotch hashing as the basis of this approach, an analysis is performed, and model are developed to quantify the potential gains available by
performing in-memory data-reorganisation based on the number of accesses each
tuple accrues during the execution of a skewed workload. Skew is modelled by
drawing samples from numerous Zipf distributions, to measure the effect of
the reordering across a wide parameter set. A simulator is developed based on
this model to explore many random configurations of the table and quantify
the distribution of potential performance improvements. These predictions are
then tested experimentally by designing and implementing a number of novel
approaches, each with different trade-offs between complexity and optimal ordering. Results demonstrate up to 3.75x, with outlier up to almost 20x in
specific parameters, performance improvement on read-paths.
This work is then extended to a concurrent programming model, where the
baseline Hopscotch is lock-free. An analysis of the critical regions of execution
leads to further new algorithms that relax these constraints more, and altering
the design of prior approaches to achieve a trade-off of optimal ordering versus
performance. This is evaluated on numerous configurations of skewed read
workloads, resulting in a mean performance of approximately 2x when load-factor and number of queries are high, with up to 5x in high-opportunity,
low-probability configurations. A simple Symmetric Hash Join experiment is
also developed, demonstrating an approximate up to 1.5x improvement.
As a final extension, this dissertation explores what opportunities are available to perform adaptive policy selection at a fine-granularity. For cases where
there is little opportunity to benefit from the reordering, approaches including
insight-based rules-methods and reinforcement-learning inspired are proposed,
leading the way to future engineering work. | en |
dc.contributor.sponsor | Engineering and Physical Sciences Research Council (EPSRC) | en |
dc.language.iso | en | en |
dc.publisher | The University of Edinburgh | en |
dc.relation.hasversion | Matt A. Pugh. 2017. Spatio-Temporal Locality in Hash Tables. CEUR Workshop Proceedings 1882 (2017), 1–4. | en |
dc.subject | Hopscotch hashing | en |
dc.subject | in-memory data-reorganisation | en |
dc.subject | skew | en |
dc.subject | Zipf distribution | en |
dc.title | Hotscotch: exploiting workload characteristics to improve read-throughput | en |
dc.type | Thesis or Dissertation | en |
dc.type.qualificationlevel | Doctoral | en |
dc.type.qualificationname | PhD Doctor of Philosophy | en |