Show simple item record

dc.contributor.advisor: Viglas, Stratis
dc.contributor.advisor: Grot, Boris
dc.contributor.author: Pugh, Matthew
dc.date.accessioned: 2022-06-14T15:51:03Z
dc.date.available: 2022-06-14T15:51:03Z
dc.date.issued: 2022-06-14
dc.identifier.uri: https://hdl.handle.net/1842/39102
dc.identifier.uri: http://dx.doi.org/10.7488/era/2353
dc.description.abstract: Key-value stores are ubiquitous at all layers of the computational stack, offering constant average lookup, insertion, and deletion time. This dissertation examines how skew in workloads can be exploited to improve the temporal locality afforded by these systems, yielding an amortised throughput gain. Using Hopscotch hashing as the basis of the approach, an analysis is performed and models are developed to quantify the potential gains available from in-memory data reorganisation based on the number of accesses each tuple accrues during the execution of a skewed workload. Skew is modelled by drawing samples from numerous Zipf distributions, so that the effect of the reordering can be measured across a wide parameter set. A simulator based on this model explores many random configurations of the table and quantifies the distribution of potential performance improvements. These predictions are then tested experimentally by designing and implementing a number of novel approaches, each with a different trade-off between complexity and optimality of ordering. Results demonstrate read-path performance improvements of up to 3.75x, with outliers of almost 20x for specific parameter settings. This work is then extended to a concurrent programming model in which the baseline Hopscotch table is lock-free. An analysis of the critical regions of execution leads to further new algorithms that relax these constraints, and to alterations of the earlier designs that trade optimality of ordering against performance. Evaluated on numerous configurations of skewed read workloads, these achieve a mean performance improvement of approximately 2x when the load factor and number of queries are high, and up to 5x in high-opportunity, low-probability configurations. A simple Symmetric Hash Join experiment is also developed, demonstrating an improvement of up to approximately 1.5x. As a final extension, this dissertation explores the opportunities for adaptive policy selection at a fine granularity. For cases where there is little opportunity to benefit from reordering, approaches including insight-based rule methods and reinforcement-learning-inspired techniques are proposed, leading the way to future engineering work.
dc.contributor.sponsor: Engineering and Physical Sciences Research Council (EPSRC)
dc.language.iso: en
dc.publisher: The University of Edinburgh
dc.relation.hasversion: Matt A. Pugh. 2017. Spatio-Temporal Locality in Hash Tables. CEUR Workshop Proceedings 1882 (2017), 1–4.
dc.subject: Hopscotch hashing
dc.subject: in-memory data-reorganisation
dc.subject: skew
dc.subject: Zipf distribution
dc.title: Hotscotch: exploiting workload characteristics to improve read-throughput
dc.type: Thesis or Dissertation
dc.type.qualificationlevel: Doctoral
dc.type.qualificationname: PhD Doctor of Philosophy
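
To make the skew model described in the abstract concrete, below is a minimal illustrative sketch (not taken from the thesis) of how a Zipf-distributed read workload can be generated and how concentrated its accesses are on a small set of hot keys. The table size, query count, and skew exponent are assumed values chosen purely for illustration.

```python
import numpy as np

# Hypothetical illustration (not the thesis's code): model a skewed read
# workload by sampling keys from a Zipf distribution, then measure how much
# of the workload falls on the hottest 1% of keys.
rng = np.random.default_rng(42)
table_size = 10_000       # assumed number of distinct keys
num_queries = 1_000_000   # assumed number of read operations
zipf_exponent = 1.2       # assumed skew parameter; larger means heavier skew

# numpy's Zipf sampler is unbounded, so fold samples back into the key range.
keys = (rng.zipf(zipf_exponent, size=num_queries) - 1) % table_size

counts = np.bincount(keys, minlength=table_size)
hot = np.sort(counts)[::-1]
top_1_percent = hot[: table_size // 100].sum() / num_queries
print(f"Share of queries hitting the hottest 1% of keys: {top_1_percent:.1%}")
```

Under heavier skew (a larger exponent), an even larger share of lookups lands on the hottest keys, which is the property that access-count-based reordering of tuples aims to exploit.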

