Hotscotch: exploiting workload characteristics to improve read-throughput
Item statusRestricted Access
Embargo end date14/06/2023
Key-value stores are ubiquitous at all layers of the computational stack; offering constant average lookup, insertion, and deletion time. This dissertation looks at how we can exploit skew in the workloads to improve the temporal locality afforded in these systems, thereby resulting in an amortised throughput gain. Using Hopscotch hashing as the basis of this approach, an analysis is performed, and model are developed to quantify the potential gains available by performing in-memory data-reorganisation based on the number of accesses each tuple accrues during the execution of a skewed workload. Skew is modelled by drawing samples from numerous Zipf distributions, to measure the effect of the reordering across a wide parameter set. A simulator is developed based on this model to explore many random configurations of the table and quantify the distribution of potential performance improvements. These predictions are then tested experimentally by designing and implementing a number of novel approaches, each with different trade-offs between complexity and optimal ordering. Results demonstrate up to 3.75x, with outlier up to almost 20x in specific parameters, performance improvement on read-paths. This work is then extended to a concurrent programming model, where the baseline Hopscotch is lock-free. An analysis of the critical regions of execution leads to further new algorithms that relax these constraints more, and altering the design of prior approaches to achieve a trade-off of optimal ordering versus performance. This is evaluated on numerous configurations of skewed read workloads, resulting in a mean performance of approximately 2x when load-factor and number of queries are high, with up to 5x in high-opportunity, low-probability configurations. A simple Symmetric Hash Join experiment is also developed, demonstrating an approximate up to 1.5x improvement. As a final extension, this dissertation explores what opportunities are available to perform adaptive policy selection at a fine-granularity. For cases where there is little opportunity to benefit from the reordering, approaches including insight-based rules-methods and reinforcement-learning inspired are proposed, leading the way to future engineering work.