Edinburgh Research Archive

Adaptive invalidate/update coherence protocol with accurate prediction for all-private cache hierarchies

dc.contributor.advisor
Grot, Boris
dc.contributor.advisor
Nagarajan, Vijayanand
dc.contributor.author
Zhu, Mingcan
dc.date.accessioned
2025-02-04T14:22:38Z
dc.date.available
2025-02-04T14:22:38Z
dc.date.issued
2023-11-23
dc.description.abstract
The cache hierarchy is critical in today’s Chip Multiprocessors (CMPs), and the Last-Level Cache (LLC) is of particular interest in a modern cache hierarchy. The LLC typically accounts for the majority of a modern processor’s transistor budget and is essential for system performance, as it serves as the frontier between on-chip and off-chip memory: it reduces the number of cache misses that result in accesses to off-chip memory, each of which can cost more than 100 processor cycles. With data-driven workloads in areas ranging from smartphones to data-center servers, accommodating massive datasets for fast access has become a particular pain point. Today’s CPUs address this issue with shared on-chip (on-die) LLCs, which play a key role in capturing the large working sets of today’s data-intensive workloads. However, shared LLCs pose a fundamental scalability challenge in the transistor-limited post-Moore regime, which makes it infeasible to scale LLC capacity to match applications’ growing working set sizes. Recent work has argued for Next Generation LLCs (NG-LLCs) based on private caches in die-stacked DRAM, which can provide hundreds of MBs of per-core LLC capacity at access latency similar to that of today’s shared LLCs. In this thesis, we study the performance impact of shared vs. private LLC organizations and observe that NG-LLCs are the winning configuration when used in high-capacity memory stacks vertically integrated on the CPU chip, owing to their massive per-core LLC capacity and fast access. While NG-LLCs offer a number of advantages, their private design exposes long-latency inter-core reads for read/write shared data, which hurts performance in parallel workloads. One way to eliminate the long latency of reads to read/write shared data is to use updating coherence protocols that eagerly push updates from a writer core into the caches of recent readers.
Alas, these protocols are known to generate excess cache and interconnect traffic that can be detrimental to overall performance. Hybrid protocols that combine invalidating and updating behavior have been proposed to alleviate this problem. In this thesis, we evaluate the performance of various coherence techniques and find that their performance benefits for NG-LLCs are negligible. We further observe that the number of writes to a read/write shared cache block tends to be stable over several consecutive write/read iterations. Based on this insight, we propose the 1-Update protocol, which records the number of writes that occur without an intervening read by a sharer and subsequently uses the recorded value to send at most one update once that number of writes has taken place. The 1-Update protocol has been formally verified, and its performance impact has been evaluated. Our experimental results show that the 1-Update protocol outperforms all other evaluated protocols, achieving high efficacy in covering remote misses for read/write shared cache blocks while minimizing excess cache and interconnect traffic.
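The per-block prediction idea described in the abstract can be sketched in a few lines. This is a hypothetical illustration only, assuming a per-block counter of writes since the last remote read and a single recorded value; the names, structure, and interface here are illustrative and do not reproduce the thesis's actual hardware tables or protocol states.

```python
# Illustrative sketch of the 1-Update prediction idea: for each read/write
# shared block, record how many writes occur without an intervening read
# by a sharer, then use that recorded count to push at most one update
# per subsequent write/read iteration. (Names are hypothetical.)

class OneUpdatePredictor:
    def __init__(self):
        self.recorded = None   # learned writes-per-iteration for this block
        self.writes = 0        # writes observed since the last remote read

    def on_write(self):
        """Return True if this write should trigger the single update push."""
        self.writes += 1
        # Push the one update only when the write count reaches the
        # recorded (predicted) length of the current write run.
        return self.recorded is not None and self.writes == self.recorded

    def on_remote_read(self):
        """A sharer read the block: the write run ended; record its length."""
        self.recorded = self.writes
        self.writes = 0
```

For example, if the first iteration sees three writes followed by a sharer's read, the predictor records 3; in the next iteration it stays silent for the first two writes and signals one update on the third, so the sharer's subsequent read hits locally while at most one update message is generated per iteration.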
en
dc.identifier.uri
https://hdl.handle.net/1842/43066
dc.identifier.uri
http://dx.doi.org/10.7488/era/5612
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
M. Zhu, A. Shahab, A. Katsarakis and B. Grot. Invalidate or Update? Revisiting Coherence for Tomorrow’s Cache Hierarchies. In 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2021 [65]
en
dc.relation.hasversion
A. Shahab, M. Zhu, A. Margaritov and B. Grot. Farewell My Shared LLC! A Case for Private Die-Stacked DRAM Caches for Servers. In Proceedings of the 51st International Symposium on Microarchitecture (MICRO), 2018 [52]
en
dc.subject
Chip Multiprocessors
en
dc.subject
Last-Level Cache
en
dc.subject
on-chip (die) LLCs
en
dc.subject
Shared on-chip LLCs
en
dc.subject
Next Generation LLCs
en
dc.subject
1-Update protocol
en
dc.title
Adaptive invalidate/update coherence protocol with accurate prediction for all-private cache hierarchies
en
dc.title.alternative
An adaptive invalidate/update coherence protocol with accurate prediction for all-private cache hierarchies
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Masters
en
dc.type.qualificationname
MPhil Master of Philosophy
en

Files

Original bundle

Name:
ZhuM_2023.pdf
Size:
3.14 MB
Format:
Adobe Portable Document Format