SPIKA: an energy-efficient hybrid CMOS-RRAM compute-in-memory macro for machine learning
Item Status
RESTRICTED ACCESS
Embargo End Date
2026-07-31
Authors
Humood, Khaled
Abstract
The deployment of neural networks (NNs) in machine learning (ML) applications such as computer vision, speech recognition and natural language processing has grown exponentially over the past few decades. The biggest challenge in implementing such algorithms is the constant data movement between compute units and memory units. Today's computing systems, built primarily on the von Neumann architecture, in which data must be moved to a processing unit, have proven inefficient for ML workloads. The speed and energy costs associated with this bottleneck are a key performance concern across a range of artificial intelligence (AI) applications. Another key challenge is that NNs perform copious multiply-and-accumulate (MAC) operations, which require high-performance GPUs that consume a great deal of power. There is therefore a pressing need to improve computing efficiency in terms of both energy and latency, and innovation in new computing architectures is expected to play a major role in the future of ML hardware.
Non-volatile compute-in-memory (nvCIM) technology has recently shown promising results in addressing the data-movement and MAC bottlenecks of machine learning algorithms by enabling parallel analogue vector-matrix multiplication (VMM) directly within memory arrays. By executing certain computational tasks within the memory itself, nvCIM provides an efficient alternative to traditional computing approaches. Specifically, nvCIM based on Resistive Random Access Memory (RRAM) has garnered attention because it exploits Ohm's law for multiplication and Kirchhoff's current law for accumulation, allowing RRAM arrays to perform parallel in-memory MAC operations with significantly higher throughput and energy efficiency than digital computing methods. RRAM cells are used to hold the weights of the neural network owing to their low read voltage, their ability to store multiple states per cell and their dense structure.
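The in-memory VMM described above can be sketched as a simple behavioural model: each cell's current follows Ohm's law, and the currents on a shared column wire sum by Kirchhoff's current law, producing one MAC result per column. The array size matches the SPIKA core; the conductance and voltage values below are illustrative, not device data.

```python
import numpy as np

# Idealized model of an RRAM crossbar performing analogue VMM.
# Per-cell multiplication follows Ohm's law (I = G * V); currents on
# each column wire sum by Kirchhoff's current law, yielding one MAC
# result per column. Values are illustrative placeholders.
rows, cols = 64, 128                       # SPIKA core crossbar dimensions
rng = np.random.default_rng(0)
G = rng.uniform(1e-6, 1e-4, (rows, cols))  # cell conductances (siemens)
V = rng.uniform(0.0, 0.2, rows)            # low read voltages on the rows

# Ohm's law per cell, then Kirchhoff accumulation per column;
# in hardware all 128 columns compute in parallel.
I_columns = (G * V[:, None]).sum(axis=0)   # equivalent to V @ G
```

In this model a whole matrix-vector product costs one "read" of the array, which is the source of the throughput and energy advantage over fetching each weight digitally.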
In this research work, I present SPIKA, a novel energy-efficient RRAM-nvCIM chip designed to accelerate machine learning workloads. The main aim of this PhD project is to accelerate ML and artificial neural network (ANN) applications at the maximum possible power efficiency. The key innovation of SPIKA lies in its ability to efficiently transfer input signals to output signals with minimal overhead. The analogue computation is performed in the time domain, with the dot product accumulated on a switched capacitor, eliminating the need for high-resolution, power-hungry data converters. Ultimately, the key pillar of SPIKA is that it exploits the low-resolution niche it targets to let each domain play to its strengths while using simple, efficient domain converters. This yields an implementation that is highly functional and simultaneously energy- and area-efficient.
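The time-domain accumulation can be illustrated with a behavioural sketch: each input is encoded as a pulse width, each ternary weight selects a cell current of -I0, 0 or +I0, and the charge delivered to the capacitor is proportional to the dot product. The unit current and unit pulse width below are hypothetical placeholders, not chip parameters.

```python
import numpy as np

# Behavioural sketch of time-domain MAC accumulation on a capacitor.
# Charge Q = sum(I_i * t_i) tracks the digital dot product x . w.
# I0 and T_LSB are assumed values for illustration only.
I0 = 1e-6      # unit cell current (A), hypothetical
T_LSB = 1e-9   # pulse width of one input LSB (s), hypothetical

x = np.array([3, 15, 0, 7])   # 4-bit inputs (0..15), pulse-width coded
w = np.array([1, -1, 0, 1])   # ternary weights {-1, 0, +1}

t = x * T_LSB                 # input-dependent pulse widths
I = w * I0                    # weight-dependent signed cell currents
Q = np.sum(I * t)             # charge accumulated on the capacitor
```

Because the result is read out as accumulated charge rather than sampled with a high-resolution ADC, the conversion back to the digital domain stays simple, which is the efficiency argument made above.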
The SPIKA chip has been fabricated in a commercial 180 nm technology and experimentally validated post-silicon. The core block features a 64×128 memory crossbar and operates with 4-bit input, ternary weight and 5-bit output resolutions. The results indicate remarkable performance for the SPIKA chip, with a peak throughput of 1092 GOPS and an energy efficiency of 195 TOPS/W. Compared to state-of-the-art solutions, the SPIKA core achieves a significant energy efficiency improvement, ranging from 2.15× to 390×. For experimental demonstration, a neural network trained on the Modified National Institute of Standards and Technology (MNIST) database was implemented on the SPIKA chip. Results show only a 3% loss in classification accuracy compared to the 32-bit software baseline, using the same network size and ternary quantized weights.
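Ternary weight quantization maps full-precision weights to {-1, 0, +1}. The exact scheme used for SPIKA is not specified in this abstract, so the sketch below uses a common threshold-based method (magnitude threshold set relative to the mean absolute weight) purely as an illustration of the idea.

```python
import numpy as np

def ternarize(w, frac=0.7):
    """Map full-precision weights to {-1, 0, +1} using a magnitude
    threshold of `frac` times the mean absolute weight. This is a
    generic scheme, not necessarily the one used for SPIKA."""
    t = frac * np.mean(np.abs(w))
    return np.sign(w) * (np.abs(w) > t)

w = np.array([0.9, -0.05, 0.3, -0.8, 0.02])
q = ternarize(w)   # small-magnitude weights collapse to 0
```

With only three weight levels, each RRAM cell needs to distinguish just a few conductance states, which relaxes device-precision requirements at a modest cost in accuracy, consistent with the roughly 3% accuracy loss reported above.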
Furthermore, an 8-core SPIKA system, with each core featuring a 64×128 array, is proposed to extend the architecture. The system-level architecture adds minimal overhead to the SPIKA core by incorporating a streamlined switching mechanism within each core, enabling efficient analogue aggregation and inter-core communication without additional circuitry. Circuit-level simulations demonstrate the SPIKA system's superior performance, achieving a peak normalized throughput of 8.736 TOPS and an energy efficiency of 312 TOPS/W, competitive even against designs in more advanced technology nodes.