Paper

EAGLE: A Flexible Heterogeneous Analog Compute-In-Memory Architecture With RISC-V Programmable Multi-Core Accelerators

Abstract

Hardware for accelerating deep learning inference must meet growing demands for speed, computational efficiency, and flexibility. Emerging technologies like Analog Compute-In-Memory (ACIM) hold significant promise for enhancing computational efficiency in these workloads, particularly for matrix-vector multiplications. However, architectural constraints hinder ACIM’s ability to support end-to-end workloads, necessitating complementary specialized digital units for essential operations like activation functions, pooling, and attention. These units are typically hardwired and lack the adaptability needed as AI models evolve rapidly. To address this challenge, we propose EAGLE, an ACIM-based architecture that integrates general-purpose Programmable Multi-Core Accelerators (PMCAs) as the sole digital accelerator, offering the flexibility to execute diverse workloads end to end while maintaining competitive throughput. The architecture integrates state-of-the-art RISC-V Snitch clusters as PMCAs, extended with specialized instructions to enhance performance and energy efficiency. We introduce novel lightweight LUT-based ISA extensions to approximate commonly used transcendental functions and perform HW/SW co-optimization of key AI kernels, leveraging hardware extensions to eliminate control and memory instruction overhead. The flexibility of EAGLE is demonstrated across a range of model architectures, including encoder- and decoder-based transformers, a convolutional neural network, and a recurrent model (LSTM). Evaluated in 28 nm FD-SOI, EAGLE sustains 61.1 and 97.96 Inf/s on end-to-end MobileBERT and BERT-Large inference, respectively, achieving up to 3× energy efficiency improvements over state-of-the-art 8 nm GPUs.
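To make the idea of LUT-based approximation of transcendental functions concrete, the following is a minimal software sketch. It is not the EAGLE ISA extension: the abstract does not specify table size, input range, or interpolation scheme, so the 64-entry table, the [-8, 0] range, the linear interpolation, and all function names (`build_lut`, `lut_eval`, `softmax_lut`) are assumptions chosen purely for illustration.

```python
import math

def build_lut(fn, lo, hi, entries):
    """Sample fn at `entries` evenly spaced points on [lo, hi]."""
    step = (hi - lo) / (entries - 1)
    table = [fn(lo + i * step) for i in range(entries)]
    return table, step

def lut_eval(table, step, lo, x):
    """Approximate fn(x) by linear interpolation between adjacent entries."""
    hi = lo + step * (len(table) - 1)
    x = min(max(x, lo), hi)                  # clamp input to the table range
    pos = (x - lo) / step                    # fractional table position
    idx = min(int(pos), len(table) - 2)      # keep idx + 1 inside the table
    frac = pos - idx
    return table[idx] + frac * (table[idx + 1] - table[idx])

def softmax_lut(scores, table, step, lo):
    """Softmax whose exponentials come from the table instead of math.exp."""
    m = max(scores)                          # shift so every argument is <= 0
    exps = [lut_eval(table, step, lo, s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical configuration: 64 entries covering exp(x) on [-8, 0].
EXP_LO, EXP_HI, ENTRIES = -8.0, 0.0, 64
EXP_TABLE, EXP_STEP = build_lut(math.exp, EXP_LO, EXP_HI, ENTRIES)

if __name__ == "__main__":
    probs = softmax_lut([2.0, 1.0, 0.1], EXP_TABLE, EXP_STEP, EXP_LO)
    print(probs)
```

Subtracting the maximum score before the lookup keeps every argument in a bounded non-positive range, which is what lets a small table cover the softmax case. A hardware implementation would presumably make different trade-offs in table size and number format.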