Olivier Maher, N. Harnack, et al.
DRC 2023
Recent advancements in large language models (LLMs) have shifted the primary bottleneck of AI hardware from compute to memory capacity and data movement. Analog in-memory computing (AIMC) offers a promising path to address this challenge by enabling matrix-vector multiplication directly within memory arrays, significantly reducing data transfers associated with model weights. In this paper, we discuss the role of AIMC in LLM inference workloads from a holistic systems perspective. We analyze the architecture of modern LLMs and identify which operations are well-suited for AIMC. We further discuss key challenges and opportunities in memory technologies, algorithms, system architecture, and heterogeneous system composition that must be addressed to enable AIMC as a practical accelerator for future AI inference infrastructure.
Thomas Lesueur, David Danovitch, et al.
ECTC 2025
Max Bloomfield, Amogh Wasti, et al.
ITherm 2025
Victor Chan, A. Gasasira, et al.
IEEE Trans Semicond Manuf