Analog In-Memory Computing for Large Language Model Inference: Opportunities and Challenges

A. Vasilopoulos; Hadjer Benmeziane; Julian Büchel; William Simon; Abhairaj Singh; Irem Boybat-Kara; Jose Luquin; Pritish Narayanan; Sidney Tsai; Geoffrey Burr; Abbas Rahimi; Manuel Le Gallo; Vijay Narayanan; Abu Sebastian

doi:10.1109/IMW68301.2026.11532624

IMW 2026

Conference paper

10 May 2026

Abstract

Recent advancements in large language models (LLMs) have shifted the primary bottleneck of AI hardware from compute to memory capacity and data movement. Analog in-memory computing (AIMC) offers a promising path to address this challenge by enabling matrix-vector multiplication directly within memory arrays, significantly reducing data transfers associated with model weights. In this paper, we discuss the role of AIMC in LLM inference workloads from a holistic systems perspective. We analyze the architecture of modern LLMs and identify which operations are well-suited for AIMC. We further discuss key challenges and opportunities in memory technologies, algorithms, system architecture, and heterogeneous system composition that must be addressed to enable AIMC as a practical accelerator for future AI inference infrastructure.
