Conference paper

Analog In-Memory Computing for Large Language Model Inference: Opportunities and Challenges

Abstract

Recent advances in large language models (LLMs) have shifted the primary bottleneck of AI hardware from compute to memory capacity and data movement. Analog in-memory computing (AIMC) offers a promising way to address this bottleneck: it performs matrix-vector multiplication directly within memory arrays, which significantly reduces the data transfers associated with model weights. In this paper, we discuss the role of AIMC in LLM inference workloads from a holistic systems perspective. We analyze the architecture of modern LLMs and identify which operations are well suited to AIMC. We further discuss key challenges and opportunities in memory technologies, algorithms, system architecture, and heterogeneous system composition that must be addressed to make AIMC a practical accelerator for future AI inference infrastructure.