Harshit Kumar, Pranjal Gupta, et al.
ICPE 2025
Analog In-Memory Computing (AIMC) offers a promising solution to the von Neumann bottleneck. However, deploying transformer models on AIMC remains challenging due to their inherent need for flexibility and adaptability across diverse tasks. To address this, we propose Analog Hardware-Aware Low-Rank Adaptation (AHWA-LoRA) training, a novel approach to efficiently adapt transformers for AIMC hardware. Unlike conventional AHWA training, which retrains the entire model, AHWA-LoRA training keeps the analog weights fixed as meta-weights and adapts them using lightweight, external LoRA modules. We validate AHWA-LoRA training on SQuADv1.1 and the GLUE benchmark, demonstrate its scalability to larger models (e.g., BERT-Large, LLaMA), and show its effectiveness in instruction tuning and reinforcement learning. We also evaluate a practical deployment scenario that balances AIMC tile latency with digital LoRA processing, using optimized pipeline strategies on RISC-V-based programmable multi-core accelerators. This hybrid architecture achieves efficient transformer inference with only a 4% per-layer overhead compared to a fully AIMC implementation.
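The core idea described in the abstract — a frozen weight matrix (mapped to an AIMC tile) augmented by a small, trainable low-rank update computed digitally — can be sketched as follows. This is an illustrative sketch of the general LoRA formulation, not the authors' implementation; the class name and dimensions are hypothetical.

```python
import numpy as np

class LoRALinear:
    """Illustrative sketch: a frozen 'analog' weight matrix W (the fixed
    meta-weights on an AIMC tile) plus a lightweight low-rank correction
    B @ A that would be trained and executed on digital hardware."""

    def __init__(self, d_in, d_out, rank=4, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed meta-weights: never updated after deployment to the analog tile.
        self.W = rng.standard_normal((d_out, d_in))
        # Trainable LoRA factors; B starts at zero so the initial
        # low-rank update is a no-op, as in standard LoRA.
        self.A = rng.standard_normal((rank, d_in)) * 0.01
        self.B = np.zeros((d_out, rank))

    def forward(self, x):
        # The analog path (W @ x) and the digital LoRA path (B @ (A @ x))
        # are independent, so a pipeline can overlap their latencies.
        return self.W @ x + self.B @ (self.A @ x)
```

Because only `A` and `B` change during adaptation, the analog tile contents stay untouched across tasks, which is what allows one set of meta-weights to serve many downstream fine-tunes.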
Dominik Metzler
PESM 2023
Corey Lammie, Julian Büchel, et al.
Nature Communications
Mark Lantz
PAS-SEMINAARI 2023