Harshit Kumar, Pranjal Gupta, et al.
ICPE 2025
Analog In-Memory Computing (AIMC) offers a promising solution to the von Neumann bottleneck. However, deploying transformer models on AIMC remains challenging due to their inherent need for flexibility and adaptability across diverse tasks. To address this, we propose Analog Hardware-Aware Low-Rank Adaptation (AHWA-LoRA) training, a novel approach to efficiently adapt transformers for AIMC hardware. Unlike conventional AHWA training, which retrains the entire model, AHWA-LoRA training keeps the analog weights fixed as meta-weights and adapts them using lightweight, external LoRA modules. We validate AHWA-LoRA training on SQuADv1.1 and the GLUE benchmark, demonstrate its scalability to larger models (e.g., BERT-Large, LLaMA), and show its effectiveness in instruction tuning and reinforcement learning. We also evaluate a practical deployment scenario that balances AIMC tile latency with digital LoRA processing, using optimized pipeline strategies on RISC-V-based programmable multi-core accelerators. This hybrid architecture achieves efficient transformer inference with only a 4% per-layer overhead compared to a fully AIMC implementation.
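The core idea described in the abstract — a frozen weight matrix (mapped to an AIMC tile) augmented by a small, trainable low-rank update computed digitally — can be sketched as follows. This is an illustrative sketch of the general LoRA formulation, not the authors' implementation; the class name and dimensions are hypothetical.

```python
import numpy as np

class LoRALinear:
    """Illustrative sketch: a frozen 'analog' weight matrix W (the fixed
    meta-weights on an AIMC tile) plus a lightweight low-rank correction
    B @ A that would be trained and executed on digital hardware."""

    def __init__(self, d_in, d_out, rank=4, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed meta-weights: never updated after deployment to the analog tile.
        self.W = rng.standard_normal((d_out, d_in))
        # Trainable LoRA factors; B starts at zero so the initial
        # low-rank update is a no-op, as in standard LoRA.
        self.A = rng.standard_normal((rank, d_in)) * 0.01
        self.B = np.zeros((d_out, rank))

    def forward(self, x):
        # The analog path (W @ x) and the digital LoRA path (B @ (A @ x))
        # are independent, so a pipeline can overlap their latencies.
        return self.W @ x + self.B @ (self.A @ x)
```

Because only `A` and `B` change during adaptation, the analog tile contents stay untouched across tasks, which is what allows one set of meta-weights to serve many downstream fine-tunes.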
Dominik Metzler
PESM 2023
Corey Lammie, Julian Büchel, et al.
Nature Communications
Mark Lantz
PAS-SEMINAARI 2023