A Multiscale Workflow for Thermal Analysis of 3DI Chip Stacks
Max Bloomfield, Amogh Wasti, et al.
ITherm 2025
Analog In-Memory Computing (AIMC) offers a promising solution to the von Neumann bottleneck. However, deploying transformer models on AIMC remains challenging due to their inherent need for flexibility and adaptability across diverse tasks. To address this, we propose Analog Hardware-Aware Low-Rank Adaptation (AHWA-LoRA) training, a novel approach to efficiently adapting transformers for AIMC hardware. Unlike conventional AHWA training, which retrains the entire model, AHWA-LoRA training keeps the analog weights fixed as meta-weights and adapts them using lightweight, external LoRA modules. We validate AHWA-LoRA training on SQuAD v1.1 and the GLUE benchmark, demonstrate its scalability to larger models (e.g., BERT-Large, LLaMA), and show its effectiveness in instruction tuning and reinforcement learning. We also evaluate a practical deployment scenario that balances AIMC tile latency with digital LoRA processing, using optimized pipeline strategies on RISC-V-based programmable multi-core accelerators. This hybrid architecture achieves efficient transformer inference with only a 4% per-layer overhead compared to a fully AIMC implementation.
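The core idea of the abstract — frozen base weights on the analog tile, corrected by a small trainable low-rank path in digital logic — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the shapes, the `alpha` scaling factor, and the initialization follow the standard LoRA convention (random `A`, zero `B`, so the correction starts at zero) and are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 16, 16, 4

# Frozen "meta-weights": stands in for the analog tile, never updated.
W = rng.standard_normal((d_out, d_in))

# Lightweight external LoRA factors: only A and B would be trained.
A = rng.standard_normal((rank, d_in)) * 0.01  # random init
B = np.zeros((d_out, rank))                   # zero init: correction starts at 0
alpha = 1.0                                   # assumed scaling factor

def forward(x):
    # Base path (analog tile) plus low-rank digital correction B @ A @ x.
    return W @ x + alpha * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = forward(x)
```

Because `B` is initialized to zero, the adapted model initially reproduces the frozen base model exactly; training moves only the small `A` and `B` matrices, which is what makes the digital side of the hybrid pipeline lightweight relative to the AIMC tile.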
Rajiv Joshi, John Davis, et al.
VLSI Technology and Circuits 2025
Evaline Ju, Kelly Abuelsaad
KubeCon EU 2026
Runqian Wang, Soumya Ghosh, et al.
NeurIPS 2024