Is Finer Better? The Limits of Microscaling Formats in Large Language Models. Andrea Fasoli, Monodeep Kar, et al. ICLR 2026 (conference paper).
Advancing Fluorescence Light Detection and Ranging in Scattering Media with a Physics-Guided Mixture-of-Experts and Evidential Critics. Ismail Erbas, Ferhat Demikiran, et al. NeurIPS 2025 (workshop paper).
Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System. Yunhua Fang, Rui Xie, et al. IEEE Computer Architecture Letters, 2025 (paper).
CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization. Yanxia Deng, Aozhong Zhang, et al. TMLR, 2025 (paper).
Compressed Decentralized Momentum Stochastic Gradient Methods for Nonconvex Optimization. Wei Liu, Anweshit Panda, et al. TMLR, 2025 (paper).
COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization. Aozhong Zhang, Zi Yang, et al. IEEE Access, 2025 (paper).
Generative AI Through CAS Lens: An Integrated Overview of Algorithmic Optimizations, Architectural Advances, and Automated Designs. Chuan Zhang, You You, et al. IEEE JETCAS, 2025 (paper).
Compressing Recurrent Neural Networks for FPGA-accelerated Implementation in Fluorescence Lifetime Imaging. Ismail Erbas, Vikas Pandey, et al. NeurIPS 2024 (workshop paper).
A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts. Mohammed Nowaz Rabbani Chowdhury, Meng Wang, et al. ICML 2024 (conference paper).
Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths. Ximeng Sun, Rameswar Panda, et al. WACV 2024 (conference paper).
US12240753 (03 Mar 2025): Micro-electromechanical Device Having A Soft Magnetic Material Electrolessly Deposited On A Palladium Layer Coated Metal Beam.
US12175359 (23 Dec 2024): Machine Learning Hardware Having Reduced Precision Parameter Components For Efficient Parameter Update.
JP7525237 (21 Jul 2024): Machine Learning Hardware Having Reduced Precision Parameter Components For Efficient Parameter Update.
Pin-Yu Chen, Principal Research Scientist and Manager; Chief Scientist, RPI-IBM AI Research Collaboration.