"Is Finer Better? The Limits of Microscaling Formats in Large Language Models." Andrea Fasoli, Monodeep Kar, et al. ICLR 2026. Conference paper.
"Eliminating Redundancy: Ultra-compact Code Generation for Programmable Dataflow Accelerators." Prasanth Chatarasi, Alex Gatea, et al. CGO 2026. Conference paper.
"Spyre: An inference-optimized scalable AI accelerator for enterprise workloads." Matt Cohen, Monodeep Kar, et al. ISSCC 2026. Conference paper.
"Enabling Spill-Free Compilation via Affine-Based Live Range Reduction Optimization." Prasanth Chatarasi, Alex Gatea, et al. CGO 2026. Conference paper.
"Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure." Rui Xie, Asad Ul Haq, et al. IEEE Computer Architecture Letters, 2025. Paper.
"MixTrain: accelerating DNN training via input mixing." Sarada Krithivasan, Sanchari Sen, et al. Frontiers in Artificial Intelligence, 2024. Paper.
"A Software-Assisted Peak Current Regulation Scheme to Improve Power-Limited Inference Performance in a 5nm AI SoC." Monodeep Kar, Joel Silberman, et al. ISSCC 2024. Conference paper.
"DNNDaSher: A Compiler Framework for Dataflow Compatible End-to-End Acceleration on IBM AIU." Sanchari Sen, Shubham Jain, et al. IEEE Micro, 2024. Paper.
"Power-Limited Inference Performance Optimization Using a Software-Assisted Peak Current Regulation Scheme in a 5-nm AI SoC." Monodeep Kar, Joel Silberman, et al. IEEE Journal of Solid-State Circuits, 2024. Paper.
"Deep Compression of Pre-trained Transformer Models." Naigang Wang, Chi-Chun Liu, et al. NeurIPS 2022. Conference paper.
US11016840 (24 May 2021): "Low-overhead Error Prediction And Preemption In Deep Neural Network Using Apriori Network Statistics."
US10838868 (16 Nov 2020): "Programmable Data Delivery By Load And Store Agents On A Processing Chip Interfacing With On-chip Memory Components And Directing Data To External Memory Components."
US10565285 (17 Feb 2020): "Processor And Memory Transparent Convolutional Lowering And Auto Zero Padding For Deep Neural Network Implementations."