Jörg Henkel, Hai Li, et al. "Approximate computing and the efficient machine learning expedition." ICCAD 2022. Conference paper.
Subhankar Pal, Swagath Venkataramani, et al. "OnSRAM: Efficient Inter-Node On-Chip Scratchpad Management in Deep Learning Accelerators." ACM Transactions on Embedded Computing Systems, 2022. Paper.
Andrea Fasoli, Chia-Yu Chen, et al. "Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization." INTERSPEECH 2022. Conference paper.
Sarada Krithivasan, Sanchari Sen, et al. "Accelerating DNN Training Through Selective Localized Learning." Frontiers in Neuroscience, 2022. Paper.
Sae Kyu Lee, Ankur Agrawal, et al. "A 7-nm Four-Core Mixed-Precision AI Chip with 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling." IEEE Journal of Solid-State Circuits, 2021. Paper.
Andrea Fasoli, Chia-Yu Chen, et al. "4-bit quantization of LSTM-based speech recognition models." INTERSPEECH 2021. Conference paper.
Sanchari Sen, Swagath Venkataramani, et al. "Efficacy of Pruning in Ultra-Low Precision DNNs." ISLPED 2021. Conference paper.
Swagath Venkataramani, Vijayalakshmi Srinivasan, et al. "RaPiD: AI Accelerator for Ultra-Low Precision Training and Inference." ISCA 2021. Conference paper.
Subhankar Pal, Swagath Venkataramani, et al. "Efficient Management of Scratch-Pad Memories in Deep Learning Accelerators." ISPASS 2021. Conference paper.
Ankur Agrawal, Sae Kyu Lee, et al. "A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling." ISSCC 2021. Conference paper.