Approximate Computing and the Efficient Machine Learning Expedition. Jörg Henkel, Hai Li, et al. ICCAD 2022. Conference paper.
OnSRAM: Efficient Inter-Node On-Chip Scratchpad Management in Deep Learning Accelerators. Subhankar Pal, Swagath Venkataramani, et al. Transactions on Embedded Computing Systems, 2022. Paper.
Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization. Andrea Fasoli, Chia-Yu Chen, et al. INTERSPEECH 2022. Conference paper.
Accelerating DNN Training Through Selective Localized Learning. Sarada Krithivasan, Sanchari Sen, et al. Frontiers in Neuroscience, 2022. Paper.
A 7-nm Four-Core Mixed-Precision AI Chip with 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling. Sae Kyu Lee, Ankur Agrawal, et al. IEEE Journal of Solid-State Circuits, 2021. Paper.
4-bit Quantization of LSTM-Based Speech Recognition Models. Andrea Fasoli, Chia-Yu Chen, et al. INTERSPEECH 2021. Conference paper.
Efficacy of Pruning in Ultra-Low Precision DNNs. Sanchari Sen, Swagath Venkataramani, et al. ISLPED 2021. Conference paper.
RaPiD: AI Accelerator for Ultra-Low Precision Training and Inference. Swagath Venkataramani, Vijayalakshmi Srinivasan, et al. ISCA 2021. Conference paper.
Efficient Management of Scratch-Pad Memories in Deep Learning Accelerators. Subhankar Pal, Swagath Venkataramani, et al. ISPASS 2021. Conference paper.
A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling. Ankur Agrawal, Sae Kyu Lee, et al. ISSCC 2021. Conference paper.
US12236338: Single Function to Perform Combined Matrix Multiplication and Bias Add Operations. Issued 24 Feb 2025.
US12141513: Method to Map Convolutional Layers of Deep Neural Network on a Plurality of Processing Elements with SIMD Execution Units, Private Memories, and Connected as a 2D Systolic Processor Array. Issued 11 Nov 2024.