Value Similarity Extensions for Approximate Computing in General-Purpose ProcessorsYounghoon KimSwagath Venkataramaniet al.2021DATE 2021Conference paper
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training Chia-Yu ChenJiamin Niet al.2020NeurIPS 2020Conference paper
Ultra-Low Precision 4-bit Training of Deep Neural NetworksXiao SunNaigang Wanget al.2020NeurIPS 2020Conference paper
Efficient AI System Design with Cross-Layer Approximate ComputingSwagath VenkataramaniXiao Sunet al.2020Proceedings of the IEEEPaper
A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and InferenceJinwook OhSae Kyu Leeet al.2020VLSI Circuits 2020Conference paper
DyVEDeep: Dynamic Variable Effort Deep Neural NetworksSanjay GanapathySwagath Venkataramaniet al.2020ACM TECSPaper
Hybrid 8-bit floating point (HFP8) training and inference for deep neural networksXiao SunJungwook Choiet al.2019NeurIPS 2019Conference paper
Memory and Interconnect Optimizations for Peta-Scale Deep Learning SystemsSwagath VenkataramaniVijayalakshmi Srinivasanet al.2019HiPC 2019Conference paper
Performance-driven Programming of Multi-TFLOP Deep Learning Accelerators∗Swagath VenkataramaniJungwook Choiet al.2019IISWC 2019Conference paper
DeepTools: Compiler and Execution Runtime Extensions for RaPiD AI AcceleratorSwagath VenkataramaniJungwook Choiet al.2019IEEE MicroPaper
24 Feb 2025US12236338Single Function To Perform Combined Matrix Multiplication And Bias Add Operations
11 Nov 2024US12141513Method To Map Convolutional Layers Of Deep Neural Network On A Plurality Of Processing Elements With Simd Execution Units, Private Memories, And Connected As A 2d Systolic Processor Array