Value Similarity Extensions for Approximate Computing in General-Purpose Processors. Younghoon Kim, Swagath Venkataramani, et al. 2021. DATE 2021. Conference paper.
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training. Chia-Yu Chen, Jiamin Ni, et al. 2020. NeurIPS 2020. Conference paper.
Ultra-Low Precision 4-bit Training of Deep Neural Networks. Xiao Sun, Naigang Wang, et al. 2020. NeurIPS 2020. Conference paper.
Efficient AI System Design with Cross-Layer Approximate Computing. Swagath Venkataramani, Xiao Sun, et al. 2020. Proceedings of the IEEE. Paper.
A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference. Jinwook Oh, Sae Kyu Lee, et al. 2020. VLSI Circuits 2020. Conference paper.
DyVEDeep: Dynamic Variable Effort Deep Neural Networks. Sanjay Ganapathy, Swagath Venkataramani, et al. 2020. ACM TECS. Paper.
Hybrid 8-bit floating point (HFP8) training and inference for deep neural networks. Xiao Sun, Jungwook Choi, et al. 2019. NeurIPS 2019. Conference paper.
Memory and Interconnect Optimizations for Peta-Scale Deep Learning Systems. Swagath Venkataramani, Vijayalakshmi Srinivasan, et al. 2019. HiPC 2019. Conference paper.
Performance-driven Programming of Multi-TFLOP Deep Learning Accelerators. Swagath Venkataramani, Jungwook Choi, et al. 2019. IISWC 2019. Conference paper.
DeepTools: Compiler and Execution Runtime Extensions for RaPiD AI Accelerator. Swagath Venkataramani, Jungwook Choi, et al. 2019. IEEE Micro. Paper.
05 Jan 2026. CN ZL202080055389.3. System-aware Selective Quantization For Performance Optimized Distributed Deep Learning.