A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware ThrottlingAnkur AgrawalSaekyu Leeet al.2021ISSCC 2021Conference paper
Value Similarity Extensions for Approximate Computing in General-Purpose ProcessorsYounghoon KimSwagath Venkataramaniet al.2021DATE 2021Conference paper
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training Chia-Yu ChenJiamin Niet al.2020NeurIPS 2020Conference paper
Ultra-Low Precision 4-bit Training of Deep Neural NetworksXiao SunNaigang Wanget al.2020NeurIPS 2020Conference paper
Efficient AI System Design with Cross-Layer Approximate ComputingSwagath VenkataramaniXiao Sunet al.2020Proceedings of the IEEEPaper
A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and InferenceJinwook OhSae Kyu Leeet al.2020VLSI Circuits 2020Conference paper
DyVEDeep: Dynamic Variable Effort Deep Neural NetworksSanjay GanapathySwagath Venkataramaniet al.2020ACM TECSPaper
Hybrid 8-bit floating point (HFP8) training and inference for deep neural networksXiao SunJungwook Choiet al.2019NeurIPS 2019Conference paper
Memory and Interconnect Optimizations for Peta-Scale Deep Learning SystemsSwagath VenkataramaniVijayalakshmi Srinivasanet al.2019HiPC 2019Conference paper
Performance-driven Programming of Multi-TFLOP Deep Learning Accelerators∗Swagath VenkataramaniJungwook Choiet al.2019IISWC 2019Conference paper
29 Aug 2022US11429524Optimized Hierarchical Scratchpads For Enhanced Artificial Intelligence Accelerator Core Utilization
24 May 2021US11016840Low-overhead Error Prediction And Preemption In Deep Neural Network Using Apriori Network Statistics
16 Nov 2020US10838868Programmable Data Delivery By Load And Store Agents On A Processing Chip Interfacing With On-chip Memory Components And Directing Data To External Memory Components
17 Feb 2020US10565285Processor And Memory Transparent Convolutional Lowering And Auto Zero Padding For Deep Neural Network Implementations