Swagath Venkataramani

Title

Principal Research Scientist, AIU Architecture and Compilers

Publications

A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling
- Ankur Agrawal, Saekyu Lee, et al.
- 2021
- ISSCC 2021
- Conference paper

Value Similarity Extensions for Approximate Computing in General-Purpose Processors
- Younghoon Kim, Swagath Venkataramani, et al.
- 2021
- DATE 2021
- Conference paper

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
- Chia-Yu Chen, Jiamin Ni, et al.
- 2020
- NeurIPS 2020
- Conference paper

Ultra-Low Precision 4-bit Training of Deep Neural Networks
- Xiao Sun, Naigang Wang, et al.
- 2020
- NeurIPS 2020
- Conference paper

Efficient AI System Design with Cross-Layer Approximate Computing
- Swagath Venkataramani, Xiao Sun, et al.
- 2020
- Proceedings of the IEEE
- Paper

A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference
- Jinwook Oh, Sae Kyu Lee, et al.
- 2020
- VLSI Circuits 2020
- Conference paper

DyVEDeep: Dynamic Variable Effort Deep Neural Networks
- Sanjay Ganapathy, Swagath Venkataramani, et al.
- 2020
- ACM TECS
- Paper

Hybrid 8-bit floating point (HFP8) training and inference for deep neural networks
- Xiao Sun, Jungwook Choi, et al.
- 2019
- NeurIPS 2019
- Conference paper

Memory and Interconnect Optimizations for Peta-Scale Deep Learning Systems
- Swagath Venkataramani, Vijayalakshmi Srinivasan, et al.
- 2019
- HiPC 2019
- Conference paper

Performance-driven Programming of Multi-TFLOP Deep Learning Accelerators
- Swagath Venkataramani, Jungwook Choi, et al.
- 2019
- IISWC 2019
- Conference paper

Top collaborators

Alberto Mannari

Software Developer

Prasanth Chatarasi

Senior Research Scientist, AI Accelerator Compilers and Architecture

Matthew Ziegler

Principal Research Scientist

Paul G Crumley

STSM, AI & Hybrid Cloud Infrastructure

Patents

Optimized Hierarchical Scratchpads For Enhanced Artificial Intelligence Accelerator Core Utilization

Dynamically Resizing Minibatch In Neural Network Execution

Bi-scaled Deep Neural Networks

Facilitating Neural Network Efficiency

Deep Neural Network Performance Analysis On Shared Memory Accelerator Systems

Self-evaluating Array Of Memory

Low-overhead Error Prediction And Preemption In Deep Neural Network Using Apriori Network Statistics

Programmable Data Delivery By Load And Store Agents On A Processing Chip Interfacing With On-chip Memory Components And Directing Data To External Memory Components