Swagath Venkataramani

Title: Principal Research Scientist, AIU Architecture and Compilers

Publications

A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling
- Ankur Agrawal, Saekyu Lee, et al.
- 2021
- ISSCC 2021
Conference paper
Value Similarity Extensions for Approximate Computing in General-Purpose Processors
- Younghoon Kim, Swagath Venkataramani, et al.
- 2021
- DATE 2021
Conference paper
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
- Chia-Yu Chen, Jiamin Ni, et al.
- 2020
- NeurIPS 2020
Conference paper
Ultra-Low Precision 4-bit Training of Deep Neural Networks
- Xiao Sun, Naigang Wang, et al.
- 2020
- NeurIPS 2020
Conference paper
Efficient AI System Design with Cross-Layer Approximate Computing
- Swagath Venkataramani, Xiao Sun, et al.
- 2020
- Proceedings of the IEEE
Paper
A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference
- Jinwook Oh, Sae Kyu Lee, et al.
- 2020
- VLSI Circuits 2020
Conference paper
DyVEDeep: Dynamic Variable Effort Deep Neural Networks
- Sanjay Ganapathy, Swagath Venkataramani, et al.
- 2020
- ACM TECS
Paper
Hybrid 8-bit floating point (HFP8) training and inference for deep neural networks
- Xiao Sun, Jungwook Choi, et al.
- 2019
- NeurIPS 2019
Conference paper
Memory and Interconnect Optimizations for Peta-Scale Deep Learning Systems
- Swagath Venkataramani, Vijayalakshmi Srinivasan, et al.
- 2019
- HiPC 2019
Conference paper
Performance-driven Programming of Multi-TFLOP Deep Learning Accelerators
- Swagath Venkataramani, Jungwook Choi, et al.
- 2019
- IISWC 2019
Conference paper

Top collaborators

Alberto Mannari

Software Developer

Prasanth Chatarasi

Senior Research Scientist, AI Accelerator Compilers and Architecture

Matthew Ziegler

Principal Research Scientist

Paul G Crumley

STSM, AI & Hybrid Cloud Infrastructure

Patents

Reformatting Of Tensors To Provide Sub-tensors

Reformatting Of Tensors To Provide Sub-tensors

Reformatting Of Tensors To Provide Sub-tensors

Reformatting Of Tensors To Provide Sub-tensors

Reformatting Of Tensors To Provide Sub-tensors

Reformatting Of Tensors To Provide Sub-tensors

Reformatting Of Tensors To Provide Sub-tensors

Single Function To Perform Combined Matrix Multiplication And Bias Add Operations

Method To Map Convolutional Layers Of Deep Neural Network On A Plurality Of Processing Elements With SIMD Execution Units, Private Memories, And Connected As A 2D Systolic Processor Array

Hybrid Data-Model Parallelism For Efficient Deep Learning