Naigang Wang

Title

RSM, Manager, AI acceleration algorithm and framework

Publications

Deep Compression of Pre-trained Transformer Models
- - Naigang Wang
  - Chi-Chun Liu
  - et al.
- 2022
- NeurIPS 2022
Conference paper
A 7-nm Four-Core Mixed-Precision AI Chip with 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling
- - Sae Kyu Lee
  - Ankur Agrawal
  - et al.
- 2021
- IEEE JSSC
Paper
4-bit quantization of LSTM-based speech recognition models
- - Andrea Fasoli
  - Chia-Yu Chen
  - et al.
- 2021
- INTERSPEECH 2021
Conference paper
Hardware-Aware Neural Architecture Search: Survey and Taxonomy
- - Hadjer Benmeziane
  - Kaoutar El Maghraoui
  - et al.
- 2021
- IJCAI 2021
Survey paper
RaPiD: AI Accelerator for Ultra-Low Precision Training and Inference
- - Swagath Venkataramani
  - Vijayalakshmi Srinivasan
  - et al.
- 2021
- ISCA 2021
Conference paper
A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling
- - Ankur Agrawal
  - Saekyu Lee
  - et al.
- 2021
- ISSCC 2021
Conference paper
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
- - Chia-Yu Chen
  - Jiamin Ni
  - et al.
- 2020
- NeurIPS 2020
Conference paper
Ultra-Low Precision 4-bit Training of Deep Neural Networks
- - Xiao Sun
  - Naigang Wang
  - et al.
- 2020
- NeurIPS 2020
Conference paper
Efficient AI System Design with Cross-Layer Approximate Computing
- - Swagath Venkataramani
  - Xiao Sun
  - et al.
- 2020
- Proceedings of the IEEE
Paper
A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference
- - Jinwook Oh
  - Sae Kyu Lee
  - et al.
- 2020
- VLSI Circuits 2020
Conference paper

Blog posts

Ultra-low-precision training of deep neural networks
Technical note
Naigang Wang
09 May 2019
- AI
8-bit precision for training deep learning systems
Research
Naigang Wang
03 Dec 2018
- AI
- AI Hardware

Top collaborators

Kaoutar El Maghraoui

Principal Research Scientist, AIU Spyre Software Ecosystem, AI Hardware Center

Karthik Swaminathan

Senior Research Scientist, Efficient and Resilient Systems

Pin-Yu Chen

Principal Research Scientist and Manager; Chief Scientist, RPI-IBM AI Research Collaboration

Swagath Venkataramani

Principal Research Scientist, AIU Architecture and Compilers

Naigang Wang

Title

Publications

Deep Compression of Pre-trained Transformer Models

A 7-nm Four-Core Mixed-Precision AI Chip with 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling

4-bit quantization of LSTM-based speech recognition models

Hardware-Aware Neural Architecture Search: Survey and Taxonomy

RaPiD: AI Accelerator for Ultra-Low Precision Training and Inference

A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

Ultra-Low Precision 4-bit Training of Deep Neural Networks

Efficient AI System Design with Cross-Layer Approximate Computing

A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference

Patents

Deep Learning Accelerator Architecture With Chunking Gemm

Removing A Residual Photo-mask Fence In Photolithography

Method Of Fabricating A Laminated Stack Of Magnetic Inductor

Method Of Fabricating A Magnetic Stack Arrangement Of A Laminated Magnetic Inductor

Method For Forming A Planar, Closed Loop Magnetic Structure

Magnetic Inductor With Multiple Magnetic Layer Thicknesses

Magnetic Inductor With Multiple Magnetic Layer Thicknesses

Very Low Precision Floating Point Representation For Deep Learning Acceleration

Stress Control In Magnetic Inductor Stacks

Magnetic Inductor Stacks With Multilayer Isolation Layers