A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts. Mohammed Nowaz Rabbani Chowdhury, Meng Wang, et al. 2024. ICML 2024. Conference paper.
Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths. Ximeng Sun, Rameswar Panda, et al. 2024. WACV 2024. Conference paper.
Deep Compression of Pre-trained Transformer Models. Naigang Wang, Chi-Chun Liu, et al. 2022. NeurIPS 2022. Conference paper.
A 7-nm Four-Core Mixed-Precision AI Chip with 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware Throttling. Sae Kyu Lee, Ankur Agrawal, et al. 2021. IEEE JSSC. Paper.
4-bit quantization of LSTM-based speech recognition models. Andrea Fasoli, Chia-Yu Chen, et al. 2021. INTERSPEECH 2021. Conference paper.
Hardware-Aware Neural Architecture Search: Survey and Taxonomy. Hadjer Benmeziane, Kaoutar El Maghraoui, et al. 2021. IJCAI 2021. Survey paper.
RaPiD: AI Accelerator for Ultra-Low Precision Training and Inference. Swagath Venkataramani, Vijayalakshmi Srinivasan, et al. 2021. ISCA 2021. Conference paper.
A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling. Ankur Agrawal, Saekyu Lee, et al. 2021. ISSCC 2021. Conference paper.
Ultra-Low Precision 4-bit Training of Deep Neural Networks. Xiao Sun, Naigang Wang, et al. 2020. NeurIPS 2020. Conference paper.
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training. Chia-Yu Chen, Jiamin Ni, et al. 2020. NeurIPS 2020. Conference paper.
03 Mar 2025. US12240753. Micro-electromechanical Device Having A Soft Magnetic Material Electrolessly Deposited On A Palladium Layer Coated Metal Beam.
23 Dec 2024. US12175359. Machine Learning Hardware Having Reduced Precision Parameter Components For Efficient Parameter Update.
21 Jul 2024. JP7525237. Machine Learning Hardware Having Reduced Precision Parameter Components For Efficient Parameter Update.