InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization
The paper 'InfoQuant' addresses low-bit activation quantization in large language models (LLMs). It proposes shaping activation distributions so that they better match a low-bit uniform quantizer. The reported results improve on existing methods while staying close to floating-point accuracy.
- Low-bit activation quantization is a major bottleneck in efficient large language model deployment.
- The proposed method, InfoQuant, employs Peak Suppression Orthogonal Transformation to create quantization-friendly distributions.
- InfoQuant preserves an average of 97% of floating-point accuracy and significantly reduces performance gaps.
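The general idea of reshaping activations with an orthogonal transform before uniform quantization can be illustrated with a minimal sketch. This is not InfoQuant's Peak Suppression Orthogonal Transformation, whose details are not given in this summary; a generic orthonormal Walsh-Hadamard rotation is swapped in as a stand-in. The function names, the 4-bit symmetric per-tensor quantizer, the vector length of 64, and the synthetic outlier are all assumptions made for illustration.

```python
import math
import random


def quantize_uniform(x, bits=4):
    """Symmetric per-tensor uniform quantization, then dequantization (illustrative)."""
    qmax = 2 ** (bits - 1) - 1              # 7 for 4 bits
    peak = max(abs(v) for v in x)
    scale = peak / qmax if peak > 0 else 1.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in x]


def hadamard_rotate(x):
    """Orthonormal Walsh-Hadamard transform; len(x) must be a power of two."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)               # makes the transform orthonormal
    return [v * norm for v in y]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Synthetic "activations": unit Gaussian noise plus one large outlier (assumed data).
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(64)]
x[5] = 40.0

err_plain = mse(x, quantize_uniform(x))

# Rotate first: the outlier's energy is spread across all coordinates,
# so the peak (and therefore the quantization step) shrinks.
x_rot = hadamard_rotate(x)
err_rot = mse(x_rot, quantize_uniform(x_rot))

print(f"peak before rotation: {max(abs(v) for v in x):.2f}")
print(f"peak after rotation:  {max(abs(v) for v in x_rot):.2f}")
print(f"4-bit MSE without rotation: {err_plain:.4f}")
print(f"4-bit MSE with rotation:    {err_rot:.4f}")
```

The error is compared in the rotated space because an orthonormal transform preserves Euclidean distance, so the same error would be observed after rotating back. In this toy setting a single outlier stretches the quantizer's range so far that most small values collapse to zero; spreading the outlier lowers the peak and shrinks the quantization step.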
Opening excerpt (first ~120 words)
Computer Science > Machine Learning
arXiv:2605.26175 (cs)
Submitted on 25 May 2026
Title: InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization
Authors: Ke Li, Dong An, Xiaoling Zang, Can Ye, Liang Xie, Qibo Qiu, Chen Shen, Xiaofei He, Wenxiao Wang
Abstract: Low-bit activation quantization remains a major bottleneck in efficient large language model (LLM) deployment. The difficulty is not only that activations contain outliers, but that their distributions are often poorly matched to a low-bit uniform quantizer.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.