The SGLang RL team developed an end-to-end INT4 Quantization-Aware Training (QAT) pipeline that improves training efficiency and model stability. By using fake quantization during training and real quantization at inference, they achieved significant performance improvements for large models on a single GPU. The article details the technical steps and the results of their approach.
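The core of the fake-quantization step can be illustrated with a minimal sketch: during training, weights are rounded to an INT4 grid and immediately dequantized, so the model learns to tolerate the grid, while at inference the integer values are stored directly. The function name and the per-tensor symmetric scheme below are illustrative assumptions, not the SGLang team's actual implementation.

```python
# Minimal sketch of symmetric "fake" INT4 quantization (quantize-dequantize).
# Assumptions: per-tensor symmetric scale, INT4 range [-8, 7]; the name
# fake_quant_int4 is hypothetical, not from the SGLang pipeline.

def fake_quant_int4(weights):
    """Quantize floats to the INT4 grid, then dequantize back."""
    # Symmetric scale: map the largest magnitude onto level 7.
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs else 1.0
    # Round to the nearest integer level and clamp to the INT4 range.
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    # Dequantize: training sees these snapped values, so the model
    # adapts to the quantization grid; inference stores q directly.
    return [qi * scale for qi in q], q

fake, ints = fake_quant_int4([0.31, -0.72, 0.05, 1.4])
```

At inference, only the integer codes and the scale are kept, which is what yields the memory and bandwidth savings of real INT4 quantization.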
AutoRound is an innovative quantization tool developed by Intel for efficient deployment of large language and vision-language models. It utilizes weight-only post-training quantization to achieve high accuracy at low-bit widths, while remaining fast and compatible with various models and devices. With features like mixed-bit tuning and minimal resource requirements, AutoRound provides a practical solution for optimizing AI model performance.