AutoRound is an advanced quantization toolkit for Large Language Models (LLMs) and Vision-Language Models (VLMs). It uses sign-gradient descent to achieve high accuracy at ultra-low bit widths (2–4 bits) with minimal tuning, and it offers broad hardware compatibility.
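To make the term "sign-gradient descent" concrete, here is a minimal sketch of the general update rule on a toy objective. It is not AutoRound's implementation: the function names, the quadratic objective, and the learning rate are all assumptions chosen for illustration. The key idea is that each parameter moves by a fixed step in the direction opposite to the sign of its gradient, so the gradient's magnitude is ignored.

```python
from typing import Callable, List


def sign(x: float) -> int:
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def sign_gradient_descent(
    grad_fn: Callable[[List[float]], List[float]],
    x0: List[float],
    lr: float = 0.01,
    steps: int = 200,
) -> List[float]:
    """Hypothetical helper: update each parameter by -lr * sign(gradient)."""
    x = list(x0)
    for _ in range(steps):
        g = grad_fn(x)
        # Only the sign of each gradient component is used, so every
        # parameter moves by exactly lr (or stays put) on each step.
        x = [xi - lr * sign(gi) for xi, gi in zip(x, g)]
    return x


# Toy objective (an assumption for illustration): f(x) = sum((x_i - t_i)^2),
# whose gradient is 2 * (x_i - t_i).
target = [0.3, -0.2]


def grad(x: List[float]) -> List[float]:
    return [2 * (xi - ti) for xi, ti in zip(x, target)]


result = sign_gradient_descent(grad, [0.0, 0.0], lr=0.01, steps=200)
```

Because the step size is fixed, the iterate ends up oscillating within about one `lr` of the minimizer rather than converging exactly. Practical setups usually decay the learning rate over time.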
Key features:
- Superior Accuracy - Delivers strong performance even at 2–3 bits, with leading results on 4-bit benchmarks.
- Ecosystem Integration - Works seamlessly with Transformers, vLLM, SGLang, and more.
- Multiple Formats Export Support - Exports to the AutoRound, AutoAWQ, AutoGPTQ, and GGUF formats for maximum compatibility.
- Fast Mixed Bits/Dtypes Scheme Generation - Automatically generates a mixed-bit/dtype configuration in minutes with minimal overhead.
- Optimized Round-to-Nearest Mode - A fast quantization option that comes with some accuracy drop at 4 bits.
- Affordable Quantization Cost - Quantize 7B models in about 10 minutes on a single GPU.
- 10+ VLMs Support - Out-of-the-box quantization for multiple vision-language models.
- Advanced Utilities - Includes multi-GPU quantization and support for 10+ runtime backends.
- Beyond Weight-Only Quantization - Support is expanding to additional data types such as MXFP, NVFP, W8A8, and more.
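For readers unfamiliar with the round-to-nearest (RTN) baseline mentioned in the list above, the following is a minimal sketch of generic asymmetric RTN quantization over a single group of weights. It is an illustration under stated assumptions, not AutoRound's optimized RTN mode: `rtn_quantize`, its parameters, and the sample weights are hypothetical.

```python
from typing import List, Tuple


def rtn_quantize(
    weights: List[float], bits: int = 4
) -> Tuple[List[int], List[float], float, int]:
    """Hypothetical helper: asymmetric round-to-nearest quantization.

    Returns the integer codes, the dequantized weights, the scale,
    and the zero point.
    """
    qmax = 2**bits - 1  # largest integer code, e.g. 15 for 4 bits
    w_min, w_max = min(weights), max(weights)
    # Fall back to 1.0 when all weights are equal, to avoid dividing by zero.
    scale = (w_max - w_min) / qmax or 1.0
    zero_point = round(-w_min / scale)
    # Round each weight to the nearest grid point, then clamp to [0, qmax].
    codes = [min(max(round(w / scale) + zero_point, 0), qmax) for w in weights]
    dequantized = [(q - zero_point) * scale for q in codes]
    return codes, dequantized, scale, zero_point


# Sample weights are made up for illustration.
weights = [-0.8, -0.1, 0.0, 0.35, 1.2]
codes, dequantized, scale, zero_point = rtn_quantize(weights, bits=4)
max_error = max(abs(w - d) for w, d in zip(weights, dequantized))
```

RTN needs no calibration data or tuning, which is why it is fast. The trade-off is that each weight is rounded independently, without accounting for how the rounding errors interact in the layer's output.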
In summary, AutoRound is a versatile tool that improves both the efficiency and the accuracy of model quantization, making it suitable for a wide range of machine learning applications.