github.com

Advanced Quantization Toolkit for LLMs and VLMs

AutoRound is a quantization toolkit for LLMs and VLMs, optimizing performance with high accuracy at low bit widths.

Summary

AutoRound is an advanced quantization toolkit designed for Large Language Models (LLMs) and Vision-Language Models (VLMs). It achieves high accuracy at ultra-low bit widths (2–4 bits) with minimal tuning by leveraging sign-gradient descent and providing broad hardware compatibility.
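
To make the sign-gradient idea concrete, here is a minimal pure-Python sketch. It is not AutoRound's implementation: it assumes a single weight vector, a fixed symmetric 4-bit scale, a learnable rounding offset per weight kept in [-0.5, 0.5], and a straight-through estimate of the gradient. All function names, the step count, and the learning rate are illustrative choices, not values taken from the project.

```python
import math
import random

def quantize(w, v, scale, qmin=-8, qmax=7):
    """Fake-quantize one weight: round (w/scale + v), clamp to the int range, rescale."""
    q = math.floor(w / scale + v + 0.5)
    return scale * max(qmin, min(qmax, q))

def output_error(weights, offsets, scale, samples):
    """Sum of squared differences between full-precision and quantized dot products."""
    total = 0.0
    for x in samples:
        diff = sum((quantize(w, v, scale) - w) * xi
                   for w, v, xi in zip(weights, offsets, x))
        total += diff * diff
    return total

def sign_step(weights, offsets, scale, samples, lr):
    """One update that uses only the sign of the (straight-through) gradient."""
    grads = [0.0] * len(weights)
    for x in samples:
        diff = sum((quantize(w, v, scale) - w) * xi
                   for w, v, xi in zip(weights, offsets, x))
        for i, xi in enumerate(x):
            # Straight-through assumption: d quantize / d v is treated as `scale`.
            grads[i] += 2.0 * diff * xi * scale
    updated = []
    for v, g in zip(offsets, grads):
        sign = (g > 0) - (g < 0)
        updated.append(max(-0.5, min(0.5, v - lr * sign)))
    return updated

def tune_rounding(weights, scale, samples, steps=200, lr=0.005):
    """Start from plain round-to-nearest (all offsets 0) and keep the best offsets seen."""
    offsets = [0.0] * len(weights)
    best_offsets = list(offsets)
    best_err = output_error(weights, offsets, scale, samples)
    for _ in range(steps):
        offsets = sign_step(weights, offsets, scale, samples, lr)
        err = output_error(weights, offsets, scale, samples)
        if err < best_err:
            best_err, best_offsets = err, list(offsets)
    return best_offsets, best_err

random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(16)]
samples = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(32)]
scale = max(abs(w) for w in weights) / 7

rtn_err = output_error(weights, [0.0] * len(weights), scale, samples)
_, tuned_err = tune_rounding(weights, scale, samples)
print(f"round-to-nearest error: {rtn_err:.4f}, tuned error: {tuned_err:.4f}")
```

Because the loop starts from the round-to-nearest solution and keeps the best offsets it has seen, the tuned error can never be worse than the round-to-nearest error on the calibration samples. How much it improves depends on the data.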

Key features:

Superior Accuracy - Delivers strong accuracy even at 2–3 bits, with leading results on 4-bit benchmarks.
Ecosystem Integration - Works with Transformers, vLLM, SGLang, and more.
Multi-Format Export - Exports to AutoRound, AutoAWQ, AutoGPTQ, and GGUF formats for maximum compatibility.
Fast Mixed Bits/Dtypes Scheme Generation - Automatically generates a mixed-precision configuration in minutes with minimal overhead.
Optimized Round-to-Nearest Mode - A faster quantization option that trades some accuracy at 4 bits for speed.
Affordable Quantization Cost - Quantizes 7B models in about 10 minutes on a single GPU.
Support for 10+ VLMs - Out-of-the-box quantization for multiple vision-language models.
Advanced Utilities - Includes multi-GPU quantization and support for 10+ runtime backends.
Beyond Weight-Only Quantization - Support is expanding to additional datatypes and schemes such as MXFP, NVFP, W8A8, and more.
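
As a rough illustration of why low bit widths are hard and what a plain round-to-nearest baseline does, the sketch below quantizes random weights at 2, 3, and 4 bits and compares the reconstruction error. It assumes symmetric, per-tensor quantization with a single scale. The helper names are hypothetical and this is not the toolkit's optimized round-to-nearest mode.

```python
import random

def rtn_quantize(weights, bits):
    """Symmetric per-tensor round-to-nearest; returns dequantized weights and the scale."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -qmax - 1
    # Fall back to 1.0 so an all-zero tensor does not divide by zero.
    scale = (max(abs(w) for w in weights) / qmax) or 1.0
    dequantized = []
    for w in weights:
        q = int(round(w / scale))
        q = max(qmin, min(qmax, q))
        dequantized.append(q * scale)
    return dequantized, scale

def mean_squared_error(a, b):
    """Average squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(256)]

errors = {}
for bits in (2, 3, 4):
    dequantized, _ = rtn_quantize(weights, bits)
    errors[bits] = mean_squared_error(weights, dequantized)
    print(f"{bits}-bit round-to-nearest MSE: {errors[bits]:.5f}")
```

Each extra bit roughly halves the step size between representable values, so the error grows quickly as the bit width drops. That gap is what tuning-based methods try to close.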

In short, AutoRound combines low-bit accuracy, fast and inexpensive quantization, and broad format and backend support, which makes it a practical choice for compressing LLMs and VLMs for deployment.
