intel/auto-round
A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.
GitHub repository with 1,435 stars and 134 forks.
Language: Python
Topics: int4, quantization, rounding, transformers, vllm, mxfp4, nvfp4, gguf, sglang, llms