ModelCloud/GPTQModel
LLM quantization (compression) toolkit with hardware acceleration support for NVIDIA, AMD, and Intel GPUs and Intel/AMD/Apple CPUs via Hugging Face (HF), vLLM, and SGLang.
GitHub repository with 1,172 stars and 186 forks.
Language: Python
Topics: gptq, optimum, peft, quantization, sglang, transformers, vllm
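
As a rough illustration of what "quantization (compression)" means in the description above, the sketch below maps a small group of floating-point weights to 4-bit signed integers plus one shared scale, then restores approximate values. It is plain round-to-nearest quantization in standard-library Python, not the GPTQ algorithm and not this repository's API. The function names `quantize_group` and `dequantize_group` and the sample weights are hypothetical, chosen only for this example.

```python
from typing import List, Tuple


def quantize_group(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization of one group of weights.

    Hypothetical helper for illustration only; not part of any library.
    """
    qmax = 2 ** (bits - 1) - 1  # largest positive level, 7 for 4-bit signed
    max_abs = max(abs(w) for w in weights)
    # One scale is shared by the whole group; guard against an all-zero group.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest level and clamp to the signed range.
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q: List[int], scale: float) -> List[float]:
    """Recover approximate floating-point weights from integers and the scale."""
    return [v * scale for v in q]


weights = [0.70, -0.32, 0.11, 0.0]
q, scale = quantize_group(weights)
restored = dequantize_group(q, scale)

print("quantized levels:", q)
print("scale:", scale)
print("restored weights:", restored)
```

Each weight is stored as a 4-bit level instead of a 16- or 32-bit float, at the cost of a rounding error of at most half the scale per weight. GPTQ-style methods aim to reduce the effect of that error on model outputs by using calibration data, which this sketch does not attempt.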