donitb934/1Cat-vLLM
Optimizes AWQ 4-bit inference on Tesla V100 GPUs, with improved speed and stability and support for modern large models such as Qwen3.5 and MoE architectures.
GitHub repository with 5 stars and 0 forks.
Language: Python
Topics: 4x, api, cats, civ, cli, compiler, distributed, git, java, language