weicj/vLLM-2080Ti-Definitive
The definitive vLLM runtime for dual RTX 2080 Ti 22GB GPUs with NVLink, delivering local inference for 27B/31B models at 100+ tok/s single-request decode and a context length of up to 1M tokens.
GitHub repository with 58 stars and 11 forks.
Language: Python