theogravity/dual-rtx-6000-blackwell-qwen3.6-27b-fp8
Optimized vLLM setup for Qwen3.6-27B-FP8 on dual RTX PRO 6000 Blackwell GPUs (192 GB GDDR7 total, no NVLink): config, benchmark sweep results, and a custom chat template with thinking mode off by default.
GitHub repository with 10 stars and 0 forks.
Language: Shell
Topics: benchmark, blackwell, fp8, llm-inference, local-llm, multi-token-prediction, qwen3, rtx-pro-6000, speculative-decoding, vllm