theogravity/dual-rtx-6000-blackwell-Gemma-4-31B-IT-NVFP4
Optimized vLLM setup, run under Docker, for Gemma 4 31B NVFP4 with Multi-Token Prediction (MTP) on dual RTX PRO 6000 Blackwell GPUs: native FP4 Tensor Cores, MTP with a 96.5% acceptance rate, and prefix caching. Includes benchmark results and replication scripts.
GitHub repository with 5 stars and 0 forks.
Language: Shell
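
The setup described above (vLLM in Docker, two GPUs, prefix caching) might be launched along the following lines. This is a minimal dry-run sketch, not the repository's actual script: `MODEL_ID` is a hypothetical placeholder for the NVFP4 checkpoint, `VLLM_IMAGE` defaults to the public `vllm/vllm-openai` image as an assumption, and the MTP speculative-decoding option is deliberately left out because its exact form depends on the vLLM version. The script only prints the command so it can be inspected before being run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders: replace with the real checkpoint ID and image tag.
MODEL_ID="${MODEL_ID:-your-org/your-gemma-nvfp4-checkpoint}"
VLLM_IMAGE="${VLLM_IMAGE:-vllm/vllm-openai:latest}"

# Assemble the docker command as an array and print it (dry run, nothing is started).
build_vllm_cmd() {
  local cmd=(
    docker run --rm --gpus all --ipc=host
    -p 8000:8000
    -v "${HOME:-/root}/.cache/huggingface:/root/.cache/huggingface"
    "${VLLM_IMAGE}"
    --model "${MODEL_ID}"
    --tensor-parallel-size 2
    --enable-prefix-caching
  )
  printf '%s ' "${cmd[@]}"
  printf '\n'
}

build_vllm_cmd
```

`--tensor-parallel-size 2` shards the model across both GPUs, and `--enable-prefix-caching` reuses KV-cache entries for shared prompt prefixes. To actually start the server, copy the printed command into a shell after reviewing it, and add the speculative-decoding settings that match the installed vLLM version.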