lna-lab/gemma4-12b-vllm-sm120
Reproducible recipe for serving abliterated Gemma-4-12B (gemma4_unified) at 50-118 tok/s on Blackwell (SM120) GPUs without NVLink, using vLLM nightly, ModelOpt FP8/NVFP4 quantization, and MTP speculative decoding.
GitHub repository with 15 stars and 0 forks.
Language: Python
Topics: abliterated, blackwell, fp8, gemma, gemma-4, modelopt, nvfp4, quantization, sm120, speculative-decoding