KaiFelixBennett/gemma4-turboquant-rdna4
Run Gemma-4-31B at full 256K context on a $1,400 AMD RDNA4 GPU (gfx1201): TurboQuant KV cache + HIP-graph-safe Flash-Attention for llama.cpp, fully measured on real hardware.
GitHub repository with 6 stars and 0 forks.
Language: Python
Topics: amd, flash-attention, gemma, gfx1201, hip, kv-cache, llama-cpp, llm-inference, local-llm, long-context