LL4nc33/llama-tq
A llama.cpp fork tuned for running modern models (Gemma-4, Qwen3.x) at full context on 12 GB Turing GPUs (RTX 2060/2070/2080, T4). Features: TurboQuant KV cache (KTQ+VTQ, 2.78 bpw, f16-quality), sliding-window-attention (SWA) aware KV cache, and multi-token prediction (MTP) plus n-gram speculative decoding.
GitHub repository with 6 stars and 0 forks.
Language: C++
Topics: fine-tuning, kv-cache, llama-cpp, llama-tq, low-vram, speculative-decoding, turboquant, turing