huawei-csl/KVarN
KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above the FP16 baseline, and FP16-level accuracy. Calibration-free, enabled with one flag.
GitHub repository with 122 stars and 4 forks.
Language: Python
Topics: agentic-ai, kv-cache, llm, llm-inference, long-context, quantization, vllm
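
To make the "3-5x more context" figure concrete, here is a rough back-of-the-envelope sketch of how KV-cache size scales with the stored bit width. It is not KVarN's code: the model shape, the memory budget, and the 4-bit setting are hypothetical, and the estimate ignores any scale or zero-point metadata a real quantizer stores. Under those assumptions, a 3-5x gain over FP16 corresponds to roughly 16/5 = 3.2 to 16/3 ≈ 5.3 effective bits per cached value.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bits_per_value):
    """Bytes of KV cache needed for one token across all layers."""
    # Both keys and values are cached, hence the factor of 2.
    return 2 * num_layers * num_kv_heads * head_dim * bits_per_value / 8


def max_context_tokens(budget_bytes, bytes_per_token):
    """How many tokens fit in a fixed KV-cache memory budget."""
    return int(budget_bytes // bytes_per_token)


def context_gain(bits_per_value, baseline_bits=16):
    """Context multiplier relative to an FP16 cache, ignoring metadata overhead."""
    return baseline_bits / bits_per_value


# Hypothetical model shape and memory budget, chosen only for illustration.
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
BUDGET_BYTES = 16 * 1024**3  # 16 GiB reserved for the KV cache

fp16_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, 16)
q4_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, 4)

fp16_tokens = max_context_tokens(BUDGET_BYTES, fp16_bytes)
q4_tokens = max_context_tokens(BUDGET_BYTES, q4_bytes)

if __name__ == "__main__":
    print(f"FP16 cache:  {fp16_bytes:.0f} bytes/token, {fp16_tokens} tokens")
    print(f"4-bit cache: {q4_bytes:.0f} bytes/token, {q4_tokens} tokens")
    print(f"Context gain at 4 bits: {context_gain(4):.1f}x")
```

Real gains depend on the quantization format's metadata overhead and on how much GPU memory remains after weights and activations, so treat these numbers as an upper-bound estimate.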