tirdyhouse/cascade
Extend LLM context windows beyond GPU memory limits with a disk-backed KV cache.
GitHub repository with 11 stars and 0 forks.
Language: Python
Topics: ai, cache, deepseek, disk-cache, gpu, inference, kv-cache, llm-inference, nvme, vllm
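
The general idea behind a disk-backed KV cache can be sketched as a two-tier store. This is a minimal illustration, not the code in this repository: the class `TieredKVCache`, its methods, and the `hot_capacity` parameter are all hypothetical names. A bounded in-memory dictionary stands in for GPU memory, a directory of pickle files stands in for NVMe storage, and least-recently-used eviction is an assumed policy.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier cache (hypothetical, not the cascade API).

    The 'hot' tier is a bounded in-memory dict standing in for GPU memory;
    the 'cold' tier is a directory of files standing in for NVMe storage.
    """

    def __init__(self, hot_capacity, spill_dir):
        self.hot_capacity = hot_capacity
        self.spill_dir = spill_dir
        self.hot = OrderedDict()  # block_id -> (keys, values), oldest first
        os.makedirs(spill_dir, exist_ok=True)

    def _path(self, block_id):
        return os.path.join(self.spill_dir, f"block_{block_id}.pkl")

    def _evict(self):
        # Spill least-recently-used blocks to disk until the hot tier fits.
        while len(self.hot) > self.hot_capacity:
            old_id, block = self.hot.popitem(last=False)
            with open(self._path(old_id), "wb") as f:
                pickle.dump(block, f)

    def put(self, block_id, keys, values):
        # Drop any stale on-disk copy so the hot tier holds the only version.
        path = self._path(block_id)
        if os.path.exists(path):
            os.remove(path)
        self.hot[block_id] = (keys, values)
        self.hot.move_to_end(block_id)
        self._evict()

    def get(self, block_id):
        if block_id in self.hot:
            self.hot.move_to_end(block_id)  # mark as recently used
            return self.hot[block_id]
        path = self._path(block_id)
        if not os.path.exists(path):
            raise KeyError(block_id)
        # Cold hit: load from disk and promote back into the hot tier.
        with open(path, "rb") as f:
            block = pickle.load(f)
        os.remove(path)
        self.hot[block_id] = block
        self._evict()
        return block


if __name__ == "__main__":
    cache = TieredKVCache(hot_capacity=2, spill_dir=tempfile.mkdtemp())
    for i in range(4):
        cache.put(i, [float(i)], [float(i) * 2])
    print(list(cache.hot))  # blocks still in the hot tier
    print(cache.get(0))     # reloaded from disk and promoted
```

A real system would store tensors rather than Python lists, and would likely overlap disk reads with computation. The sketch only shows the spill-and-promote bookkeeping that lets the total cached context exceed the size of the fast tier.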