manjunathshiva/turboquant-mlx
Extreme compression of weights and KV cache for LLMs on Apple Silicon (an MLX implementation of Google's TurboQuant)
GitHub repository with 41 stars and 10 forks.
Language: Python
Topics: apple-silicon, kv-cache, llm, mlx, quantization, turboquant