Mininglamp-AI/cider
W8A8/W4A8 inference on Apple Silicon: uses the otherwise unused INT8 TensorOps on M5 for 1.2–1.9× faster LLM prefill, built as MLX custom primitives.
GitHub repository with 318 stars and 15 forks.
Language: Python
Topics: apple-silicon, metal, mlx, quantization, w4a8, w8a8
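
For readers unfamiliar with the term, W8A8 means both weights and activations are quantized to 8-bit integers, so the inner product can run on integer hardware and be rescaled to floating point afterwards. The snippet below is a minimal pure-Python sketch of that idea, assuming symmetric per-tensor scaling. It is not taken from the repository: the function names `quantize_int8` and `w8a8_dot` are hypothetical, and the project itself is described as using MLX custom primitives rather than anything like this.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to the int8 range [-127, 127].

    Hypothetical helper for illustration; returns (quantized ints, scale).
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def w8a8_dot(activations, weights):
    """Dot product with int8 activations and int8 weights (the W8A8 pattern).

    The multiply-accumulate is done entirely on integers; the two scales are
    applied once at the end to recover an approximate float result.
    """
    q_act, act_scale = quantize_int8(activations)
    q_wgt, wgt_scale = quantize_int8(weights)
    acc = sum(a * w for a, w in zip(q_act, q_wgt))  # integer accumulator
    return acc * act_scale * wgt_scale


# Illustrative values only, not benchmark data.
x = [0.5, -1.0, 0.25, 2.0]
w = [0.1, 0.2, -0.3, 0.4]
exact = sum(a * b for a, b in zip(x, w))
approx = w8a8_dot(x, w)
print(f"float: {exact:.4f}  w8a8: {approx:.4f}")
```

W4A8 follows the same pattern with weights stored in 4 bits instead of 8. Real kernels typically use finer-grained scales (per channel or per group) than the single per-tensor scale assumed here.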