konjoai/squish
🤖🗜️⚡️ Local LLM server for Apple Silicon. 5.4× faster end-to-end on long contexts vs Ollama, 33% less RAM, INT3 support for Qwen3. OpenAI + Ollama drop-in. Built for repeated long-context workloads on memory-constrained Macs.
GitHub repository with 7 stars and 0 forks.
Language: Python
Topics: apple-silicon, inference-engine, int4, kv-cache, llama-cpp-alternative, llm, llm-infernece, local-ai, local-llm, macos