jundot/omlx
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar
GitHub repository with 15,973 stars and 1,370 forks.
Language: Python
Topics: apple-silicon, inference-server, llm, macos, mlx, openai-api