Michael-A-Kuykendall/shimmy
⚡ Pure-Rust WebGPU inference engine — OpenAI-API compatible, GGUF native, runs on any GPU. No Python. No llama.cpp. Single binary.
GitHub repository with 5,382 stars and 511 forks.
Language: Rust
Topics: llama, llamacpp, llm-inference, ollama-api, command-line-tool, gguf, inference-server, local-ai, machine-learning, api-server