sizzlecar/ferrum-infer-rs
Production-grade LLM inference in Rust. Single binary, OpenAI-compatible, runs on Apple Silicon (Metal) and CUDA GPUs.
GitHub repository with 6 stars and 0 forks.
Language: C++
Topics: apple-silicon, cuda, inference, inference-engine, llama, llm, metal, mixture-of-experts, moe, openai-api