raketenkater/llm-server
Auto-tuned launcher for GGUF models on llama.cpp / ik_llama.cpp — OpenAI-compatible server with multi-GPU tensor-split, MoE expert placement, measured flag tuning (AI Tune), hardware-matched HuggingFace downloads, and crash recovery. An Ollama alternative for multi-GPU rigs.
GitHub repository with 226 stars and 11 forks.
Language: Go
Topics: cuda, gguf, llama-cpp, llm, metal, moe, multi-gpu, golang, inference-server, llamacpp