defilantech/LLMKube
Kubernetes operator for local LLM inference with llama.cpp, vLLM, TGI, and mlx-server: multi-GPU NVIDIA and Apple Silicon Metal, autoscaling, air-gapped, production-ready
GitHub repository with 129 stars and 18 forks.
Language: Go
Topics: ai, gguf, gpu, inference, kubernetes, kubernetes-operator, llama-cpp, llm, local-llm, nvidia