kekzl/imp
High-performance LLM inference engine in C++/CUDA for NVIDIA Blackwell GPUs (RTX 5090/5080/5070 Ti, RTX PRO 6000; sm_120). Native NVFP4 and GGUF support; 270 tok/s decode on Qwen3-Coder-30B MoE. Written entirely by Claude Code.
GitHub repository with 18 stars and 2 forks.
Language: CUDA
Topics: blackwell, cpp, cuda, cuda-graphs, gated-deltanet, gguf, inference, inference-engine, llm, mixture-of-experts