kekzl/imp

High-performance LLM inference engine in C++/CUDA for NVIDIA Blackwell (RTX 5090/5080/5070 Ti, RTX PRO 6000; sm_120). Native NVFP4/GGUF, 270 tok/s decode on Qwen3-Coder-30B MoE. Written entirely by Claude Code.

GitHub repository with 18 stars and 2 forks.

Language: Cuda

Topics: blackwell, cpp, cuda, cuda-graphs, gated-deltanet, gguf, inference, inference-engine, llm, mixture-of-experts

Latest metric snapshot

2026-06-05: 18 stars and 2 forks.

Trending in Cuda

1. alibaba/rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

GitHub repository with 1,179 stars and 204 forks.

Trending score: 1.09; stars gained: +9; forks gained: +0.

Language: Cuda

Topics: gpt, inference, llama, llm, llm-serving, llmops

2. lavawolfiee/mini-flash-attention

Minimal FlashAttention in CUDA C++/CuTe: readable WMMA/CuTe kernels, no NxN workspace, up to 4.5x faster than naive PyTorch

GitHub repository with 21 stars and 1 fork.

Trending score: 1.02; stars gained: +9; forks gained: +1.

Language: Cuda

Topics: attention, cuda, cute, cutlass, flash-attention, flashattention

3. NVIDIA/CUDALibrarySamples

CUDA Library Samples

GitHub repository with 2,424 stars and 459 forks.

Trending score: 0.79; stars gained: +5; forks gained: +1.

Language: Cuda

Topics: cufft, curand, cusolver, cusparse, nvjpeg, cudss

4. brucefan1983/GPUMD

Graphics Processing Units Molecular Dynamics

GitHub repository with 782 stars and 186 forks.

Trending score: 0.69; stars gained: +4; forks gained: +2.

Language: Cuda

Topics: molecular-dynamics-simulation, heat-transport, cuda, molecular-dynamics, gpumd, phonon

5. NVIDIA/nvbench

CUDA Kernel Benchmarking Library

GitHub repository with 870 stars and 109 forks.

Trending score: 0.50; stars gained: +1; forks gained: +0.

Language: Cuda

Topics: benchmark, kernel-benchmark, cuda-kernels, cuda, performance, nvidia

6. rapidsai/cugraph

cuGraph - RAPIDS Graph Analytics Library

GitHub repository with 2,189 stars and 357 forks.

Trending score: 0.49; stars gained: +2; forks gained: +0.

Language: Cuda

Topics: rapids, nvidia, gpu, cuda, graph, graph-algorithms

Trending topic: blackwell

1. vllm-project/vllm

2. openlake-project/openlake

3. lightseekorg/tokenspeed

4. sgl-project/sglang

5. NVIDIA/TensorRT-LLM

6. NVIDIA/cudnn-frontend