uccl-project/mKernel

mKernel: fast multi-node, multi-GPU fused kernels

GitHub repository with 216 stars and 20 forks.

Language: Cuda

24h trend summary

Trending score 0.76, activity score 0.04, stars gained +5, forks gained +1.

Latest metric snapshot

2026-06-05: 216 stars and 20 forks.

Similar repositories

  1. alibaba/rtp-llm

    RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

    GitHub repository with 1,179 stars and 204 forks.

    Trending score: 1.09; stars gained: +9; forks gained: +0.

    Language: Cuda

    Topics: gpt, inference, llama, llm, llm-serving, llmops

  2. lavawolfiee/mini-flash-attention

    Minimal FlashAttention in CUDA C++/CuTe: readable WMMA/CuTe kernels, no NxN workspace, up to 4.5x faster than naive PyTorch

    GitHub repository with 21 stars and 1 fork.

    Trending score: 1.02; stars gained: +9; forks gained: +1.

    Language: Cuda

    Topics: attention, cuda, cute, cutlass, flash-attention, flashattention

  3. NVIDIA/CUDALibrarySamples

    CUDA Library Samples

    GitHub repository with 2,424 stars and 459 forks.

    Trending score: 0.79; stars gained: +5; forks gained: +1.

    Language: Cuda

    Topics: cufft, curand, cusolver, cusparse, nvjpeg, cudss

  4. uccl-project/mKernel

    mKernel: fast multi-node, multi-GPU fused kernels

    GitHub repository with 216 stars and 20 forks.

    Trending score: 0.76; stars gained: +5; forks gained: +1.

    Language: Cuda

  5. brucefan1983/GPUMD

    Graphics Processing Units Molecular Dynamics

    GitHub repository with 782 stars and 186 forks.

    Trending score: 0.69; stars gained: +4; forks gained: +2.

    Language: Cuda

    Topics: cuda, gpu, gpumd, heat-transport, high-performance-computing, machine-learning

  6. mirage-project/mirage

    Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

    GitHub repository with 2,290 stars and 214 forks.

    Trending score: 0.60; stars gained: +3; forks gained: -1.

    Language: Cuda

Trending in Cuda

  1. alibaba/rtp-llm

    RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

    GitHub repository with 1,179 stars and 204 forks.

    Trending score: 1.09; stars gained: +9; forks gained: +0.

    Language: Cuda

    Topics: gpt, inference, llama, llm, llm-serving, llmops

  2. lavawolfiee/mini-flash-attention

    Minimal FlashAttention in CUDA C++/CuTe: readable WMMA/CuTe kernels, no NxN workspace, up to 4.5x faster than naive PyTorch

    GitHub repository with 21 stars and 1 fork.

    Trending score: 1.02; stars gained: +9; forks gained: +1.

    Language: Cuda

    Topics: attention, cuda, cute, cutlass, flash-attention, flashattention

  3. NVIDIA/CUDALibrarySamples

    CUDA Library Samples

    GitHub repository with 2,424 stars and 459 forks.

    Trending score: 0.79; stars gained: +5; forks gained: +1.

    Language: Cuda

    Topics: cufft, curand, cusolver, cusparse, nvjpeg, cudss

  4. uccl-project/mKernel

    mKernel: fast multi-node, multi-GPU fused kernels

    GitHub repository with 216 stars and 20 forks.

    Trending score: 0.76; stars gained: +5; forks gained: +1.

    Language: Cuda

  5. brucefan1983/GPUMD

    Graphics Processing Units Molecular Dynamics

    GitHub repository with 782 stars and 186 forks.

    Trending score: 0.69; stars gained: +4; forks gained: +2.

    Language: Cuda

    Topics: cuda, gpu, gpumd, heat-transport, high-performance-computing, machine-learning

  6. mirage-project/mirage

    Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

    GitHub repository with 2,290 stars and 214 forks.

    Trending score: 0.60; stars gained: +3; forks gained: -1.

    Language: Cuda