WingEdge777/vitamin-cuda

🍎 One kernel a day keeps high latency away. A hands-on CUDA learning path featuring a rich collection of kernels, from the basics to peak performance, seamlessly integrated as PyTorch C++ extensions.

GitHub repository with 127 stars and 7 forks.

Language: Cuda

Topics: cuda, gpu-computing, hpc, learning-by-doing, optimization, parallel-programming, tutorials


Latest metric snapshot

2026-06-05: 127 stars and 7 forks.

Similar repositories

  1. lavawolfiee/mini-flash-attention

    Minimal FlashAttention in CUDA C++/CuTe: readable WMMA/CuTe kernels, no NxN workspace, up to 4.5x faster than naive PyTorch

    GitHub repository with 21 stars and 1 fork.

    Trending score: 1.02; stars gained: +9; forks gained: +1.

    Language: Cuda

    Topics: attention, cuda, cute, cutlass, flash-attention, flashattention

Trending in Cuda

  1. lavawolfiee/mini-flash-attention

    Minimal FlashAttention in CUDA C++/CuTe: readable WMMA/CuTe kernels, no NxN workspace, up to 4.5x faster than naive PyTorch

    GitHub repository with 21 stars and 1 fork.

    Trending score: 1.02; stars gained: +9; forks gained: +1.

    Language: Cuda

    Topics: attention, cuda, cute, cutlass, flash-attention, flashattention

  2. XopMC/brainflayer-CUDA

    "brainflayer" CUDA & private key recovery tool

    GitHub repository with 6 stars and 4 forks.

    Trending score: 0.05; stars gained: +0; forks gained: +0.

    Language: Cuda

  3. gau-nernst/learn-cuda

    Learn CUDA with PyTorch

    GitHub repository with 313 stars and 49 forks.

    Trending score: 0.04; stars gained: +0; forks gained: +0.

    Language: Cuda

  5. AmrMSharafeldin/semanticfoam

    This repository contains the official implementation of Semantic Foam: Unifying Spatial and Semantic Scene Decomposition

    GitHub repository with 10 stars and 0 forks.

    Trending score: 0.04; stars gained: +0; forks gained: +0.

    Language: Cuda

Trending topic: cuda

  1. vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    GitHub repository with 82,004 stars and 17,691 forks.

    Trending score: 3.75; stars gained: +79; forks gained: +46.

    Language: Python

    Topics: amd, blackwell, cuda, deepseek, deepseek-v3, gpt

  2. tenstorrent/tt-metal

    🤘 TT-NN operator library, and TT-Metalium low level kernel programming model.

    GitHub repository with 1,495 stars and 481 forks.

    Trending score: 1.82; stars gained: +7; forks gained: +5.

    Language: C++

    Topics: accelerator, ai, cuda, deepseek, gpu, img-gen

  3. AmmarkoV/SAM3DBody-cpp

    Real-time 3D full-body reconstruction from a single camera, Multiperson BVH output, Pure C++ runtime, ONNX + ggml, 70-joint skeleton with hands.

    GitHub repository with 475 stars and 62 forks.

    Trending score: 1.78; stars gained: +2; forks gained: +1.

    Language: C

    Topics: 3d-human-pose, bvh, computer-vision, cpp, cuda, ggml

  4. sgl-project/sglang

    SGLang is a high-performance serving framework for large language models and multimodal models.

    GitHub repository with 28,863 stars and 6,348 forks.

    Trending score: 1.72; stars gained: -55; forks gained: +18.

    Language: Python

    Topics: attention, blackwell, cuda, deepseek, diffusion, glm

  5. c0deJedi/nbd-vram

    Use your NVIDIA GPU's VRAM as swap space on Linux. Built for laptops with soldered memory and no upgrade path. If you have an RTX card sitting there with 8GB of VRAM and you're getting swapped to SSD, this puts that VRAM to work.

    GitHub repository with 397 stars and 10 forks.

    Trending score: 1.53; stars gained: +39; forks gained: +0.

    Language: Shell

    Topics: cuda, gpu, laptop, linux, memory, nbd

  6. flashinfer-ai/flashinfer

    FlashInfer: Kernel Library for LLM Serving

    GitHub repository with 5,752 stars and 1,026 forks.

    Trending score: 1.16; stars gained: +15; forks gained: +8.

    Language: Python

    Topics: attention, cuda, distributed-inference, gpu, jit, large-language-models