rickyzhang82/cs344
Introduction to Parallel Programming class code
GitHub repository with 7 stars and 1 fork.
Language: Cuda
2026-06-04: 7 stars and 1 fork.
mKernel: fast multi-node, multi-GPU fused kernels
GitHub repository with 216 stars and 20 forks.
Trending score: 1.01; stars gained: +9; forks gained: +0.
Language: Cuda
CUDA Library Samples
GitHub repository with 2,424 stars and 459 forks.
Trending score: 0.76; stars gained: +5; forks gained: +1.
Language: Cuda
Topics: cuda, cudss, cufft, curand, cusolver, cusparse
Minimal FlashAttention in CUDA C++/CuTe: readable WMMA/CuTe kernels, no NxN workspace, up to 4.5x faster than naive PyTorch
GitHub repository with 21 stars and 1 fork.
Trending score: 0.72; stars gained: +4; forks gained: +0.
Language: Cuda
Topics: attention, cuda, cute, cutlass, flash-attention, flashattention
CUDA Kernel Benchmarking Library
GitHub repository with 868 stars and 109 forks.
Trending score: 0.32; stars gained: +1; forks gained: +0.
Language: Cuda
Topics: benchmark, kernel-benchmark, cuda-kernels, cuda, performance, nvidia
A light, transparent, and modular inference & quantization engine for studying LLMs.
GitHub repository with 9 stars and 0 forks.
Trending score: 0.05; stars gained: +0; forks gained: +0.
Language: Cuda
Topics: awq, cuda-graph, framework, megakernel, multi-backends, quantum-kernel
Learn CUDA with PyTorch
GitHub repository with 313 stars and 49 forks.
Trending score: 0.04; stars gained: +0; forks gained: +0.
Language: Cuda