FZJ-JSC/tutorial-multi-gpu

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

GitHub repository with 361 stars and 76 forks.

Language: Cuda

Topics: cuda, exascale-computing, gpu, hpc, isc22, isc23, isc24, isc25, mpi, multi-gpu

Latest metric snapshot

2026-06-05: 361 stars and 76 forks.

Similar repositories

1. lavawolfiee/mini-flash-attention

Minimal FlashAttention in CUDA C++/CuTe: readable WMMA/CuTe kernels, no NxN workspace, up to 4.5x faster than naive PyTorch

GitHub repository with 21 stars and 1 fork.

Trending score: 1.02; stars gained: +9; forks gained: +1.

Language: Cuda

Topics: attention, cuda, cute, cutlass, flash-attention, flashattention

2. NVIDIA/CUDALibrarySamples

CUDA Library Samples

GitHub repository with 2,424 stars and 459 forks.

Trending score: 0.79; stars gained: +5; forks gained: +1.

Language: Cuda

Topics: cufft, curand, cusolver, cusparse, nvjpeg, cudss

3. brucefan1983/GPUMD

Graphics Processing Units Molecular Dynamics

GitHub repository with 782 stars and 186 forks.

Trending score: 0.69; stars gained: +4; forks gained: +2.

Language: Cuda

Topics: cuda, gpu, gpumd, heat-transport, high-performance-computing, machine-learning

4. NVIDIA/nvbench

CUDA Kernel Benchmarking Library

GitHub repository with 868 stars and 109 forks.

Trending score: 0.50; stars gained: +1; forks gained: +0.

Language: Cuda

Topics: benchmark, kernel-benchmark, cuda-kernels, cuda, performance, nvidia

5. rapidsai/cugraph

cuGraph - RAPIDS Graph Analytics Library

GitHub repository with 2,189 stars and 357 forks.

Trending score: 0.49; stars gained: +2; forks gained: +0.

Language: Cuda

Topics: rapids, nvidia, gpu, cuda, graph, graph-algorithms

6. supranational/sppark

Zero-knowledge template library

GitHub repository with 219 stars and 97 forks.

Trending score: 0.18; stars gained: +0; forks gained: +1.

Language: Cuda

Topics: cuda, bls12-377, bls12-381, pasta-curves, zero-knowledge, zero-knowledge-proofs

Trending in Cuda

1. alibaba/rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

GitHub repository with 1,179 stars and 204 forks.

Trending score: 1.09; stars gained: +9; forks gained: +0.

Language: Cuda

Topics: gpt, inference, llama, llm, llm-serving, llmops

2. lavawolfiee/mini-flash-attention

Minimal FlashAttention in CUDA C++/CuTe: readable WMMA/CuTe kernels, no NxN workspace, up to 4.5x faster than naive PyTorch

GitHub repository with 21 stars and 1 fork.

Trending score: 1.02; stars gained: +9; forks gained: +1.

Language: Cuda

Topics: attention, cuda, cute, cutlass, flash-attention, flashattention

3. NVIDIA/CUDALibrarySamples

CUDA Library Samples

GitHub repository with 2,424 stars and 459 forks.

Trending score: 0.79; stars gained: +5; forks gained: +1.

Language: Cuda

Topics: cufft, curand, cusolver, cusparse, nvjpeg, cudss

4. uccl-project/mKernel

mKernel: fast multi-node, multi-GPU fused kernels

GitHub repository with 216 stars and 20 forks.

Trending score: 0.76; stars gained: +5; forks gained: +1.

Language: Cuda

5. brucefan1983/GPUMD

Graphics Processing Units Molecular Dynamics

GitHub repository with 782 stars and 186 forks.

Trending score: 0.69; stars gained: +4; forks gained: +2.

Language: Cuda

Topics: cuda, gpu, gpumd, heat-transport, high-performance-computing, machine-learning

6. mirage-project/mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

GitHub repository with 2,290 stars and 214 forks.

Trending score: 0.60; stars gained: +3; forks gained: -1.

Language: Cuda

Trending topic: cuda

1. vllm-project/vllm

2. gpustack/gpustack

3. Luce-Org/lucebox-hub

4. LMCache/LMCache

5. shader-slang/slang

6. tenstorrent/tt-metal