Yasuaki-Ito/GANSU
GPU Accelerated Numerical Simulation Utility for Quantum Chemistry
GitHub repository with 22 stars and 3 forks.
Language: Cuda
Topics: cuda, gpgpu, hartree-fock, post-hartree-fock, quantum-chemistry
2026-06-05: 22 stars and 3 forks.
Minimal FlashAttention in CUDA C++/CuTe: readable WMMA/CuTe kernels, no NxN workspace, up to 4.5x faster than naive PyTorch
GitHub repository with 21 stars and 1 fork.
Trending score: 1.02; stars gained: +9; forks gained: +1.
Language: Cuda
Topics: attention, cuda, cute, cutlass, flash-attention, flashattention
"brainflayer" CUDA & private key recovery tool
GitHub repository with 6 stars and 4 forks.
Trending score: 0.05; stars gained: +0; forks gained: +0.
Language: Cuda
Learn CUDA with PyTorch
GitHub repository with 313 stars and 49 forks.
Trending score: 0.04; stars gained: +0; forks gained: +0.
Language: Cuda
GitHub repository with 14 stars and 1 fork.
Trending score: 0.04; stars gained: +0; forks gained: +0.
Language: Cuda
This repository contains the official implementation of Semantic Foam: Unifying Spatial and Semantic Scene Decomposition
GitHub repository with 10 stars and 0 forks.
Trending score: 0.04; stars gained: +0; forks gained: +0.
Language: Cuda
A high-throughput and memory-efficient inference and serving engine for LLMs
GitHub repository with 82,008 stars and 17,694 forks.
Trending score: 3.75; stars gained: +79; forks gained: +46.
Language: Python
Topics: amd, blackwell, cuda, deepseek, deepseek-v3, gpt
TT-NN operator library and TT-Metalium low-level kernel programming model.
GitHub repository with 1,495 stars and 481 forks.
Trending score: 1.82; stars gained: +7; forks gained: +5.
Language: C++
Topics: accelerator, ai, cuda, deepseek, gpu, img-gen
Real-time 3D full-body reconstruction from a single camera, Multiperson BVH output, Pure C++ runtime, ONNX + ggml, 70-joint skeleton with hands.
GitHub repository with 475 stars and 62 forks.
Trending score: 1.78; stars gained: +2; forks gained: +1.
Language: C
Topics: 3d-human-pose, bvh, computer-vision, cpp, cuda, ggml
SGLang is a high-performance serving framework for large language models and multimodal models.
GitHub repository with 28,865 stars and 6,350 forks.
Trending score: 1.72; stars gained: -55; forks gained: +18.
Language: Python
Topics: attention, blackwell, cuda, deepseek, diffusion, glm
Use your NVIDIA GPU's VRAM as swap space on Linux. Built for laptops with soldered memory and no upgrade path. If you have an RTX card sitting there with 8GB of VRAM and you're getting swapped to SSD, this puts that VRAM to work.
GitHub repository with 399 stars and 10 forks.
Trending score: 1.53; stars gained: +39; forks gained: +0.
Language: Shell
Topics: cuda, gpu, laptop, linux, memory, nbd
FlashInfer: Kernel Library for LLM Serving
GitHub repository with 5,752 stars and 1,026 forks.
Trending score: 1.16; stars gained: +15; forks gained: +8.
Language: Python
Topics: attention, cuda, distributed-inference, gpu, jit, large-large-models