NVIDIA/cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
GitHub repository with 9,893 stars and 1,906 forks.
Language: C++
Topics: cuda, deep-learning, deep-learning-library, cpp, nvidia, gpu, python
2026-06-14: 9,893 stars and 1,906 forks.
Fast LLM speculative inference server for consumer hardware.
GitHub repository with 2,491 stars and 229 forks.
Trending score: 2.88; stars gained: +27; forks gained: +6.
Language: C++
Topics: kernel, llama-cpp, local-ai, nvidia-cuda, qwen, rtx3090
Train, inspect, edit, automate, and export 3D Gaussian Splatting scenes from a single native application.
GitHub repository with 3,230 stars and 359 forks.
Trending score: 2.12; stars gained: +6; forks gained: +0.
Language: C++
Topics: cuda, gaussian-splatting, optimization, computer-graphics, computer-vision
Making it easier to work with shaders
GitHub repository with 5,371 stars and 455 forks.
Trending score: 1.68; stars gained: +4; forks gained: +0.
Language: C++
Topics: cuda, d3d12, glsl, hlsl, shaders, vulkan
HIP: C++ Heterogeneous-Compute Interface for Portability
GitHub repository with 4,347 stars and 587 forks.
Trending score: 1.33; stars gained: +3; forks gained: +5.
Language: C++
Topics: hip, hip-runtime, hip-portability, hip-kernel-language, hipify, cuda
CUDA Core Compute Libraries
GitHub repository with 2,380 stars and 410 forks.
Trending score: 1.21; stars gained: +1; forks gained: +2.
Language: C++
Topics: accelerated-computing, cpp, cpp-programming, cuda, cuda-cpp, cuda-kernels
High-Performance Rendering Framework on Stream Architectures
GitHub repository with 1,023 stars and 101 forks.
Trending score: 0.90; stars gained: +5; forks gained: +0.
Language: C++
Topics: cpu, gpu, high-performance, cross-platform, cuda, directx
LLM inference in C/C++
GitHub repository with 116,599 stars and 19,593 forks.
Trending score: 4.92; stars gained: +285; forks gained: +59.
Language: C++
Topics: ggml
Open Source Computer Vision Library
GitHub repository with 89,158 stars and 56,658 forks.
Trending score: 4.35; stars gained: +147; forks gained: +16.
Language: C++
Topics: c-plus-plus, computer-vision, deep-learning, image-processing, opencv
A lightweight, lightning-fast, in-process vector database
GitHub repository with 10,064 stars and 583 forks.
Trending score: 4.23; stars gained: +283; forks gained: +17.
Language: C++
Topics: agent-skills, db, embedded, faiss, hnsw, llm-memory
Port of OpenAI's Whisper model in C/C++
GitHub repository with 50,727 stars and 5,661 forks.
Trending score: 3.92; stars gained: +155; forks gained: +26.
Language: C++
Topics: inference, openai, speech-recognition, speech-to-text, transformer, whisper
A sleek and minimal desktop shell thoughtfully crafted for Wayland.
GitHub repository with 7,750 stars and 545 forks.
Trending score: 3.82; stars gained: +91; forks gained: +10.
Language: C++
Topics: dotfiles, hyprland, linux, niri, noctalia, quickshell
MLX: An array framework for Apple silicon
GitHub repository with 27,010 stars and 1,908 forks.
Trending score: 3.68; stars gained: +58; forks gained: +10.
Language: C++
Topics: mlx
LMCache: Supercharge Your LLM with the Fastest KV Cache Layer
GitHub repository with 9,088 stars and 1,321 forks.
Trending score: 4.67; stars gained: +411; forks gained: +26.
Language: Python
Topics: amd, cuda, fast, inference, kv-cache, llm
A high-throughput and memory-efficient inference and serving engine for LLMs
GitHub repository with 82,907 stars and 18,078 forks.
Trending score: 4.18; stars gained: +80; forks gained: +23.
Language: Python
Topics: amd, blackwell, cuda, deepseek, deepseek-v3, gpt
SGLang is a high-performance serving framework for large language models and multimodal models.
GitHub repository with 29,030 stars and 6,538 forks.
Trending score: 3.37; stars gained: +33; forks gained: +15.
Language: Python
Topics: attention, blackwell, cuda, deepseek, diffusion, glm
Fast LLM speculative inference server for consumer hardware.
GitHub repository with 2,491 stars and 229 forks.
Trending score: 2.88; stars gained: +27; forks gained: +6.
Language: C++
Topics: kernel, llama-cpp, local-ai, nvidia-cuda, qwen, rtx3090
cuda-oxide is an experimental Rust-to-CUDA compiler that lets you write (SIMT) GPU kernels in safe(ish), idiomatic Rust. It compiles standard Rust code directly to PTX — no DSLs, no foreign language bindings, just Rust.
GitHub repository with 2,755 stars and 182 forks.
Trending score: 2.83; stars gained: +25; forks gained: +4.
Language: Rust
Topics: async, compiler-backend, cuda, gpu, heterogeneous-computing, high-performance-computing
UniLab: A Heterogeneous Architecture for Robot RL Beyond GPU-Dominant Paradigms
GitHub repository with 739 stars and 71 forks.
Trending score: 2.51; stars gained: +10; forks gained: +4.
Language: Python
Topics: cross-platform, cuda, macos, motrixsim, mujoco, reinforcement-learning