tracel-ai/cubek
CubeK: high-performance multi-platform kernels in CubeCL
GitHub repository with 94 stars and 35 forks.
Language: Rust
Topics: cuda, gpu, hpc, rocm, vulkan
2026-06-05: 94 stars and 35 forks.
Agent-friendly GPU profile-query CLI
GitHub repository with 31 stars and 1 fork.
Trending score: 0.05.
Language: Rust
Topics: cli, cuda, gpu, ncu, nsys, profiling
A Rust library integrated with ONNXRuntime, providing a collection of Computer Vision and Vision-Language models such as YOLO, FastVLM, and more.
GitHub repository with 412 stars and 44 forks.
Trending score: 0.04.
Language: Rust
Topics: cuda, tensorrt, yolov8, ocr, yolo, rust-yolo
An enhanced tool for CodexApp, striving to make Codex better to use and more comfortable.
GitHub repository with 14,056 stars and 871 forks.
Trending score: 5.16; stars gained: +916; forks gained: +44.
Language: Rust
CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies
GitHub repository with 59,182 stars and 3,643 forks.
Trending score: 4.96; stars gained: +654; forks gained: +44.
Language: Rust
Topics: agentic-coding, ai-coding, anthropic, claude-code, cli, command-line-tool
Lightweight coding agent that runs in your terminal
GitHub repository with 88,938 stars and 13,072 forks.
Trending score: 4.58; stars gained: +326; forks gained: +48.
Language: Rust
Your Personal AI super intelligence. Private, Simple and extremely powerful.
GitHub repository with 30,877 stars and 2,982 forks.
Trending score: 4.37; stars gained: +332; forks gained: +50.
Language: Rust
Codebase intelligence for TypeScript and JavaScript. Free static layer: unused code, duplication, circular deps, complexity hotspots, architecture boundaries. Optional paid runtime layer: hot-path review and cold-path deletion evidence from real production traffic. Rust-native, sub-second, zero-config framework support.
GitHub repository with 3,118 stars and 96 forks.
Trending score: 4.05; stars gained: +346; forks gained: +16.
Language: Rust
Topics: cli, code-duplication, code-quality, codebase-intelligence, copy-paste-detection, dead-code
High performance object store for fast LLM Inference and GPU Training. Feed your GPUs at blazing fast speeds
GitHub repository with 1,118 stars and 176 forks.
Trending score: 4.00; stars gained: +244; forks gained: +120.
Language: Rust
Topics: blackwell, gpt, gpu, high-performance, llm, llm-training
A high-throughput and memory-efficient inference and serving engine for LLMs
GitHub repository with 82,006 stars and 17,693 forks.
Trending score: 3.75; stars gained: +79; forks gained: +46.
Language: Python
Topics: amd, blackwell, cuda, deepseek, deepseek-v3, gpt
TT-NN operator library and TT-Metalium low-level kernel programming model.
GitHub repository with 1,495 stars and 481 forks.
Trending score: 1.82; stars gained: +7; forks gained: +5.
Language: C++
Topics: accelerator, ai, cuda, deepseek, gpu, img-gen
Real-time 3D full-body reconstruction from a single camera, Multiperson BVH output, Pure C++ runtime, ONNX + ggml, 70-joint skeleton with hands.
GitHub repository with 475 stars and 62 forks.
Trending score: 1.78; stars gained: +2; forks gained: +1.
Language: C
Topics: 3d-human-pose, bvh, computer-vision, cpp, cuda, ggml
SGLang is a high-performance serving framework for large language models and multimodal models.
GitHub repository with 28,865 stars and 6,350 forks.
Trending score: 1.72; stars gained: -55; forks gained: +18.
Language: Python
Topics: attention, blackwell, cuda, deepseek, diffusion, glm
Use your NVIDIA GPU's VRAM as swap space on Linux. Built for laptops with soldered memory and no upgrade path. If you have an RTX card sitting there with 8GB of VRAM and you're getting swapped to SSD, this puts that VRAM to work.
GitHub repository with 399 stars and 10 forks.
Trending score: 1.53; stars gained: +39; forks gained: +0.
Language: Shell
Topics: cuda, gpu, laptop, linux, memory, nbd
FlashInfer: Kernel Library for LLM Serving
GitHub repository with 5,752 stars and 1,026 forks.
Trending score: 1.16; stars gained: +15; forks gained: +8.
Language: Python
Topics: attention, cuda, distributed-inference, gpu, jit, large-large-models