Scottcjn/exo-cuda
Exo distributed inference with NVIDIA CUDA support via tinygrad
GitHub repository with 80 stars and 11 forks.
Language: Python
Topics: cuda, distributed, exo, inference, llm, tinygrad
2026-06-05: 80 stars and 11 forks.
A high-throughput and memory-efficient inference and serving engine for LLMs
GitHub repository with 82,001 stars and 17,690 forks.
Trending score: 3.75; stars gained: +79; forks gained: +46.
Language: Python
Topics: amd, blackwell, cuda, deepseek, deepseek-v3, gpt
SGLang is a high-performance serving framework for large language models and multimodal models.
GitHub repository with 28,862 stars and 6,348 forks.
Trending score: 1.72; stars gained: -55; forks gained: +18.
Language: Python
Topics: attention, blackwell, cuda, deepseek, diffusion, glm
FlashInfer: Kernel Library for LLM Serving
GitHub repository with 5,752 stars and 1,026 forks.
Trending score: 1.16; stars gained: +15; forks gained: +8.
Language: Python
Topics: attention, cuda, distributed-inference, gpu, jit, large-large-models
OpenEquivariance: a fast, open-source GPU JIT kernel generator for the Clebsch-Gordan tensor product.
GitHub repository with 149 stars and 9 forks.
Trending score: 0.45; stars gained: +1; forks gained: +0.
Language: Python
Topics: cuda, geometric-deep-learning, graph-neural-networks, sparse-tensors, equivariance, hip
Docker image to run a self-hosted Kokoro TTS server with an OpenAI-compatible audio speech API. 50+ voices across 9 languages, streaming support, all major audio formats, NVIDIA GPU (CUDA) acceleration, offline mode, and persistent model cache. Multi-arch: amd64, arm64.
GitHub repository with 16 stars and 2 forks.
Trending score: 0.36; stars gained: +1; forks gained: +0.
Language: Python
Topics: openai, self-hosted, speech, text-to-speech, tts, speech-synthesis
Dual-engine (llama.cpp + vLLM) LLM benchmarking pipeline for GGUF and safetensors models on NVIDIA GPUs: speed, quality, a live dashboard, and publishable cards.
GitHub repository with 9 stars and 2 forks.
Trending score: 0.33; stars gained: +1; forks gained: +0.
Language: Python
Topics: benchmarking, cuda, fastapi, gguf, llama-cpp, llm
The agent that grows with you
GitHub repository with 182,353 stars and 31,271 forks.
Trending score: 5.95; stars gained: +1,867; forks gained: +361.
Language: Python
Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude
Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
GitHub repository with 14,053 stars and 885 forks.
Trending score: 5.69; stars gained: +2,829; forks gained: +175.
Language: Python
Topics: agent, ai, anthropic, compression, context-engineering, context-window
Academic Research Skills for Claude Code: research → write → review → revise → finalize
GitHub repository with 27,548 stars and 2,267 forks.
Trending score: 5.52; stars gained: +1,079; forks gained: +89.
Language: Python
Topics: academic-pipeline, academic-writing, ai-research, claude, claude-code, literature-review
Learn it. Build it. Ship it for others.
GitHub repository with 28,711 stars and 4,695 forks.
Trending score: 5.32; stars gained: +1,261; forks gained: +238.
Language: Python
Topics: agents, ai, ai-agents, ai-engineering, computer-vision, course
An opinionated list of Python frameworks, libraries, tools, and resources
GitHub repository with 301,427 stars and 28,046 forks.
Trending score: 4.60; stars gained: +518; forks gained: +24.
Language: Python
Topics: awesome, python, collections, python-frameworks, python-libraries, python-tools
Use claude-code for free in the terminal, in the VS Code extension, or on Discord like OpenClaw (voice supported).
GitHub repository with 32,539 stars and 4,943 forks.
Trending score: 4.56; stars gained: +467; forks gained: +82.
Language: Python
TT-NN operator library, and TT-Metalium low-level kernel programming model.
GitHub repository with 1,495 stars and 480 forks.
Trending score: 1.82; stars gained: +7; forks gained: +5.
Language: C++
Topics: accelerator, ai, cuda, deepseek, gpu, img-gen
Real-time 3D full-body reconstruction from a single camera: multi-person BVH output, a pure C++ runtime, ONNX + ggml, and a 70-joint skeleton with hands.
GitHub repository with 475 stars and 62 forks.
Trending score: 1.78; stars gained: +2; forks gained: +1.
Language: C
Topics: 3d-human-pose, bvh, computer-vision, cpp, cuda, ggml
Use your NVIDIA GPU's VRAM as swap space on Linux. Built for laptops with soldered memory and no upgrade path: if you have an RTX card with 8 GB of VRAM sitting idle while the system swaps to SSD, this puts that VRAM to work.
GitHub repository with 397 stars and 10 forks.
Trending score: 1.53; stars gained: +39; forks gained: +0.
Language: Shell
Topics: cuda, gpu, laptop, linux, memory, nbd