lightseekorg/tokenspeed
TokenSpeed is a speed-of-light LLM inference engine.
GitHub repository with 1,366 stars and 141 forks.
Language: Python
Topics: blackwell, deepseek, gpt-oss, kimi, lightseek, llm, minimax, nemotron, qwen, speed-of-light
Trending score 1.86, activity score 0.05, stars gained +6, forks gained +2.
2026-06-05: 1,366 stars and 141 forks.
A high-throughput and memory-efficient inference and serving engine for LLMs
GitHub repository with 81,995 stars and 17,676 forks.
Trending score: 3.75; stars gained: +79; forks gained: +46.
Language: Python
Topics: amd, blackwell, cuda, deepseek, deepseek-v3, gpt
SGLang is a high-performance serving framework for large language models and multimodal models.
GitHub repository with 28,859 stars and 6,348 forks.
Trending score: 1.72; stars gained: -55; forks gained: +18.
Language: Python
Topics: attention, blackwell, cuda, deepseek, diffusion, glm
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
GitHub repository with 13,807 stars and 2,440 forks.
Trending score: 1.18; stars gained: +16; forks gained: +7.
Language: Python
Topics: blackwell, cuda, llm-serving, moe, pytorch
cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels.
GitHub repository with 840 stars and 177 forks.
Trending score: 0.69; stars gained: +4; forks gained: +2.
Language: Python
Topics: attention, blackwell, cuda-kernels, flash-attention, fp8, gemm
Nvidia ultimate undervolting companion on Linux. Can automatically scan for the most optimal GPU VF curve and generate silent fan curves. Supports MSI Afterburner profile imports and LACT profile exports.
GitHub repository with 53 stars and 2 forks.
Trending score: 0.13; stars gained: +0; forks gained: +0.
Language: Python
Topics: blackwell, fan-curve, gaming, gpu, linux, nvidia
The agent that grows with you
GitHub repository with 181,882 stars and 31,207 forks.
Trending score: 5.95; stars gained: +1,867; forks gained: +361.
Language: Python
Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude
Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
GitHub repository with 13,768 stars and 870 forks.
Trending score: 5.69; stars gained: +2,829; forks gained: +175.
Language: Python
Topics: agent, ai, anthropic, compression, context-engineering, context-window
Academic Research Skills for Claude Code: research → write → review → revise → finalize
GitHub repository with 27,484 stars and 2,256 forks.
Trending score: 5.52; stars gained: +1,079; forks gained: +89.
Language: Python
Topics: academic-pipeline, academic-writing, ai-research, claude, claude-code, literature-review
Learn it. Build it. Ship it for others.
GitHub repository with 28,622 stars and 4,680 forks.
Trending score: 5.32; stars gained: +1,261; forks gained: +238.
Language: Python
Topics: agents, ai, ai-agents, ai-engineering, computer-vision, course
GitHub repository with 30,029 stars and 4,231 forks.
Trending score: 4.88; stars gained: +688; forks gained: +114.
Language: Python
An opinionated list of Python frameworks, libraries, tools, and resources
GitHub repository with 301,396 stars and 28,042 forks.
Trending score: 4.60; stars gained: +518; forks gained: +24.
Language: Python
Topics: awesome, python, collections, python-frameworks, python-libraries, python-tools
High performance object store for fast LLM Inference and GPU Training. Feed your GPUs at blazing fast speeds.
GitHub repository with 935 stars and 57 forks.
Trending score: 3.11; stars gained: +60; forks gained: +0.
Language: Rust
Topics: gpu, high-performance, rdma, rust, storage, throughput