Sphere-AI-Lab/orbit
Stable and Efficient Reinforcement Learning for Trillion-Parameter LLMs
GitHub repository with 127 stars and 6 forks.
Language: Python
Topics: cuda, low-precision, peft, reinforcement-learning, transformers
Trending score 0.30, activity score 0.01, stars gained +1, forks gained +0.
2026-06-05: 127 stars and 6 forks.
A high-throughput and memory-efficient inference and serving engine for LLMs
GitHub repository with 81,948 stars and 17,658 forks.
Trending score: 3.75; stars gained: +79; forks gained: +46.
Language: Python
Topics: amd, blackwell, cuda, deepseek, deepseek-v3, gpt
A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.
GitHub repository with 5,102 stars and 541 forks.
Trending score: 2.51; stars gained: +11; forks gained: +1.
Language: Python
Topics: ascend, cuda, deepseek, distributed-inference, genai, inference
SGLang is a high-performance serving framework for large language models and multimodal models.
GitHub repository with 28,875 stars and 6,336 forks.
Trending score: 1.72; stars gained: -55; forks gained: +18.
Language: Python
Topics: attention, blackwell, cuda, deepseek, diffusion, glm
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
GitHub repository with 13,805 stars and 2,437 forks.
Trending score: 1.18; stars gained: +16; forks gained: +7.
Language: Python
Topics: blackwell, cuda, llm-serving, moe, pytorch
FlashInfer: Kernel Library for LLM Serving
GitHub repository with 5,744 stars and 1,026 forks.
Trending score: 1.16; stars gained: +15; forks gained: +8.
Language: Python
Topics: attention, cuda, distributed-inference, gpu, jit, large-large-models
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
GitHub repository with 3,378 stars and 738 forks.
Trending score: 1.10; stars gained: +1; forks gained: +3.
Language: Python
Topics: cuda, deep-learning, fp4, fp8, gpu, jax
The agent that grows with you
GitHub repository with 180,988 stars and 31,047 forks.
Trending score: 5.95; stars gained: +1,867; forks gained: +361.
Language: Python
Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude
Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
GitHub repository with 12,420 stars and 807 forks.
Trending score: 5.69; stars gained: +2,829; forks gained: +175.
Language: Python
Topics: agent, ai, anthropic, claude-code, compression, context-engineering
Academic Research Skills for Claude Code: research → write → review → revise → finalize
GitHub repository with 27,211 stars and 2,239 forks.
Trending score: 5.52; stars gained: +1,079; forks gained: +89.
Language: Python
Topics: academic-writing, ai-research, claude, claude-code, literature-review, peer-review
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
GitHub repository with 140,042 stars and 20,111 forks.
Trending score: 5.04; stars gained: +317; forks gained: +58.
Language: Python
Topics: ollama, ollama-webui, llm, webui, self-hosted, llm-ui
LLM-powered stock analysis system for A/H/US markets: multi-source market data + real-time news + LLM decision dashboard + multi-channel push notifications; runs on a schedule at zero cost, entirely free.
GitHub repository with 40,750 stars and 38,939 forks.
Trending score: 4.88; stars gained: +836; forks gained: +443.
Language: Python
Topics: a-stock, ai-agent, aigc, llm, quant, quantitative-finance
GitHub repository with 29,960 stars and 4,217 forks.
Trending score: 4.88; stars gained: +688; forks gained: +114.
Language: Python
Fast LLM speculative inference server for consumer hardware.
GitHub repository with 2,330 stars and 217 forks.
Trending score: 2.31; stars gained: +17; forks gained: +3.
Language: C++
Topics: kernel, llama-cpp, local-ai, nvidia-cuda, qwen, rtx3090
Making it easier to work with shaders
GitHub repository with 5,348 stars and 451 forks.
Trending score: 2.08; stars gained: +4; forks gained: +2.
Language: C++
Topics: shaders, hlsl, glsl, d3d12, vulkan, cuda
:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
GitHub repository with 1,494 stars and 480 forks.
Trending score: 1.82; stars gained: +7; forks gained: +5.
Language: C++
Topics: accelerator, ai, cuda, deepseek, gpu, img-gen
Real-time 3D full-body reconstruction from a single camera, Multiperson BVH output, Pure C++ runtime, ONNX + ggml, 70-joint skeleton with hands.
GitHub repository with 472 stars and 61 forks.
Trending score: 1.78; stars gained: +2; forks gained: +1.
Language: C
Topics: 3d-human-pose, bvh, computer-vision, cpp, cuda, ggml