bencherdev/bencher
🐰 Bencher - Continuous Benchmarking
GitHub repository with 850 stars and 41 forks.
Language: Rust
Topics: benchmark, ci, performance, continuous-benchmarking, cd, ci-cd, code-quality, benchmarking
2026-06-05: 850 stars and 41 forks.
Quickly find bottlenecks in Rust - one profiler for CPU, time, memory, and async code.
GitHub repository with 1,529 stars and 45 forks.
Trending score: 0.60; stars gained: +3; forks gained: +0.
Language: Rust
Topics: allocations, benchmark, performance, rust, debugging, mpsc
High-precision, one-shot and consistent benchmarking framework/harness for Rust. All Valgrind tools at your fingertips.
GitHub repository with 274 stars and 23 forks.
Trending score: 0.05.
Language: Rust
Topics: benchmark, cargo, rust, bindings, callgrind, client-request
An enhanced tool for CodexApp, striving to make Codex easier and more comfortable to use.
GitHub repository with 13,760 stars and 852 forks.
Trending score: 5.16; stars gained: +916; forks gained: +44.
Language: Rust
DeepSeek + MiMo coding agent in terminal
GitHub repository with 37,132 stars and 3,195 forks.
Trending score: 4.80; stars gained: +393; forks gained: +32.
Language: Rust
Topics: cli, deepseek, llm, rust, terminal, tui
Lightweight coding agent that runs in your terminal
GitHub repository with 88,832 stars and 13,052 forks.
Trending score: 4.58; stars gained: +326; forks gained: +48.
Language: Rust
Your Personal AI super intelligence. Private, Simple and extremely powerful.
GitHub repository with 30,826 stars and 2,977 forks.
Trending score: 4.37; stars gained: +332; forks gained: +50.
Language: Rust
Codebase intelligence for TypeScript and JavaScript. Free static layer: unused code, duplication, circular deps, complexity hotspots, architecture boundaries. Optional paid runtime layer: hot-path review and cold-path deletion evidence from real production traffic. Rust-native, sub-second, zero-config framework support.
GitHub repository with 3,058 stars and 94 forks.
Trending score: 4.05; stars gained: +346; forks gained: +16.
Language: Rust
Topics: cli, code-duplication, code-quality, codebase-intelligence, copy-paste-detection, dead-code
an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM
GitHub repository with 46,568 stars and 4,863 forks.
Trending score: 3.80; stars gained: +152; forks gained: +28.
Language: Rust
Topics: acp, ai, ai-agents, mcp
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research · Browser-hosted Android Simulator · Verifiable Evaluation · Scalable Online RL Training
GitHub repository with 517 stars and 82 forks.
Trending score: 3.00; stars gained: +33; forks gained: +4.
Language: TypeScript
Topics: agent, agents, ai, android, automation, benchmark
🔍 The hardest search benchmark in the wild — vague, multi-turn, proactive. 200 long-horizon tasks with persona-driven progressive disclosure, scored by verifiable schema-free knowledge-graph evaluation. No vibes, just triplet F1.
GitHub repository with 780 stars and 9 forks.
Trending score: 1.88; stars gained: +102; forks gained: +0.
Language: Python
Topics: agentic-ai, benchmark, llm, proactive-agent, search, search-agent
Minecraft-style voxel benchmark for comparing AI models (Arena + Sandbox)
GitHub repository with 244 stars and 17 forks.
Trending score: 1.14; stars gained: +13; forks gained: +0.
Language: TypeScript
Topics: ai, benchmark, llm, nlp, voxel, comparison-benchmarks
AMD Strix Halo local LLM guide: direct 100.0 t/s 30B Qwen MoE on Ryzen AI MAX+ 395 / Radeon 8060S. Setup, benchmarks, raw evidence.
GitHub repository with 91 stars and 4 forks.
Trending score: 0.98; stars gained: +7; forks gained: +0.
Language: Python
Topics: amd, benchmark, gfx1151, inference, llama-cpp, llm
τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
GitHub repository with 1,273 stars and 328 forks.
Trending score: 0.92; stars gained: +7; forks gained: +1.
Language: Python
Topics: benchmark, llm, ai, language-model-agent, conversational-agents
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
GitHub repository with 7,061 stars and 784 forks.
Trending score: 0.91; stars gained: +4; forks gained: +1.
Language: Python
Topics: benchmark, chatgpt, evaluation, large-language-model, llama2, llama3