CodSpeedHQ/codspeed
CodSpeed is the all-in-one performance testing toolkit. Optimize code performance and catch regressions early.
GitHub repository with 187 stars and 22 forks.
Language: Rust
Topics: benchmark, ci, performance, testing
2026-06-04: 187 stars and 22 forks.
An enhancement tool for CodexApp that aims to make Codex easier and more comfortable to use.
GitHub repository with 13,271 stars and 818 forks.
Trending score: 5.03; stars gained: +831; forks gained: +32.
Language: Rust
CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies.
GitHub repository with 58,776 stars and 3,617 forks.
Trending score: 4.88; stars gained: +495; forks gained: +31.
Language: Rust
Topics: agentic-coding, ai-coding, anthropic, claude-code, cli, command-line-tool
Lightweight coding agent that runs in your terminal
GitHub repository with 88,618 stars and 13,012 forks.
Trending score: 4.61; stars gained: +336; forks gained: +70.
Language: Rust
15 MB, lightweight, cross-platform database client. Supports MySQL, PostgreSQL, SQLite, Redis, MongoDB, DuckDB, ClickHouse, SQL Server and more.
GitHub repository with 3,682 stars and 284 forks.
Trending score: 4.58; stars gained: +386; forks gained: +24.
Language: Rust
Topics: clickhouse, database, database-client, database-management, duckdb, gui
Your Personal AI super intelligence. Private, Simple and extremely powerful.
GitHub repository with 30,760 stars and 2,965 forks.
Trending score: 4.12; stars gained: +163; forks gained: +29.
Language: Rust
DeepSeek + MiMo coding agent in terminal
GitHub repository with 37,043 stars and 3,182 forks.
Trending score: 4.12; stars gained: +215; forks gained: +14.
Language: Rust
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research · Browser-hosted Android Simulator · Verifiable Evaluation · Scalable Online RL Training
GitHub repository with 498 stars and 79 forks.
Trending score: 3.43; stars gained: +84; forks gained: +10.
Language: TypeScript
Topics: agent, agents, ai, android, automation, benchmark
🔍 The hardest search benchmark in the wild — vague, multi-turn, proactive. 200 long-horizon tasks with persona-driven progressive disclosure, scored by verifiable schema-free knowledge-graph evaluation. No vibes, just triplet F1.
GitHub repository with 774 stars and 2 forks.
Trending score: 1.88; stars gained: +100; forks gained: +0.
Language: Python
Topics: agentic-ai, benchmark, llm, proactive-agent, search, search-agent
Minecraft-style voxel benchmark for comparing AI models (Arena + Sandbox)
GitHub repository with 243 stars and 17 forks.
Trending score: 1.23; stars gained: +19; forks gained: +1.
Language: TypeScript
Topics: ai, benchmark, llm, nlp, voxel, comparison-benchmarks
Benchmark for the quality of LLM-generated test suites — anti-fragility, rigor, mocking discipline, reuse — scored against human baselines, not coverage. Python, JS/TS, Go.
GitHub repository with 18 stars and 1 fork.
Trending score: 1.04; stars gained: +9; forks gained: +0.
Language: Python
Topics: benchmark, claude, code-quality, llm, mocha, pytest
τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
GitHub repository with 1,273 stars and 328 forks.
Trending score: 0.92; stars gained: +7; forks gained: +1.
Language: Python
Topics: benchmark, llm, ai, language-model-agent, conversational-agents
Cinebench Advanced Edition Portable with extended test profiles, command-line runner, and comparison charts—full benchmark toolkit unlocked.
GitHub repository with 26 stars and 0 forks.
Trending score: 0.84; stars gained: +6; forks gained: +0.
Topics: advanced-edition, benchmark, cinebench, cpu, gpu, hardware