abundant-ai/oddish
Run Harbor tasks in the cloud
GitHub repository with 8 stars and 2 forks.
Language: Python
Topics: eval, llm, rl
Trending score 0.53, activity score 0.05, stars gained +1, forks gained +0.
2026-06-13: 8 stars and 2 forks.
A small, powerful, open-source CLI coding agent that works with open models.
GitHub repository with 210 stars and 17 forks.
Trending score: 1.05; stars gained: +2; forks gained: +1.
Language: Python
Topics: agent, ai, cli, code, eval, huggingface
⚽🤖 11 frontier LLMs predicted the entire 2026 World Cup — frozen before kickoff. Live leaderboard: Brier score, bracket points & Polymarket ROI.
GitHub repository with 9 stars and 9 forks.
Trending score: 0.53; stars gained: +2; forks gained: +1.
Language: Python
Topics: ai, benchmark, claude, deepseek, eval, forecasting
CLI for agent builders to benchmark & chaos test your AI Agents. Text, Voice, Code supported.
GitHub repository with 38 stars and 4 forks.
Trending score: 0.26; stars gained: +0; forks gained: +0.
Language: Python
Topics: agent, benchmark, chaos-engineering, eval, failure-injection, ai-agent
Multi-modal benchmark for measuring sensitive-information disclosure in computer-use agents
GitHub repository with 5 stars and 0 forks.
Trending score: 0.04; stars gained: +0; forks gained: +0.
Language: Python
Topics: benchmark, computer-use, computer-use-agent, eval, evaluation
Generate high-definition short videos with one click using large AI models (LLMs).
GitHub repository with 86,823 stars and 12,389 forks.
Trending score: 5.94; stars gained: +1,787; forks gained: +253.
Language: Python
Topics: ai, automation, chatgpt, moviepy, python, shortvideo
AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary
GitHub repository with 40,614 stars and 3,271 forks.
Trending score: 5.82; stars gained: +1,312; forks gained: +87.
Language: Python
Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
GitHub repository with 24,986 stars and 1,636 forks.
Trending score: 5.73; stars gained: +2,844; forks gained: +202.
Language: Python
Topics: agent, ai, anthropic, claude-code, compression, context-engineering
Self-hosted AI workspace.
GitHub repository with 69,531 stars and 8,790 forks.
Trending score: 5.70; stars gained: +951; forks gained: +165.
Language: Python
The agent that grows with you
GitHub repository with 192,170 stars and 33,504 forks.
Trending score: 5.48; stars gained: +990; forks gained: +282.
Language: Python
Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude
Academic Research Skills for Claude Code: research → write → review → revise → finalize
GitHub repository with 30,710 stars and 2,535 forks.
Trending score: 5.48; stars gained: +775; forks gained: +54.
Language: Python
Topics: academic-pipeline, academic-writing, ai-research, claude, claude-code, literature-review
"Unit tests" for your agent skills
GitHub repository with 515 stars and 39 forks.
Trending score: 0.54; stars gained: +2; forks gained: +1.
Language: TypeScript
Topics: agent, claude-code, codex, eval, gemini-cli, skill
Validate and retry LLM outputs for ruby_llm. Describe the JSON response you expect, fall back to a stronger model when the cheaper one fails the rules, and gate CI on regressions — all as one contract object per step.
GitHub repository with 31 stars and 0 forks.
Trending score: 0.45; stars gained: +1; forks gained: +0.
Language: Ruby
Topics: ai, anthropic, cost-tracking, eval, llm, model-comparison