abundant-ai/oddish

Run Harbor tasks in the cloud

GitHub repository with 8 stars and 2 forks.

Language: Python

Topics: eval, llm, rl


24h trend summary

Trending score 0.53, activity score 0.05, stars gained +1, forks gained +0.

Latest metric snapshot

2026-06-13: 8 stars and 2 forks.

Similar repositories

  1. Swival/swival

    A small, powerful, open-source CLI coding agent that works with open models.

    GitHub repository with 210 stars and 17 forks.

    Trending score: 1.05; stars gained: +2; forks gained: +1.

    Language: Python

    Topics: agent, ai, cli, code, eval, huggingface

  2. abundant-ai/oddish

    Run Harbor tasks in the cloud

    GitHub repository with 8 stars and 2 forks.

    Trending score: 0.53; stars gained: +1; forks gained: +0.

    Language: Python

    Topics: eval, llm, rl

  3. mverab/WorldCupBench

    ⚽🤖 11 frontier LLMs predicted the entire 2026 World Cup — frozen before kickoff. Live leaderboard: Brier score, bracket points & Polymarket ROI.

    GitHub repository with 9 stars and 9 forks.

    Trending score: 0.53; stars gained: +2; forks gained: +1.

    Language: Python

    Topics: ai, benchmark, claude, deepseek, eval, forecasting

  4. Corbell-AI/evalmonkey

    CLI for agent builders to benchmark & chaos test your AI Agents. Text, Voice, Code supported.

    GitHub repository with 38 stars and 4 forks.

    Trending score: 0.26; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: agent, benchmark, chaos-engineering, eval, failure-injection, ai-agent

  5. screenpipe/screenleak

    Multi-modal benchmark for measuring sensitive-information disclosure in computer-use agents

    GitHub repository with 5 stars and 0 forks.

    Trending score: 0.04; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: benchmark, computer-use, computer-use-agent, eval, evaluation

Trending in Python

  1. harry0703/MoneyPrinterTurbo

    Generate high-definition short videos with one click using large AI models.

    GitHub repository with 86,823 stars and 12,389 forks.

    Trending score: 5.94; stars gained: +1,787; forks gained: +253.

    Language: Python

    Topics: ai, automation, chatgpt, moviepy, python, shortvideo

  2. mvanhorn/last30days-skill

    AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary

    GitHub repository with 40,614 stars and 3,271 forks.

    Trending score: 5.82; stars gained: +1,312; forks gained: +87.

    Language: Python

  3. chopratejas/headroom

    Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

    GitHub repository with 24,986 stars and 1,636 forks.

    Trending score: 5.73; stars gained: +2,844; forks gained: +202.

    Language: Python

    Topics: agent, ai, anthropic, claude-code, compression, context-engineering

  4. pewdiepie-archdaemon/odysseus

    Self-hosted AI workspace.

    GitHub repository with 69,531 stars and 8,790 forks.

    Trending score: 5.70; stars gained: +951; forks gained: +165.

    Language: Python

  5. NousResearch/hermes-agent

    The agent that grows with you

    GitHub repository with 192,170 stars and 33,504 forks.

    Trending score: 5.48; stars gained: +990; forks gained: +282.

    Language: Python

    Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude

  6. Imbad0202/academic-research-skills

    Academic Research Skills for Claude Code: research → write → review → revise → finalize

    GitHub repository with 30,710 stars and 2,535 forks.

    Trending score: 5.48; stars gained: +775; forks gained: +54.

    Language: Python

    Topics: academic-pipeline, academic-writing, ai-research, claude, claude-code, literature-review

Trending topic: eval

  1. Swival/swival

    A small, powerful, open-source CLI coding agent that works with open models.

    GitHub repository with 210 stars and 17 forks.

    Trending score: 1.05; stars gained: +2; forks gained: +1.

    Language: Python

    Topics: agent, ai, cli, code, eval, huggingface

  2. mgechev/skillgrade

    "Unit tests" for your agent skills

    GitHub repository with 515 stars and 39 forks.

    Trending score: 0.54; stars gained: +2; forks gained: +1.

    Language: TypeScript

    Topics: agent, claude-code, codex, eval, gemini-cli, skill

  3. abundant-ai/oddish

    Run Harbor tasks in the cloud

    GitHub repository with 8 stars and 2 forks.

    Trending score: 0.53; stars gained: +1; forks gained: +0.

    Language: Python

    Topics: eval, llm, rl

  4. mverab/WorldCupBench

    ⚽🤖 11 frontier LLMs predicted the entire 2026 World Cup — frozen before kickoff. Live leaderboard: Brier score, bracket points & Polymarket ROI.

    GitHub repository with 9 stars and 9 forks.

    Trending score: 0.53; stars gained: +2; forks gained: +1.

    Language: Python

    Topics: ai, benchmark, claude, deepseek, eval, forecasting

  5. justi/ruby_llm-contract

    Validate and retry LLM outputs for ruby_llm. Describe the JSON response you expect, fall back to a stronger model when the cheaper one fails the rules, and gate CI on regressions — all as one contract object per step.

    GitHub repository with 31 stars and 0 forks.

    Trending score: 0.45; stars gained: +1; forks gained: +0.

    Language: Ruby

    Topics: ai, anthropic, cost-tracking, eval, llm, model-comparison

  6. Corbell-AI/evalmonkey

    CLI for agent builders to benchmark & chaos test your AI Agents. Text, Voice, Code supported.

    GitHub repository with 38 stars and 4 forks.

    Trending score: 0.26; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: agent, benchmark, chaos-engineering, eval, failure-injection, ai-agent