IsThatYou/auto-bench-audit

Automated auditing pipeline for LLM and agent benchmarks — surfaces task ambiguity, environment conflicts, and evaluation bugs.

GitHub repository with 12 stars and 1 fork.

Language: HTML

Topics: agent-evaluation, agentic-ai, agents, ai-agents, auditing, benchmark, benchmarking, evaluation, large-language-models, llm

24h trend summary

Trending score 0.48, activity score 0.00, stars gained +2, forks gained +0.

Latest metric snapshot

2026-06-04: 12 stars and 1 fork.

Similar repositories

  1. IsThatYou/auto-bench-audit

    Automated auditing pipeline for LLM and agent benchmarks — surfaces task ambiguity, environment conflicts, and evaluation bugs.

    GitHub repository with 12 stars and 1 fork.

    Trending score: 0.48; stars gained: +2; forks gained: +0.

    Language: HTML

    Topics: agent-evaluation, agentic-ai, agents, ai-agents, auditing, benchmark

Trending in HTML

  1. nexu-io/html-anything

    ✨ The agentic HTML editor — your local AI agent writes the HTML, you ship it. 🚀 75 Skills × 9 Surfaces (magazine · deck · poster · XHS / tweet · prototype · data report · Hyperframes) 🛡️ Sandboxed preview · 📤 1-click to WeChat / X / Zhihu / HTML / PNG 🔑 Zero API key — Claude Code / Cursor / Codex / Gemini / Copilot / OpenCode / Qwen / Aider.

    GitHub repository with 6,047 stars and 588 forks.

    Trending score: 3.63; stars gained: +120; forks gained: +10.

    Language: HTML

    Topics: agent-skills, agentic, ai-agents, ai-design, ai-editor, byok

  2. datawhalechina/Agent-Learning-Hub

    A collection of AI agent learning roadmaps and resources.

    GitHub repository with 2,689 stars and 269 forks.

    Trending score: 3.48; stars gained: +96; forks gained: +8.

    Language: HTML

  3. zarazhangrui/beautiful-html-templates

    A library of HTML slide templates designed so any coding agent can pick the right one and produce a beautiful deck on the user's behalf, automatically.

    GitHub repository with 2,517 stars and 231 forks.

    Trending score: 3.12; stars gained: +64; forks gained: +6.

    Language: HTML

  4. f/prompts.chat

    f.k.a. Awesome ChatGPT Prompts. Share, discover, and collect prompts from the community. Free and open source — self-host for your organization with complete privacy.

    GitHub repository with 163,293 stars and 21,225 forks.

    Trending score: 3.02; stars gained: +44; forks gained: +6.

    Language: HTML

    Topics: chatgpt, ai, artificial-intelligence, awesome-list, chatgpt-prompts, claude

  5. revfactory/harness

    A meta-skill that designs domain-specific agent teams, defines specialized agents, and generates the skills they use.

    GitHub repository with 5,091 stars and 686 forks.

    Trending score: 2.64; stars gained: +510; forks gained: +40.

    Language: HTML

    Topics: claude-code, claude-code-plugin, harness, harness-engineering

  6. bernardohcrocha/persistia-for-claude-code

    Give Claude Code the persistent memory it was missing and turn it into your operational co-pilot.

    GitHub repository with 225 stars and 0 forks.

    Trending score: 2.61; stars gained: +222; forks gained: +0.

    Language: HTML

Trending topic: agent-evaluation

  1. ifixai-ai/iFixAi

    The open-source diagnostic for AI misalignment. 32 tests across fabrication, manipulation, deception, unpredictability, and opacity. Provider-agnostic. Runs against OpenAI, Anthropic, Bedrock, Azure, Gemini, and more. Letter grade in under 5 minutes, content-addressed manifest for bit-identical replay. Built by iMe.

    GitHub repository with 466 stars and 90 forks.

    Trending score: 0.53; stars gained: +2; forks gained: +1.

    Language: Python

    Topics: ai, diagnostic-tool, misalignment, agent-evaluation, ai-alignment, ai-evaluation

  2. IsThatYou/auto-bench-audit

    Automated auditing pipeline for LLM and agent benchmarks — surfaces task ambiguity, environment conflicts, and evaluation bugs.

    GitHub repository with 12 stars and 1 fork.

    Trending score: 0.48; stars gained: +2; forks gained: +0.

    Language: HTML

    Topics: agent-evaluation, agentic-ai, agents, ai-agents, auditing, benchmark

  3. ALEX-nlp/OpenSkillEval

    OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents

    GitHub repository with 6 stars and 0 forks.

    Trending score: 0.32; stars gained: +1; forks gained: +0.

    Language: Python

    Topics: agent-evaluation, ai-agents, benchmark, llm-eval, skill-evaluation

  4. dokimos-dev/dokimos

    LLM and agent evaluation for Java & Kotlin. Runs in JUnit and CI. Spring AI, LangChain4j, Koog.

    GitHub repository with 36 stars and 3 forks.

    Trending score: 0.05; stars gained: +0; forks gained: +0.

    Language: Java

    Topics: agent-evaluation, agentic-ai, evaluation, evaluation-framework, evaluation-metrics, java