IsThatYou/auto-bench-audit
Automated auditing pipeline for LLM and agent benchmarks — surfaces task ambiguity, environment conflicts, and evaluation bugs.
GitHub repository with 12 stars and 1 fork.
Language: HTML
Topics: agent-evaluation, agentic-ai, agents, ai-agents, auditing, benchmark, benchmarking, evaluation, large-language-models, llm