zhjai/agent-arena
Evidence-first multi-agent debate skill: get a second opinion by pitting Codex against Claude Code (or GLM, DeepSeek, or Qwen) to independently review, red-team, and judge high-stakes code and architecture decisions.
GitHub repository with 24 stars and 5 forks.
Topics: agent-arena, ai-agents, claude-code, codex, hermes-agent, llm-as-judge, multi-agent, openai-codex, openclaw, opencode