lizhiyao/oh-my-knowledge

Evaluation framework for LLM knowledge inputs — prompts, RAG corpora, skills, agent workflows. Fix the model, vary the artifact. Built-in statistical rigor: bootstrap CI, Krippendorff α, length-debias, saturation curves.

GitHub repository with 11 stars and 2 forks.

Language: TypeScript

Topics: agent-evaluation, ai, benchmark, bootstrap-ci, claude, claude-code, evaluation-as-code, evaluation-framework, knowledge-engineering, krippendorff-alpha

24h trend summary

Trending score 0.37, freshness score 0.94, stars gained +0, forks gained +0.

Latest metric snapshot

2026-06-15: 11 stars and 2 forks.

Similar repositories

  1. iris-eval/mcp-server

    The agent eval standard for MCP — score output quality, catch safety failures, enforce cost budgets

    GitHub repository with 7 stars and 3 forks.

    Trending score: 0.23; stars gained: +0; forks gained: +0.

    Language: TypeScript

    Topics: agent-evaluation, ai-agent, claude, eval, llm, mcp

Trending in TypeScript

  1. iptv-org/iptv

    Collection of publicly available IPTV channels from all over the world

    GitHub repository with 121,738 stars and 6,527 forks.

    Trending score: 6.11; stars gained: +2,935; forks gained: +171.

    Language: TypeScript

    Topics: iptv, m3u, playlist, tv, streams

  2. colbymchenry/codegraph

    Pre-indexed code knowledge graph, auto syncs on code changes, for Claude Code, Codex, Gemini, Cursor, OpenCode, AntiGravity, Kiro, and Hermes Agent — fewer tokens, fewer tool calls, 100% local

    GitHub repository with 49,436 stars and 3,025 forks.

    Trending score: 5.69; stars gained: +779; forks gained: +60.

    Language: TypeScript

  3. nexu-io/open-design

    🎨 Local-first, open-source Claude Design alternative. 🖥️ Native desktop app. ⚡ 259+ Skills · ✨ 142+ Design Systems 🖼️ Web · desktop · mobile prototypes · slides · images · videos · HyperFrames 📦 Sandboxed preview · HTML/PDF/PPTX/MP4 export 🤖 Claude Code / OpenClaw / Codex / Cursor / OpenCode / Qwen / Copilot / Hermes / Kimi & 17+ CLIs.

    GitHub repository with 65,206 stars and 7,301 forks.

    Trending score: 5.65; stars gained: +790; forks gained: +117.

    Language: TypeScript

    Topics: agent-skills, ai-agents, ai-design, byok, claude-code-for-design, claude-design

  4. refactoringhq/tolaria

    Desktop app to manage markdown knowledge bases

    GitHub repository with 16,327 stars and 1,116 forks.

    Trending score: 5.27; stars gained: +469; forks gained: +36.

    Language: TypeScript

  5. heygen-com/hyperframes

    Write HTML. Render video. Built for agents.

    GitHub repository with 27,797 stars and 2,614 forks.

    Trending score: 5.27; stars gained: +516; forks gained: +59.

    Language: TypeScript

    Topics: ai, animation, ffmpeg, framework, gsap, html

  6. anomalyco/opencode

    The open source coding agent.

    GitHub repository with 174,671 stars and 21,137 forks.

    Trending score: 5.24; stars gained: +351; forks gained: +79.

    Language: TypeScript

Trending topic: agent-evaluation

  1. coze-dev/coze-loop

    Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation to monitoring.

    GitHub repository with 5,522 stars and 764 forks.

    Trending score: 1.76; stars gained: +8; forks gained: +0.

    Language: Go

    Topics: agent, agent-evaluation, agent-observability, agentops, ai, coze

  2. samarailly51-pixel/claimpilot-harness

    Crash-test insurance claim AI agents before production.

    GitHub repository with 79 stars and 2 forks.

    Trending score: 1.52; stars gained: +21; forks gained: +1.

    Language: Python

    Topics: agent-evaluation, ai-agents, insurance, llm-evals, prompt-injection, python

  3. ALEX-nlp/OpenSkillEval

    OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents

    GitHub repository with 12 stars and 0 forks.

    Trending score: 1.02; stars gained: +3; forks gained: +0.

    Language: Python

    Topics: agent-evaluation, ai-agents, benchmark, llm-eval, skill-evaluation

  4. mozilla-ai/any-agent

    A single interface to use and evaluate different agent frameworks

    GitHub repository with 1,176 stars and 94 forks.

    Trending score: 0.92; stars gained: +2; forks gained: +0.

    Language: Python

    Topics: agent-evaluation, agents, ai, a2a, mcp

  5. ifixai-ai/iFixAi

    Catch your AI's mistakes and blind spots before your customers or regulators do. iFixAi runs 45 inspections, 32 graded core plus 13 extended for frontier risks like sabotage, sandbagging, and oversight evasion. It returns a letter grade in under 5 minutes. Industry and model agnostic.

    GitHub repository with 479 stars and 92 forks.

    Trending score: 0.76; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: ai, diagnostic-tool, agent-evaluation, ai-alignment, ai-evaluation, ai-governance

  6. Forsy-AI/forsy-trace-skill

    Open skill for capturing AI agent work as structured traces.

    GitHub repository with 89 stars and 10 forks.

    Trending score: 0.71; stars gained: -1; forks gained: +0.

    Language: Python

    Topics: agent-evaluation, agent-traces, agent-workflows, ai-agents, llm-agents, post-training