lizhiyao/oh-my-knowledge
Evaluation framework for LLM knowledge inputs: prompts, RAG corpora, skills, and agent workflows. Fix the model, vary the artifact. Built-in statistical rigor: bootstrap confidence intervals, Krippendorff's α, length debiasing, and saturation curves.
GitHub repository with 11 stars and 2 forks.
Language: TypeScript
Topics: agent-evaluation, ai, benchmark, bootstrap-ci, claude, claude-code, evaluation-as-code, evaluation-framework, knowledge-engineering, krippendorff-alpha
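
To illustrate the "fix the model, vary the artifact" idea together with the bootstrap CI mentioned in the description, here is a minimal sketch of a paired percentile bootstrap over per-sample score differences between two artifacts evaluated with the same model. This is not the repository's implementation: `pairedBootstrapCI`, `mulberry32`, the `BootstrapCI` interface, and all parameter names and defaults are hypothetical and chosen only for this example.

```typescript
// Hypothetical sketch, not taken from oh-my-knowledge.
// Small seeded PRNG (mulberry32) so that resampling is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface BootstrapCI {
  mean: number;  // observed mean difference (candidate - baseline)
  lower: number; // lower percentile bound
  upper: number; // upper percentile bound
}

// Paired percentile bootstrap: both score arrays must come from the same
// evaluation samples, scored with the same model, differing only in the
// artifact (prompt, corpus, skill, ...) under test.
function pairedBootstrapCI(
  baseline: number[],
  candidate: number[],
  resamples = 2000,
  alpha = 0.05,
  seed = 42,
): BootstrapCI {
  if (baseline.length === 0 || baseline.length !== candidate.length) {
    throw new Error("score arrays must be non-empty and of equal length");
  }
  const diffs = candidate.map((c, i) => c - baseline[i]);
  const n = diffs.length;
  const rand = mulberry32(seed);

  // Resample the differences with replacement and record each mean.
  const means: number[] = [];
  for (let r = 0; r < resamples; r++) {
    let sum = 0;
    for (let i = 0; i < n; i++) {
      sum += diffs[Math.floor(rand() * n)];
    }
    means.push(sum / n);
  }
  means.sort((x, y) => x - y);

  // Read the interval bounds off the sorted bootstrap distribution.
  const lower = means[Math.floor((alpha / 2) * (resamples - 1))];
  const upper = means[Math.ceil((1 - alpha / 2) * (resamples - 1))];
  const mean = diffs.reduce((s, x) => s + x, 0) / n;
  return { mean, lower, upper };
}

// Usage with made-up scores for two prompt variants on the same six samples.
const promptA = [0.62, 0.70, 0.55, 0.81, 0.66, 0.74];
const promptB = [0.68, 0.75, 0.61, 0.83, 0.72, 0.79];
const ci = pairedBootstrapCI(promptA, promptB);
console.log(ci.mean.toFixed(3), ci.lower.toFixed(3), ci.upper.toFixed(3));
```

If the interval excludes zero, the difference between the two artifacts is unlikely to be resampling noise alone. Pairing by sample is a design choice here: it removes per-sample difficulty from the comparison, which an unpaired bootstrap would leave in as extra variance.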