confident-ai/deepeval
The LLM Evaluation Framework
GitHub repository with 16,133 stars and 1,526 forks.
Language: Python
Topics: evaluation-framework, evaluation-metrics, llm-evaluation, llm-evaluation-framework, llm-evaluation-metrics, python
2026-06-13: 16,133 stars and 1,526 forks.
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
GitHub repository with 2,444 stars and 486 forks.
Trending score: 0.66; stars gained: +3; forks gained: +2.
Language: Python
Topics: evaluation, evaluation-framework, evaluation-metrics, huggingface
Flexible and Pluggable Serving Engine for Diffusion LLMs
GitHub repository with 71 stars and 15 forks.
Trending score: 0.04; stars gained: +0; forks gained: +0.
Language: Python
Topics: dllm, evaluation-framework, inference-engine
AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary
GitHub repository with 40,614 stars and 3,271 forks.
Trending score: 5.82; stars gained: +1,312; forks gained: +87.
Language: Python
Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
GitHub repository with 25,425 stars and 1,676 forks.
Trending score: 5.73; stars gained: +2,844; forks gained: +202.
Language: Python
Topics: agent, ai, anthropic, compression, context-engineering, context-window
Self-hosted AI workspace.
GitHub repository with 69,712 stars and 8,823 forks.
Trending score: 5.70; stars gained: +951; forks gained: +165.
Language: Python
The agent that grows with you
GitHub repository with 192,363 stars and 33,537 forks.
Trending score: 5.48; stars gained: +990; forks gained: +282.
Language: Python
Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude
AI coding assistant skill (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, and more). Turn any folder of code, SQL schemas, R scripts, shell scripts, docs, papers, images, or videos into a queryable knowledge graph. App code + database schema + infrastructure in one graph.
GitHub repository with 66,467 stars and 6,719 forks.
Trending score: 5.25; stars gained: +1,314; forks gained: +109.
Language: Python
Topics: antigravity, claude-code, codex, gemini, graphrag, knowledge-graph
AI generates a real, editable PowerPoint from any document — native shapes & animations, speaker notes voiced as audio narration, and the option to follow your own .pptx template, not slide images · by Hugo He
GitHub repository with 27,112 stars and 2,418 forks.
Trending score: 5.10; stars gained: +903; forks gained: +61.
Language: Python
Topics: ai-agent, aippt, office, powerpoint, powerpoint-generation, ppt
LLM and agent evaluation for Java & Kotlin. Runs in JUnit and CI. Spring AI, LangChain4j, Koog, Embabel, and any LLM client.
GitHub repository with 39 stars and 3 forks.
Trending score: 0.80; stars gained: +1; forks gained: +0.
Language: Java
Topics: agent-evaluation, evaluation-framework, evaluation-metrics, java, langchain4j, llm-evaluation
Evaluation framework for LLM knowledge inputs — prompts, RAG corpora, skills, agent workflows. Fix the model, vary the artifact. Built-in statistical rigor: bootstrap CI, Krippendorff α, length-debias, saturation curves.
GitHub repository with 11 stars and 2 forks.
Trending score: 0.77; stars gained: +2; forks gained: +0.
Language: TypeScript
Topics: ai, benchmark, claude, knowledge-engineering, llm, prompt-engineering
BENCHMARKS WITHOUT BORDERS
GitHub repository with 6 stars and 0 forks.
Trending score: 0.04; stars gained: +0; forks gained: +0.
Language: Jupyter Notebook
Topics: agentic, ai, benchmarking, benchmarks, evals, evaluation-framework