huggingface/lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

GitHub repository with 2,451 stars and 489 forks.

Language: Python

Topics: evaluation, evaluation-framework, evaluation-metrics, huggingface

24h trend summary

Trending score 0.78, freshness score 0.18, stars gained +4, forks gained +3.

Latest metric snapshot

2026-06-15: 2,451 stars and 489 forks.

Similar repositories

1. comet-ml/opik

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

GitHub repository with 19,653 stars and 1,522 forks.

Trending score: 3.44; stars gained: +58; forks gained: +4.

Language: Python

Topics: evaluation, hacktoberfest, hacktoberfest2025, langchain, llama-index, llm

2. mlflow/mlflow

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.

GitHub repository with 26,532 stars and 5,845 forks.

Trending score: 2.94; stars gained: +20; forks gained: +9.

Language: Python

Topics: agentops, agents, ai, ai-governance, apache-spark, evaluation

3. open-compass/VLMEvalKit

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.

GitHub repository with 4,220 stars and 722 forks.

Trending score: 2.06; stars gained: +6; forks gained: +3.

Language: Python

Topics: chatgpt, claude, clip, computer-vision, evaluation, gemini

4. NVIDIA-NeMo/Gym

Evaluate and improve models and agents using environments

GitHub repository with 982 stars and 179 forks.

Trending score: 1.69; stars gained: +4; forks gained: +1.

Language: Python

Topics: reinforcement-learning, reinforcement-learning-environments, rl-environment, rl-training, gym, agents

5. lihouwenbin/ai-redteam-recursive-self-improvement

Domain-neutral AI red-team framework for recursive self-improvement governance

GitHub repository with 44 stars and 2 forks.

Trending score: 1.39; stars gained: +2; forks gained: +1.

Language: Python

Topics: agentic-ai, ai-safety, evaluation, governance, python, recursive-self-improvement

6. TIGER-AI-Lab/ClawBench

Open-source benchmark for browser AI agents on daily tasks.

GitHub repository with 393 stars and 22 forks.

Trending score: 1.37; stars gained: +2; forks gained: +0.

Language: Python

Topics: ai-agents, benchmark, browser-automation, browser-use, dataset, evaluation

Trending in Python

1. chopratejas/headroom

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

GitHub repository with 27,902 stars and 1,891 forks.

Trending score: 6.49; stars gained: +2,776; forks gained: +250.

Language: Python

Topics: agent, ai, anthropic, claude-code, compression, context-engineering

2. harry0703/MoneyPrinterTurbo

Generate high-definition short videos with one click using AI large language models.

GitHub repository with 88,031 stars and 12,625 forks.

Trending score: 6.02; stars gained: +1,097; forks gained: +218.

Language: Python

Topics: ai, automation, chatgpt, moviepy, python, shortvideo

3. pewdiepie-archdaemon/odysseus

Self-hosted AI workspace.

GitHub repository with 71,392 stars and 9,098 forks.

Trending score: 5.98; stars gained: +834; forks gained: +140.

Language: Python

4. NousResearch/hermes-agent

The agent that grows with you

GitHub repository with 194,052 stars and 33,977 forks.

Trending score: 5.92; stars gained: +753; forks gained: +209.

Language: Python

Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude

5. NVIDIA/SkillSpector

Security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks.

GitHub repository with 5,654 stars and 427 forks.

Trending score: 5.61; stars gained: +874; forks gained: +76.

Language: Python

6. rohitg00/ai-engineering-from-scratch

Learn it. Build it. Ship it for others.

GitHub repository with 32,676 stars and 5,366 forks.

Trending score: 5.59; stars gained: +762; forks gained: +135.

Language: Python

Topics: agents, ai, ai-agents, ai-engineering, computer-vision, course

Trending topic: evaluation

1. langfuse/langfuse

2. comet-ml/opik

3. promptfoo/promptfoo

4. Tencent/WeKnora

5. mlflow/mlflow

6. trpc-group/trpc-agent-go