hyeonsangjeon/gdpval-realworks

Benchmark LLMs on real professional tasks, not academic puzzles. YAML-driven experiment pipeline + live React dashboard for GDPVal Gold Subset (220 tasks across 11 industries).

GitHub repository with 14 stars and 2 forks.

Language: Python

Topics: ai-evaluation, anthropic, azure-openai, benchmark-automation, code-interpreter, dashboard, evaluation, github-actions, gpt-5, huggingface

24h trend summary

Trending score 0.09, activity score 0.38, stars gained +0, forks gained +0.

Latest metric snapshot

2026-06-13: 14 stars and 2 forks.

Similar repositories

1. huggingface/cadgenbench

A benchmark for AI-driven CAD generation and editing

GitHub repository with 62 stars and 5 forks.

Trending score: 0.94; stars gained: +8; forks gained: +2.

Language: Python

Topics: 3d, ai-evaluation, benchmark, cad, huggingface, image-to-3d

2. Neal006/memorylens

The open-source benchmark for LLM memory decay. Measure how Naive, RAG, Chunked RAG, Cascading, and SummaryMemory degrade over 100 conversation turns. Ebbinghaus forgetting curves, 5-provider LLM eval, multi-seed CI. No API key needed.

GitHub repository with 7 stars and 2 forks.

Trending score: 0.15; stars gained: +0; forks gained: +0.

Language: Python

Topics: ai-evaluation, benchmarking, chatbot, conversation-memory, ebbinghaus, evaluation

3. hyeonsangjeon/gdpval-realworks

Benchmark LLMs on real professional tasks, not academic puzzles. YAML-driven experiment pipeline + live React dashboard for GDPVal Gold Subset (220 tasks across 11 industries).

GitHub repository with 14 stars and 2 forks.

Trending score: 0.09; stars gained: +0; forks gained: +0.

Language: Python

Topics: ai-evaluation, anthropic, azure-openai, benchmark-automation, code-interpreter, dashboard

4. NoesisVision/nasde-toolkit

CLI for benchmarks & evals of AI coding agents — on tasks you already understand, using your Claude / Codex / Gemini individual subscriptions or API keys.

GitHub repository with 10 stars and 0 forks.

Trending score: 0.04; stars gained: +0; forks gained: +0.

Language: Python

Topics: agent-benchmark, agent-evaluation, ai-coding-agents, ai-evaluation, claude-code, claude-skills

5. vishwanathakuthota/openvals

Open-source AI model evaluation and benchmarking framework for LLMs (OpenAI, Ollama, Claude, Gemini)

GitHub repository with 7 stars and 6 forks.

Trending score: 0.03; stars gained: -1; forks gained: +0.

Language: Python

Topics: ai-agents, ai-evaluation, ai-evaluation-framework, ai-quality, ai-reliability, ai-safety

Trending in Python

1. mvanhorn/last30days-skill

AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary

GitHub repository with 40,614 stars and 3,271 forks.

Trending score: 5.82; stars gained: +1,312; forks gained: +87.

Language: Python

2. chopratejas/headroom

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

GitHub repository with 24,986 stars and 1,636 forks.

Trending score: 5.73; stars gained: +2,844; forks gained: +202.

Language: Python

Topics: agent, ai, anthropic, claude-code, compression, context-engineering

3. pewdiepie-archdaemon/odysseus

Self-hosted AI workspace.

GitHub repository with 69,622 stars and 8,812 forks.

Trending score: 5.70; stars gained: +951; forks gained: +165.

Language: Python

4. NousResearch/hermes-agent

The agent that grows with you

GitHub repository with 192,291 stars and 33,524 forks.

Trending score: 5.48; stars gained: +990; forks gained: +282.

Language: Python

Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude

5. safishamsi/graphify

AI coding assistant skill (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, and more). Turn any folder of code, SQL schemas, R scripts, shell scripts, docs, papers, images, or videos into a queryable knowledge graph. App code + database schema + infrastructure in one graph.

GitHub repository with 66,406 stars and 6,716 forks.

Trending score: 5.25; stars gained: +1,314; forks gained: +109.

Language: Python

Topics: claude-code, graphrag, knowledge-graph, codex, openclaw, skills

6. hugohe3/ppt-master

AI generates a real, editable PowerPoint from any document — native shapes & animations, speaker notes voiced as audio narration, and the option to follow your own .pptx template, not slide images · by Hugo He

GitHub repository with 27,093 stars and 2,416 forks.

Trending score: 5.10; stars gained: +903; forks gained: +61.

Language: Python

Topics: ai-agent, powerpoint, pptx, presentation, office, slides