PerMedCoE/observatory_benchmark
Description of the efforts towards having a comprehensive observatory of tools and their benchmarks
GitHub repository with 7 stars and 1 fork.
Language: Jupyter Notebook
Topics: benchmark, benchmarking
2026-06-05: 7 stars and 1 fork.
NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.
GitHub repository with 9,297 stars and 592 forks.
Trending score: 2.37; stars gained: +326; forks gained: +20.
Language: Jupyter Notebook
Sample code and notebooks for Generative AI on Google Cloud, with Gemini Enterprise Agent Platform
GitHub repository with 16,983 stars and 4,249 forks.
Trending score: 1.87; stars gained: +8; forks gained: +6.
Language: Jupyter Notebook
Topics: agents, gcp, gemini, gemini-api, gen-ai, generative-ai
LLM Zoomcamp - a free online course about real-life applications of LLMs. In 10 weeks you will learn how to build an AI system that answers questions about your knowledge base.
GitHub repository with 5,622 stars and 1,017 forks.
Trending score: 1.87; stars gained: +93; forks gained: +13.
Language: Jupyter Notebook
GitHub repository with 2,688 stars and 332 forks.
Trending score: 1.82; stars gained: +48; forks gained: +12.
Language: Jupyter Notebook
Build LLM agents and multi-agent systems from scratch, with MCP, Skills, and A2A
GitHub repository with 131 stars and 45 forks.
Trending score: 1.70; stars gained: +32; forks gained: +14.
Language: Jupyter Notebook
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.
GitHub repository with 41,902 stars and 8,303 forks.
Trending score: 1.49; stars gained: +23; forks gained: +3.
Language: Jupyter Notebook
Topics: course, data-engineering, dbt, docker, free, kafka
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research · Browser-hosted Android Simulator · Verifiable Evaluation · Scalable Online RL Training
GitHub repository with 524 stars and 82 forks.
Trending score: 3.00; stars gained: +33; forks gained: +4.
Language: TypeScript
Topics: agent, agents, ai, android, automation, benchmark
🔍 The hardest search benchmark in the wild — vague, multi-turn, proactive. 200 long-horizon tasks with persona-driven progressive disclosure, scored by verifiable schema-free knowledge-graph evaluation. No vibes, just triplet F1.
GitHub repository with 780 stars and 9 forks.
Trending score: 1.88; stars gained: +102; forks gained: +0.
Language: Python
Topics: agentic-ai, benchmark, llm, proactive-agent, search, search-agent
Minecraft-style voxel benchmark for comparing AI models (Arena + Sandbox)
GitHub repository with 245 stars and 17 forks.
Trending score: 1.14; stars gained: +13; forks gained: +0.
Language: TypeScript
Topics: ai, benchmark, llm, nlp, voxel, comparison-benchmarks
AMD Strix Halo local LLM guide: direct 100.0 t/s 30B Qwen MoE on Ryzen AI MAX+ 395 / Radeon 8060S. Setup, benchmarks, raw evidence.
GitHub repository with 91 stars and 4 forks.
Trending score: 0.98; stars gained: +7; forks gained: +0.
Language: Python
Topics: amd, benchmark, gfx1151, inference, llama-cpp, llm
τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
GitHub repository with 1,273 stars and 328 forks.
Trending score: 0.92; stars gained: +7; forks gained: +1.
Language: Python
Topics: benchmark, llm, ai, language-model-agent, conversational-agents
Cinebench Advanced Edition Portable with extended test profiles, command-line runner, and comparison charts—full benchmark toolkit unlocked.
GitHub repository with 26 stars and 0 forks.
Trending score: 0.84; stars gained: +6; forks gained: +0.
Topics: advanced-edition, benchmark, cinebench, cpu, gpu, hardware