PerMedCoE/observatory_benchmark

Description of the efforts towards having a comprehensive observatory of tools and their benchmarks

GitHub repository with 7 stars and 1 fork.

Language: Jupyter Notebook

Topics: benchmark, benchmarking

Latest metric snapshot

2026-06-05: 7 stars and 1 fork.

Trending in Jupyter Notebook

1. NVIDIA/cosmos

NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.

GitHub repository with 9,297 stars and 592 forks.

Trending score: 2.37; stars gained: +326; forks gained: +20.

Language: Jupyter Notebook

2. GoogleCloudPlatform/generative-ai

Sample code and notebooks for Generative AI on Google Cloud, with Gemini Enterprise Agent Platform

GitHub repository with 16,983 stars and 4,249 forks.

Trending score: 1.87; stars gained: +8; forks gained: +6.

Language: Jupyter Notebook

Topics: agents, gcp, gemini, gemini-api, gen-ai, generative-ai

3. DataTalksClub/llm-zoomcamp

LLM Zoomcamp - a free online course about real-life applications of LLMs. In 10 weeks you will learn how to build an AI system that answers questions about your knowledge base.

GitHub repository with 5,622 stars and 1,017 forks.

Trending score: 1.87; stars gained: +93; forks gained: +13.

Language: Jupyter Notebook

4. Biohub/esm

GitHub repository with 2,688 stars and 332 forks.

Trending score: 1.82; stars gained: +48; forks gained: +12.

Language: Jupyter Notebook

5. nerdai/llm-agents-from-scratch

Build LLM agents and multi-agent systems from scratch, with MCP, Skills, and A2A

GitHub repository with 131 stars and 45 forks.

Trending score: 1.70; stars gained: +32; forks gained: +14.

Language: Jupyter Notebook

6. DataTalksClub/data-engineering-zoomcamp

Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.

GitHub repository with 41,902 stars and 8,303 forks.

Trending score: 1.49; stars gained: +23; forks gained: +3.

Language: Jupyter Notebook

Topics: course, data-engineering, dbt, docker, free, kafka

Trending topic: benchmark

1. Purewhiter/mobilegym

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research · Browser-hosted Android Simulator · Verifiable Evaluation · Scalable Online RL Training

GitHub repository with 524 stars and 82 forks.

Trending score: 3.00; stars gained: +33; forks gained: +4.

Language: TypeScript

Topics: agent, agents, ai, android, automation, benchmark

2. VibeBench/VibeSearchBench

🔍 The hardest search benchmark in the wild — vague, multi-turn, proactive. 200 long-horizon tasks with persona-driven progressive disclosure, scored by verifiable schema-free knowledge-graph evaluation. No vibes, just triplet F1.

GitHub repository with 780 stars and 9 forks.

Trending score: 1.88; stars gained: +102; forks gained: +0.

Language: Python

Topics: agentic-ai, benchmark, llm, proactive-agent, search, search-agent

3. Ammaar-Alam/minebench

Minecraft-style voxel benchmark for comparing AI models (Arena + Sandbox)

GitHub repository with 245 stars and 17 forks.

Trending score: 1.14; stars gained: +13; forks gained: +0.

Language: TypeScript

Topics: ai, benchmark, llm, nlp, voxel, comparison-benchmarks

4. hogeheer499-commits/strix-halo-guide

AMD Strix Halo local LLM guide: direct 100.0 t/s 30B Qwen MoE on Ryzen AI MAX+ 395 / Radeon 8060S. Setup, benchmarks, raw evidence.

GitHub repository with 91 stars and 4 forks.

Trending score: 0.98; stars gained: +7; forks gained: +0.

Language: Python

Topics: amd, benchmark, gfx1151, inference, llama-cpp, llm

5. sierra-research/tau2-bench

τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

GitHub repository with 1,273 stars and 328 forks.

Trending score: 0.92; stars gained: +7; forks gained: +1.

Language: Python

Topics: benchmark, llm, ai, language-model-agent, conversational-agents

6. ZeptoSeniorMoat/Cinebench-Advanced-Edition-Portable

Cinebench Advanced Edition Portable with extended test profiles, command-line runner, and comparison charts—full benchmark toolkit unlocked.

GitHub repository with 26 stars and 0 forks.

Trending score: 0.84; stars gained: +6; forks gained: +0.

Topics: advanced-edition, benchmark, cinebench, cpu, gpu, hardware