mercator-ocean/oceanbench

Benchmark evaluating ocean forecasting systems against reference datasets and observations.

GitHub repository with 42 stars and 3 forks.

Language: Jupyter Notebook

Topics: benchmark, machine-learning, oceanbench, oceanography, ocean-forecasting, operational-oceanography, copernicus-marine-service, edito, digital-twin-ocean


Latest metric snapshot

2026-06-15: 42 stars and 3 forks.

Trending in Jupyter Notebook

1. DataTalksClub/llm-zoomcamp

LLM Zoomcamp - a free online course about real-life applications of LLMs. In 10 weeks you will learn how to build an AI system that answers questions about your knowledge base.

GitHub repository with 6,354 stars and 1,130 forks.

Trending score: 3.79; stars gained: +50; forks gained: +8.

Language: Jupyter Notebook

2. mcarfagno/mpc_python

A simple MPC controller for path tracking, implemented in Python.

GitHub repository with 434 stars and 70 forks.

Trending score: 3.72; stars gained: +132; forks gained: +17.

Language: Jupyter Notebook

3. facebookresearch/dinov3

Reference PyTorch implementation and models for DINOv3

GitHub repository with 10,678 stars and 876 forks.

Trending score: 3.14; stars gained: +70; forks gained: +6.

Language: Jupyter Notebook

4. rasbt/reasoning-from-scratch

Implement a reasoning LLM in PyTorch from scratch, step by step

GitHub repository with 4,508 stars and 662 forks.

Trending score: 2.79; stars gained: +24; forks gained: +1.

Language: Jupyter Notebook

Topics: ai, artificial-intelligence, deep-learning, large-language-models, llms, machine-learning

5. openai/openai-cookbook

Examples and guides for using the OpenAI API

GitHub repository with 74,172 stars and 12,554 forks.

Trending score: 2.74; stars gained: +17; forks gained: +5.

Language: Jupyter Notebook

Topics: chatgpt, gpt-4, openai, openai-api

6. gepa-ai/gepa

Optimize prompts, code, and more with AI-powered Reflective Text Evolution

GitHub repository with 5,154 stars and 431 forks.

Trending score: 2.72; stars gained: +18; forks gained: +6.

Language: Jupyter Notebook

Trending topic: benchmark

1. VibeBench/VibeSearchBench

🔍 The hardest search benchmark in the wild: vague, multi-turn, proactive. 200 long-horizon tasks with persona-driven progressive disclosure, scored by verifiable, schema-free knowledge-graph evaluation. No vibes, just triplet F1.

GitHub repository with 1,008 stars and 63 forks.

Trending score: 3.33; stars gained: +50; forks gained: +37.

Language: Python

Topics: agentic-ai, benchmark, llm, proactive-agent, search, search-agent

2. wuyoscar/Internal-Safety-Collapse

Internal Safety Collapse (ISC): Turning an LLM or AI agent into a generator of sensitive data.

GitHub repository with 865 stars and 142 forks.

Trending score: 2.51; stars gained: +11; forks gained: +3.

Language: Python

Topics: agent-safety, ai-safety, benchmark, jailbreak, large-language-models, llm-safety

3. Purewhiter/mobilegym

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research · Browser-hosted Android Simulator · Verifiable Evaluation · Scalable Online RL Training

GitHub repository with 621 stars and 98 forks.

Trending score: 2.50; stars gained: +12; forks gained: +1.

Language: TypeScript

Topics: benchmark, mobile-agent, reinforcement-learning, vlm, agents, gym

4. hogeheer499-commits/strix-halo-guide

Complete guide to running large language models locally on AMD Strix Halo / Ryzen AI MAX+ 395 with Radeon 8060S (gfx1151) and 96 GB/128 GB of unified memory. Covers BIOS configuration, Ubuntu/kernel setup, Ollama, llama.cpp Vulkan/RADV, ROCm/HIP, vLLM, and 70B/120B GGUF evidence.

GitHub repository with 143 stars and 6 forks.

Trending score: 1.97; stars gained: +9; forks gained: +0.

Language: Python

Topics: amd, benchmark, gfx1151, llama-cpp, llm, local-llm

5. SemiAnalysisAI/InferenceX

Open-source continuous inference benchmark research platform covering Kimi K2.6, DeepSeekv4, and GLM5 on GB200 NVL72 vs. MI355X vs. B200 vs. GB300 NVL72, with TPUv6e/v7 and Trainium2/3 coming soon™.

GitHub repository with 1,099 stars and 195 forks.

Trending score: 1.96; stars gained: +4; forks gained: +1.

Language: Shell

Topics: ai, benchmark, llm, pytorch, sglang, vllm

6. open-compass/opencompass

OpenCompass is an LLM evaluation platform that supports a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.

GitHub repository with 7,086 stars and 788 forks.

Trending score: 1.55; stars gained: +3; forks gained: +0.

Language: Python

Topics: evaluation, benchmark, large-language-model, chatgpt, llm, llama2