maximhq/maxim-cookbooks

Maxim is an end-to-end AI evaluation and observability platform that empowers modern AI teams to ship agents with quality, reliability, and speed.

GitHub repository with 16 stars and 9 forks.

Language: Jupyter Notebook

Topics: evaluation, evaluation-framework, genai, observability

Latest metric snapshot

2026-06-05: 16 stars and 9 forks.

Trending in Jupyter Notebook

  1. GoogleCloudPlatform/generative-ai

    Sample code and notebooks for Generative AI on Google Cloud, with Gemini Enterprise Agent Platform

    GitHub repository with 16,986 stars and 4,249 forks.

    Trending score: 1.87; stars gained: +8; forks gained: +6.

    Language: Jupyter Notebook

    Topics: generative-ai, llm, vertex-ai, langchain, gemini, gemini-api

  2. DataTalksClub/llm-zoomcamp

    LLM Zoomcamp - a free online course about real-life applications of LLMs. In 10 weeks you will learn how to build an AI system that answers questions about your knowledge base.

    GitHub repository with 5,624 stars and 1,018 forks.

    Trending score: 1.87; stars gained: +93; forks gained: +13.

    Language: Jupyter Notebook

  3. Biohub/esm

    GitHub repository with 2,691 stars and 332 forks.

    Trending score: 1.82; stars gained: +48; forks gained: +12.

    Language: Jupyter Notebook

  4. nerdai/llm-agents-from-scratch

    Build LLM agents and multi-agent systems from scratch, with MCP, Skills, and A2A

    GitHub repository with 131 stars and 45 forks.

    Trending score: 1.70; stars gained: +32; forks gained: +14.

    Language: Jupyter Notebook

  5. DataTalksClub/data-engineering-zoomcamp

    Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.

    GitHub repository with 41,902 stars and 8,303 forks.

    Trending score: 1.49; stars gained: +23; forks gained: +3.

    Language: Jupyter Notebook

    Topics: course, data-engineering, dbt, docker, free, kafka

  6. facebookresearch/VLM3

    Official implementation of paper "VLM³: Vision Language Models Are Native 3D Learners".

    GitHub repository with 218 stars and 9 forks.

    Trending score: 1.46; stars gained: +33; forks gained: +0.

    Language: Jupyter Notebook

    Topics: 3d-foundation-model, camera-pose-estimation, depth-estimation, image-matching, large-language-models, object-level-3d

Trending topic: evaluation

  1. langwatch/langwatch

    The platform for LLM evaluations and AI agent testing

    GitHub repository with 3,288 stars and 320 forks.

    Trending score: 2.05; stars gained: +6; forks gained: +0.

    Language: TypeScript

    Topics: ai, analytics, datasets, dspy, evaluation, gpt

  2. crimeacs/auto-improve

    GAN-style self-improvement loop for any text artifact: mutate, grade with a SEPARATE model, keep only verified wins (pairwise-judged), revert the rest. The git history is the improvement log.

    GitHub repository with 18 stars and 1 fork.

    Trending score: 0.48; stars gained: +2; forks gained: +0.

    Language: Python

    Topics: ai-agents, developer-tools, evaluation, gan, gemini, generative-ai

  3. meituan-longcat/WBench

    WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

    GitHub repository with 118 stars and 3 forks.

    Trending score: 0.47; stars gained: +2; forks gained: +0.

    Language: Python

    Topics: evaluation, worldmodel

  4. outsourc-e/bench-loop

    Local-first CLI for benchmarking LLMs on real hardware — quality, speed, reliability, and a real multi-turn agent loop.

    GitHub repository with 32 stars and 6 forks.

    Trending score: 0.46; stars gained: +2; forks gained: +0.

    Language: Python

    Topics: agent, benchmark, cli, evaluation, llm, local-llm

  5. karlmehta/trustmodel

    Score any AI for trust — Eval, Monitor, Govern. 10 trust dimensions, one free API key (5 credits / $500).

    GitHub repository with 7 stars and 0 forks.

    Trending score: 0.33; stars gained: +1; forks gained: +0.

    Language: Python

    Topics: ai, ai-safety, compliance, evaluation, fairness, guardrails

  6. strands-agents/evals

    A comprehensive evaluation framework for AI agents and LLM applications.

    GitHub repository with 133 stars and 36 forks.

    Trending score: 0.33; stars gained: +1; forks gained: +0.

    Language: Python

    Topics: agentic, agentic-ai, ai, evaluation, machine-learning, python