benjaminzwhite/reasoning-models

Experiments with reasoning models, training techniques, papers

GitHub repository with 30 stars and 4 forks.

Topics: benchmarks, cot, deepseek, deepseek-r1, llm, models, o1, papers, reasoning, reinforcement-learning


Latest metric snapshot

2026-06-05: 30 stars and 4 forks.

Similar repositories

  1. NVIDIA-NeMo/Gym

    Evaluate and improve models and agents using environments

    GitHub repository with 957 stars and 170 forks.

    Trending score: 2.41; stars gained: +16; forks gained: +5.

    Language: Python

    Topics: agents, benchmarks, environments, evaluation, gym, llm

  2. reyamira/models

    TUI and CLI for browsing AI models, benchmarks, coding agents, and statuses for AI providers.

    GitHub repository with 435 stars and 17 forks.

    Trending score: 0.86; stars gained: +2; forks gained: +0.

    Language: Rust

    Topics: ai, anamolyco, artificial-analysis, benchmarks, claude-code, codex

  3. adityarajdigital/designmd

    Production-grade design context for AI coding workflows. Extract a real design system from any URL — colors, typography, spacing, breakpoints — as a portable DESIGN.md.

    GitHub repository with 39 stars and 3 forks.

    Trending score: 0.28; stars gained: +1; forks gained: +0.

    Topics: ai-coding, ai-coding-tools, ai-context, benchmarks, claude, claude-code

  4. moltar/typescript-runtime-type-benchmarks

    📊 Benchmark Comparison of Packages with Runtime Validation and TypeScript Support

    GitHub repository with 824 stars and 88 forks.

    Trending score: 0.18; stars gained: +0; forks gained: +0.

    Language: TypeScript

    Topics: typescript, types, benchmarks, validation, benchmark, json

  5. PlummersSoftwareLLC/Primes

    Prime number projects in 100+ programming languages, to compare their speed - and their programmer's cleverness

    GitHub repository with 2,969 stars and 599 forks.

    Trending score: 0.05; stars gained: +0; forks gained: +0.

    Language: C

    Topics: programming-languages, drag-race, primes, primesieve, benchmarks, docker

  6. Tuesdaythe13th/L4BYR1NTHagentevaluations

    L4BYR1NTH: ARTIFEX Safety Evals

    GitHub repository with 6 stars and 0 forks.

    Trending score: 0.05; stars gained: +0; forks gained: +0.

    Language: Jupyter Notebook

    Topics: agentic, ai, benchmarking, benchmarks, evals, evaluation-framework
