benjaminzwhite/reasoning-models

Experiments with reasoning models, training techniques, papers

GitHub repository with 30 stars and 4 forks.

Topics: benchmarks, cot, deepseek, deepseek-r1, llm, models, o1, papers, reasoning, reinforcement-learning

Latest metric snapshot

2026-06-05: 30 stars and 4 forks.

Similar repositories

1. NVIDIA-NeMo/Gym

Evaluate and improve models and agents using environments

GitHub repository with 957 stars and 170 forks.

Trending score: 2.41; stars gained: +16; forks gained: +5.

Language: Python

Topics: agents, benchmarks, environments, evaluation, gym, llm

2. reyamira/models

TUI and CLI for browsing AI models, benchmarks, coding agents, and statuses for AI providers.

GitHub repository with 435 stars and 17 forks.

Trending score: 0.86; stars gained: +2; forks gained: +0.

Language: Rust

Topics: ai, anamolyco, artificial-analysis, benchmarks, claude-code, codex

3. adityarajdigital/designmd

Production-grade design context for AI coding workflows. Extract a real design system from any URL — colors, typography, spacing, breakpoints — as a portable DESIGN.md.

GitHub repository with 39 stars and 3 forks.

Trending score: 0.28; stars gained: +1; forks gained: +0.

Topics: ai-coding, ai-coding-tools, ai-context, benchmarks, claude, claude-code

4. moltar/typescript-runtime-type-benchmarks

📊 Benchmark Comparison of Packages with Runtime Validation and TypeScript Support

GitHub repository with 824 stars and 88 forks.

Trending score: 0.18; stars gained: +0; forks gained: +0.

Language: TypeScript

Topics: typescript, types, benchmarks, validation, benchmark, json

5. PlummersSoftwareLLC/Primes

Prime number projects in 100+ programming languages, to compare their speed - and their programmer's cleverness

GitHub repository with 2,969 stars and 599 forks.

Trending score: 0.05; stars gained: +0; forks gained: +0.

Language: C

Topics: programming-languages, drag-race, primes, primesieve, benchmarks, docker

6. Tuesdaythe13th/L4BYR1NTHagentevaluations

L4BYR1NTH: ARTIFEX Safety Evals

GitHub repository with 6 stars and 0 forks.

Trending score: 0.05; stars gained: +0; forks gained: +0.

Language: Jupyter Notebook

Topics: agentic, ai, benchmarking, benchmarks, evals, evaluation-framework

Trending topic: benchmarks