benjaminzwhite/reasoning-models
Experiments with reasoning models, training techniques, papers
GitHub repository with 30 stars and 4 forks.
Topics: benchmarks, cot, deepseek, deepseek-r1, llm, models, o1, papers, reasoning, reinforcement-learning
As of 2026-06-05: 30 stars and 4 forks.
Evaluate and improve models and agents using environments
GitHub repository with 957 stars and 170 forks.
Trending score: 2.41; stars gained: +16; forks gained: +5.
Language: Python
Topics: agents, benchmarks, environments, evaluation, gym, llm
TUI and CLI for browsing AI models, benchmarks, coding agents, and statuses for AI providers.
GitHub repository with 435 stars and 17 forks.
Trending score: 0.86; stars gained: +2; forks gained: +0.
Language: Rust
Topics: ai, anamolyco, artificial-analysis, benchmarks, claude-code, codex
Production-grade design context for AI coding workflows. Extract a real design system from any URL — colors, typography, spacing, breakpoints — as a portable DESIGN.md.
GitHub repository with 39 stars and 3 forks.
Trending score: 0.28; stars gained: +1; forks gained: +0.
Topics: ai-coding, ai-coding-tools, ai-context, benchmarks, claude, claude-code
📊 Benchmark Comparison of Packages with Runtime Validation and TypeScript Support
GitHub repository with 824 stars and 88 forks.
Trending score: 0.18; stars gained: +0; forks gained: +0.
Language: TypeScript
Topics: typescript, types, benchmarks, validation, benchmark, json
Prime number projects in 100+ programming languages, to compare their speed and their programmers' cleverness
GitHub repository with 2,969 stars and 599 forks.
Trending score: 0.05; stars gained: +0; forks gained: +0.
Language: C
Topics: programming-languages, drag-race, primes, primesieve, benchmarks, docker
L4BYR1NTH: ARTIFEX Safety Evals
GitHub repository with 6 stars and 0 forks.
Trending score: 0.05; stars gained: +0; forks gained: +0.
Language: Jupyter Notebook
Topics: agentic, ai, benchmarking, benchmarks, evals, evaluation-framework