Mattral/Improving-LLM-Models-with-RLHF-PPO-DPO

A modular, production-grade framework for Reinforcement Learning from Human Feedback (RLHF) with Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO).

GitHub repository with 23 stars and 5 forks.

Language: Python

Topics: dpo, large-language-models, llm-alignment, machine-learning, policy-optimization, ppo, reinforcement-learning-from-human-feedback, reward-modeling, rlhf


24h trend summary

Trending score 0.36, freshness score 0.00, stars gained +0, forks gained +0.

Latest metric snapshot

2026-06-15: 23 stars and 5 forks.

Similar repositories

1. Mattral/Improving-LLM-Models-with-RLHF-PPO-DPO

A modular, production-grade framework for Reinforcement Learning from Human Feedback (RLHF) with Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO).

GitHub repository with 23 stars and 5 forks.

Trending score: 0.36; stars gained: +0; forks gained: +0.

Language: Python

Topics: dpo, large-language-models, llm-alignment, machine-learning, policy-optimization, ppo

2. open-gitagent/shadowLM

A fine-tuning SDK — any open model, any harness, any method. 12 training methods behind one argument; pure-stdlib core.

GitHub repository with 6 stars and 0 forks.

Trending score: 0.10; stars gained: +0; forks gained: +0.

Language: Python

Topics: agents, dpo, fine-tuning, grpo, llm, lora

3. oumi-ai/oumi

Easily fine-tune, evaluate and deploy Gemma 4, Qwen3.5, Qwen3.6, gpt-oss, DeepSeek-R1, or any open source LLM / VLM!

GitHub repository with 9,312 stars and 777 forks.

Trending score: 0.10; stars gained: -1; forks gained: +0.

Language: Python

Topics: dpo, evaluation, fine-tuning, inference, llama, llms

4. Yog-Sotho/LLM-fine-tuner

Powerful no-code LLM fine-tuner: upload data → train → deploy in minutes. Unsloth 2-5× acceleration · QLoRA/DPO/RLHF/PPO/ORPO · Reward Model training · GGUF export · vLLM inference · BLEU/ROUGE/BERTScore · full CLI · Heretic Mode to unlock full model potential

GitHub repository with 26 stars and 3 forks.

Trending score: 0.09; stars gained: +0; forks gained: +0.

Language: Python

Topics: abliteration, ai, dpo, fine-tuning, gguf, gradio

5. lamenting-hawthorn/SkillLoop

Standalone self-improvement harness for agent traces, memory, skills, evaluation, and fine-tuning exports

GitHub repository with 5 stars and 1 fork.

Trending score: 0.09; stars gained: +0; forks gained: +0.

Language: Python

Topics: agent-memory, agents, ai-agent, dpo, fine-tuning, hermes-agent

6. gzhzk/alignsql

Qwen3-8B NL2SQL post-training from SFT to RL

GitHub repository with 5 stars and 0 forks.

Trending score: 0.09; stars gained: +0; forks gained: +0.

Language: Python

Topics: dpo, fine-tuning, llm, lora, nl2sql, qwen

Trending in Python

1. chopratejas/headroom

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

GitHub repository with 27,902 stars and 1,891 forks.

Trending score: 6.49; stars gained: +2,776; forks gained: +250.

Language: Python

Topics: agent, ai, anthropic, claude-code, compression, context-engineering

2. harry0703/MoneyPrinterTurbo

Generate high-definition short videos with one click using large AI models.

GitHub repository with 88,031 stars and 12,625 forks.

Trending score: 6.02; stars gained: +1,097; forks gained: +218.

Language: Python

Topics: ai, automation, chatgpt, moviepy, python, shortvideo

3. pewdiepie-archdaemon/odysseus

Self-hosted AI workspace.

GitHub repository with 71,374 stars and 9,095 forks.

Trending score: 5.98; stars gained: +834; forks gained: +140.

Language: Python

4. NousResearch/hermes-agent

The agent that grows with you

GitHub repository with 194,017 stars and 33,968 forks.

Trending score: 5.92; stars gained: +753; forks gained: +209.

Language: Python

Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude

5. NVIDIA/SkillSpector

Security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks.

GitHub repository with 5,654 stars and 427 forks.

Trending score: 5.61; stars gained: +874; forks gained: +76.

Language: Python

6. rohitg00/ai-engineering-from-scratch

Learn it. Build it. Ship it for others.

GitHub repository with 32,676 stars and 5,366 forks.

Trending score: 5.59; stars gained: +762; forks gained: +135.

Language: Python

Topics: agents, ai, ai-agents, ai-engineering, computer-vision, course
