Mattral/Improving-LLM-Models-with-RLHF-PPO-DPO

A modular, production-grade framework for Reinforcement Learning from Human Feedback (RLHF) with Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO).

GitHub repository with 23 stars and 5 forks.

Language: Python

Topics: dpo, large-language-models, llm-alignment, machine-learning, policy-optimization, ppo, reinforcement-learning-from-human-feedback, reward-modeling, rlhf


24h trend summary

Trending score 0.36, freshness score 0.00, stars gained +0, forks gained +0.

Latest metric snapshot

2026-06-15: 23 stars and 5 forks.

Similar repositories

  1. Mattral/Improving-LLM-Models-with-RLHF-PPO-DPO

    A modular, production-grade framework for Reinforcement Learning from Human Feedback (RLHF) with Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO).

    GitHub repository with 23 stars and 5 forks.

    Trending score: 0.36; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: dpo, large-language-models, llm-alignment, machine-learning, policy-optimization, ppo

  2. open-gitagent/shadowLM

    A fine-tuning SDK — any open model, any harness, any method. 12 training methods behind one argument; pure-stdlib core.

    GitHub repository with 6 stars and 0 forks.

    Trending score: 0.10; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: agents, dpo, fine-tuning, grpo, llm, lora

  3. oumi-ai/oumi

    Easily fine-tune, evaluate and deploy Gemma 4, Qwen3.5, Qwen3.6, gpt-oss, DeepSeek-R1, or any open source LLM / VLM!

    GitHub repository with 9,312 stars and 777 forks.

    Trending score: 0.10; stars gained: -1; forks gained: +0.

    Language: Python

    Topics: dpo, evaluation, fine-tuning, inference, llama, llms

  4. Yog-Sotho/LLM-fine-tuner

    Powerful no-code LLM fine-tuner: upload data → train → deploy in minutes. Unsloth 2-5× acceleration · QLoRA/DPO/RLHF/PPO/ORPO · Reward Model training · GGUF export · vLLM inference · BLEU/ROUGE/BERTScore · full CLI · Heretic Mode to unlock full model potential

    GitHub repository with 26 stars and 3 forks.

    Trending score: 0.09; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: abliteration, ai, dpo, fine-tuning, gguf, gradio

  5. lamenting-hawthorn/SkillLoop

    Standalone self-improvement harness for agent traces, memory, skills, evaluation, and fine-tuning exports

    GitHub repository with 5 stars and 1 fork.

    Trending score: 0.09; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: agent-memory, agents, ai-agent, dpo, fine-tuning, hermes-agent

  6. gzhzk/alignsql

    Qwen3-8B NL2SQL post-training from SFT to RL

    GitHub repository with 5 stars and 0 forks.

    Trending score: 0.09; stars gained: +0; forks gained: +0.

    Language: Python

    Topics: dpo, fine-tuning, llm, lora, nl2sql, qwen

Trending in Python

  1. chopratejas/headroom

    Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

    GitHub repository with 27,902 stars and 1,891 forks.

    Trending score: 6.49; stars gained: +2,776; forks gained: +250.

    Language: Python

    Topics: agent, ai, anthropic, claude-code, compression, context-engineering

  2. harry0703/MoneyPrinterTurbo

    Generate high-definition short videos with one click using large AI models (LLMs).

    GitHub repository with 88,031 stars and 12,625 forks.

    Trending score: 6.02; stars gained: +1,097; forks gained: +218.

    Language: Python

    Topics: ai, automation, chatgpt, moviepy, python, shortvideo

  3. pewdiepie-archdaemon/odysseus

    Self-hosted AI workspace.

    GitHub repository with 71,374 stars and 9,095 forks.

    Trending score: 5.98; stars gained: +834; forks gained: +140.

    Language: Python

  4. NousResearch/hermes-agent

    The agent that grows with you

    GitHub repository with 194,017 stars and 33,968 forks.

    Trending score: 5.92; stars gained: +753; forks gained: +209.

    Language: Python

    Topics: ai, ai-agent, ai-agents, anthropic, chatgpt, claude

  5. NVIDIA/SkillSpector

    Security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks.

    GitHub repository with 5,654 stars and 427 forks.

    Trending score: 5.61; stars gained: +874; forks gained: +76.

    Language: Python

  6. rohitg00/ai-engineering-from-scratch

    Learn it. Build it. Ship it for others.

    GitHub repository with 32,676 stars and 5,366 forks.

    Trending score: 5.59; stars gained: +762; forks gained: +135.

    Language: Python

    Topics: agents, ai, ai-agents, ai-engineering, computer-vision, course
