Dylsimple60/RLHF_learn
🤖 Enhance reinforcement learning stability and efficiency with advanced algorithms like TRPO, PPO, DPO, GRPO, DAPO, and GSPO for optimized policy training.
GitHub repository with 6 stars and 0 forks.
Language: Python
Topics: ai-safety, attention-mechanisms, datasets, deep-learning, deep-reinforcement-learning, gpt, human-feedback, large-language-models, openai-o1, python