THU-BPM/RLCSD
Source code for the paper "RLCSD: Reinforcement Learning with Contrastive On-Policy Self-Distillation"
GitHub repository with 27 stars and 1 fork.
Language: Python
Topics: large-language-models, llm, on-policy-distillation, opsd, reinforcement-learning, self-distillation