avnlp/llm-finetuning
Advanced LLM fine-tuning techniques: SFT with parameter-efficient methods (LoRA, QLoRA, DoRA, P-Tuning, Prefix-Tuning) and preference/RL optimization (GRPO, DPO, ORPO, KTO, PPO); composable correctness and format rewards plus LLM-as-a-Judge evals (DeepEval, Evidently AI) across math, multi-hop, medical & general QA on Llama 3, Mistral, Phi-4, Gemma & Qwen3. Built on TRL, PEFT & Unsloth.
GitHub repository with 7 stars and 3 forks.
Language: Python
Topics: dpo, fine-tuning, grpo, kto, lora, orpo, p-tuning, peft, ppo, qlora
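
The description mentions composable correctness/format rewards. As a rough illustration of what that could look like, here is a minimal, library-free sketch. It is not taken from the repository: the `<answer>` tag convention, the function names (`format_reward`, `correctness_reward`, `compose`), and the weights are all hypothetical, chosen only to show how separate reward signals might be combined into one score per completion.

```python
import re
from typing import Callable, List, Sequence, Tuple

# Hypothetical reward signature: (completions, references) -> one score per completion.
RewardFn = Callable[[Sequence[str], Sequence[str]], List[float]]

# Assumed output convention: the final answer is wrapped in <answer>...</answer>.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completions: Sequence[str], references: Sequence[str]) -> List[float]:
    """1.0 if the completion contains an <answer> block, else 0.0."""
    return [1.0 if ANSWER_RE.search(c) else 0.0 for c in completions]


def correctness_reward(completions: Sequence[str], references: Sequence[str]) -> List[float]:
    """1.0 if the extracted answer exactly matches the reference, else 0.0."""
    scores = []
    for completion, reference in zip(completions, references):
        match = ANSWER_RE.search(completion)
        predicted = match.group(1).strip() if match else ""
        scores.append(1.0 if predicted == reference.strip() else 0.0)
    return scores


def compose(weighted: Sequence[Tuple[RewardFn, float]]) -> RewardFn:
    """Combine several reward functions into a single weighted sum."""
    def combined(completions: Sequence[str], references: Sequence[str]) -> List[float]:
        totals = [0.0] * len(completions)
        for reward_fn, weight in weighted:
            for i, score in enumerate(reward_fn(completions, references)):
                totals[i] += weight * score
        return totals
    return combined


# Illustrative weights only: correctness counts twice as much as format.
reward = compose([(correctness_reward, 1.0), (format_reward, 0.5)])
scores = reward(
    ["<answer>42</answer>", "42", "<answer>41</answer>"],
    ["42", "42", "42"],
)
print(scores)  # [1.5, 0.0, 0.5]
```

Keeping each signal as its own function means a format check can be reused across math, multi-hop, and medical QA tasks while only the correctness check changes per dataset.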