feat: add Group Relative Policy Optimization (GRPO) to RLHF documentation

This commit is contained in:
mhenrichsen
2025-06-01 22:42:03 +02:00
committed by mhenrichsen
parent 94219f6ee8
commit 68788e419e

@@ -16,6 +16,7 @@ feedback. Various methods include, but not limited to:
- [Identity Preference Optimization (IPO)](#ipo)
- [Kahneman-Tversky Optimization (KTO)](#kto)
- [Odds Ratio Preference Optimization (ORPO)](#orpo)
- [Group Relative Policy Optimization (GRPO)](#grpo)
- Proximal Policy Optimization (PPO) (not yet supported in axolotl, if you're interested in contributing, please reach out!)