DPO support loss types (#3566)

* Support loss_type/loss_weights DPO

* Validate dpo loss type/weights only set for dpo

* Tests: Update ipo tests to use new path

* Docs: Update docs for new ipo path

* PR fixes - typo/validation

* PR nit - warning

* chore: fix warnings arg

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
This commit is contained in:
Andrew Wu
2026-04-23 05:25:28 +01:00
committed by GitHub
parent 7420fd4de6
commit 90090fa9e8
8 changed files with 117 additions and 5 deletions

View File

@@ -320,8 +320,10 @@ The input format is a simple JSON input with customizable fields based on the ab
As IPO is just DPO with a different loss function, all supported dataset formats for [DPO](#dpo) are also supported for IPO.
```yaml
rl: ipo
rl: dpo
dpo_loss_type: ["ipo"]
```
*Note:* Passing `rl: ipo` directly is still supported, but will soon be deprecated.
### ORPO