ORPO (#1419)

* orpo trainer
* rl handling for orpo
* support for remove_unused_columns
* orpo fixes
* fix loader for orpo
* chore: lint
* fix default for remove_unused_columns
* roll ORPO into the main AxolotlTrainer so it can be compatible with some of the other techniques like relora
* better handling of system message for orpo
* revert system prompt changes for chat templates
* no need for else condition
* split dataset parsing into its own component
2024-03-18 13:10:00 -04:00
parent e8c8ea64b3
commit 2ea70ebbd8
14 changed files with 451 additions and 24 deletions
--- a/docs/rlhf.md
+++ b/docs/rlhf.md
@@ -34,6 +34,21 @@ datasets:
 rl: ipo
 ```

+#### ORPO
+
+Paper: https://arxiv.org/abs/2403.07691
+
+```yaml
+rl: orpo
+orpo_alpha: 0.1
+remove_unused_columns: false
+
+chat_template: chatml
+datasets:
+  - path: argilla/ultrafeedback-binarized-preferences-cleaned
+    type: orpo.chat_template
+```
+
 #### Using local dataset files
 ```yaml
 datasets:
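The ORPO config above sets `orpo_alpha: 0.1`. To show roughly what that weight controls, here is a minimal pure-Python sketch of the per-pair objective from the linked paper. The paper combines the SFT loss with an odds-ratio penalty, L = L_SFT + λ·L_OR, where L_OR = −log σ(log(odds(y_w) / odds(y_l))) and odds(y) = P(y) / (1 − P(y)). The sketch assumes that `orpo_alpha` plays the role of λ and that the inputs are length-normalised (mean per-token) log-probabilities of the chosen and rejected responses. The function names are hypothetical, and this is not the trainer code added in this commit.

```python
import math


def log_odds(logp: float) -> float:
    """Return log(p / (1 - p)) given logp = log(p), for 0 < p < 1."""
    # log1p(-exp(logp)) computes log(1 - p) without forming 1 - p directly
    return logp - math.log1p(-math.exp(logp))


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def orpo_loss(logp_chosen: float, logp_rejected: float, alpha: float = 0.1) -> float:
    """Per-pair ORPO loss (illustrative sketch, hypothetical helper).

    logp_chosen / logp_rejected: length-normalised (mean per-token) log-probs
    of the chosen and rejected responses given the prompt.
    alpha: weight on the odds-ratio term; assumed here to correspond to `orpo_alpha`.
    """
    # SFT term: negative log-likelihood of the chosen response
    l_sft = -logp_chosen
    # Odds-ratio term: -log sigmoid(log odds(chosen) - log odds(rejected))
    l_or = -log_sigmoid(log_odds(logp_chosen) - log_odds(logp_rejected))
    return l_sft + alpha * l_or


# Usage: with the chosen likelihood fixed, the loss falls as the rejected
# response becomes less likely relative to the chosen one
print(orpo_loss(math.log(0.5), math.log(0.5)))
print(orpo_loss(math.log(0.5), math.log(0.1)))
```

The computation stays in log space, using `log1p`, so very small probabilities do not lose precision. Because the odds-ratio term is added on top of the ordinary SFT loss, ORPO does not need a separate reference model the way DPO does. A larger `orpo_alpha` in this sketch pushes harder on separating chosen from rejected responses, relative to plain likelihood on the chosen ones.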