more dpo fixes for dataset loading and docs (#1185) [skip ci]

* more dpo fixes for dataset loading and docs
* preprocess dpo datasets
2024-01-24 14:23:55 -05:00
parent d85d4942cf
commit 5bce45f800
4 changed files with 73 additions and 4 deletions
--- a/docs/rlhf.md
+++ b/docs/rlhf.md
@@ -34,6 +34,16 @@ datasets:
 rl: ipo
 ```

+#### Using local dataset files
+```yaml
+datasets:
+  - ds_type: json
+    data_files:
+      - orca_rlhf.jsonl
+    split: train
+    type: chatml.intel
+```
+
 #### Trl autounwrap for peft

 Trl supports autounwrapping peft models, so that a ref model does not need to be additionally loaded, leading to less VRAM needed. This is on by default. To turn it off, pass the following config.