more dpo fixes for dataset loading and docs (#1185) [skip ci]
* more dpo fixes for dataset loading and docs * preprocess dpo datasets
This commit is contained in:
10
docs/rlhf.md
10
docs/rlhf.md
@@ -34,6 +34,16 @@ datasets:
|
||||
rl: ipo
|
||||
```
|
||||
|
||||
#### Using local dataset files
|
||||
```yaml
|
||||
datasets:
|
||||
- ds_type: json
|
||||
data_files:
|
||||
- orca_rlhf.jsonl
|
||||
split: train
|
||||
type: chatml.intel
|
||||
```
|
||||
|
||||
#### Trl autounwrap for peft
|
||||
|
||||
Trl supports autounwrapping peft models, so that a ref model does not need to be additionally loaded, leading to less VRAM needed. This is on by default. To turn it off, pass the following config.
|
||||
|
||||
Reference in New Issue
Block a user