RL/DPO (#935)

* ipo-dpo trainer

* fix missing abstract method

* chatml template, grad checkpointing kwargs support

* fix steps calc for RL and add dataloader kwargs

* wip to fix dpo and start ppo

* more fixes

* refactor to generalize map fn

* fix dataset loop and handle argilla pref dataset

* set training args

* load reference model on separate gpu if more than one device

* no auto upload to hub for dpo, don't add lora adapters to ref model for dpo

* fixes for rl training

* support for ipo from yaml

* set dpo training args from the config, add tests

* chore: lint

* set sequence_len for model in test

* add RLHF docs

Wing Lian

2024-01-04 18:21:25 -05:00

parent 59b2d302c8

commit f243c2186d

11 changed files with 388 additions and 6 deletions

requirements.txt (2 changes)

@@ -37,3 +37,5 @@ tensorboard
 s3fs
 gcsfs
 # adlfs
+trl @ git+https://github.com/huggingface/trl.git@main
