Files
axolotl/docs
Wing Lian f243c2186d RL/DPO (#935)
* ipo-dpo trainer

* fix missing abstract method

* chatml template, grad checkpointing kwargs support

* fix steps calc for RL and add dataloader kwargs

* wip to fix dpo and start ppo

* more fixes

* refactor to generalize map fn

* fix dataset loop and handle argilla pref dataset

* set training args

* load reference model on seperate gpu if more than one device

* no auto upload to hub for dpo, don't add lora adapters to ref model for dpo

* fixes for rl training

* support for ipo from yaml

* set dpo training args from the config, add tests

* chore: lint

* set sequence_len for model in test

* add RLHF docs
2024-01-04 18:22:55 -05:00
..
2023-09-20 22:02:16 -04:00
2023-09-08 11:57:47 -04:00
2024-01-04 18:22:55 -05:00