* ipo-dpo trainer
* fix missing abstract method
* chatml template, grad checkpointing kwargs support
* fix steps calc for RL and add dataloader kwargs
* wip to fix dpo and start ppo
* more fixes
* refactor to generalize map fn
* fix dataset loop and handle argilla pref dataset
* set training args
* load reference model on separate gpu if more than one device
* no auto upload to hub for dpo, don't add lora adapters to ref model for dpo
* fixes for rl training
* support for ipo from yaml
* set dpo training args from the config, add tests
* chore: lint
* set sequence_len for model in test
* add RLHF docs
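For the "support for ipo from yaml" and "set dpo training args from the config" items, a minimal config sketch of what enabling IPO preference tuning might look like; the key names below (`rl`, the dataset `type`) are assumptions for illustration and are not confirmed by this log:

```yaml
# hedged sketch: enabling IPO preference tuning from the YAML config.
# `rl: ipo` and the dataset `type` value are assumed key names, not confirmed here.
rl: ipo
datasets:
  - path: argilla/distilabel-intel-orca-dpo-pairs  # example preference dataset
    type: chatml.argilla
sequence_len: 2048
gradient_checkpointing: true
```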