* ipo-dpo trainer
* fix missing abstract method
* chatml template, grad checkpointing kwargs support
* fix steps calc for RL and add dataloader kwargs
* wip to fix dpo and start ppo
* more fixes
* refactor to generalize map fn
* fix dataset loop and handle argilla pref dataset
* set training args
* load reference model on separate gpu if more than one device
* no auto upload to hub for dpo, don't add lora adapters to ref model for dpo
* fixes for rl training
* support for ipo from yaml
* set dpo training args from the config, add tests
* chore: lint
* set sequence_len for model in test
* add RLHF docs
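
Several commits ("support for ipo from yaml", "set dpo training args from the config") concern driving the RL trainer from the YAML config. A hypothetical config fragment is sketched below; the key names and values are assumptions for illustration, not the project's confirmed schema.

```yaml
# hypothetical sketch — key names are assumptions, not the exact schema
rl: ipo            # preference-tuning loss; `dpo` would select vanilla DPO
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
```

Per the commit notes, when more than one device is available the frozen reference model is loaded on a separate GPU, LoRA adapters are not attached to it, and automatic hub upload is disabled for DPO runs.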
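
The "refactor to generalize map fn" and "handle argilla pref dataset" commits describe converting preference records into the prompt/chosen/rejected triple that DPO-style trainers consume. A minimal sketch of such a map fn is below; the field names (`instruction`, `chosen_response`, `rejected_response`) are assumptions about the Argilla-style record layout, not the exact schema used in the PR.

```python
def argilla_to_dpo(sample: dict) -> dict:
    """Map one Argilla-style preference record to the prompt/chosen/rejected
    triple expected by DPO-style trainers (field names are assumptions)."""
    return {
        "prompt": sample["instruction"],
        "chosen": sample["chosen_response"],
        "rejected": sample["rejected_response"],
    }

# usage: apply per-record, e.g. via `dataset.map(argilla_to_dpo)`
record = {
    "instruction": "Summarize the text.",
    "chosen_response": "A concise summary.",
    "rejected_response": "An off-topic reply.",
}
print(argilla_to_dpo(record)["prompt"])  # → Summarize the text.
```

Keeping the conversion as a small per-record function is what lets the same training loop serve multiple preference-dataset formats: each source format only needs its own map fn.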