axolotl

Author	SHA1	Message	Date
Hamel Husain	52c83d30bf	Update rlhf.md (#1237) [skip ci]	2024-01-31 17:27:35 -05:00
Wing Lian	5bce45f800	more dpo fixes for dataset loading and docs (#1185) [skip ci] * more dpo fixes for dataset loading and docs * preprocess dpo datasets	2024-01-24 14:23:55 -05:00
Aleksey Korshuk	dc051b861d	Update rlhf.md (#1178) [skip ci]	2024-01-23 15:54:51 -05:00
NanoCode012	b432889256	feat: enable trl's autounwrap (#1060) * feat: test trl's autounwrap * fix: add check for adapter * feat: add config to disable autounwrap * chore: fix lint	2024-01-11 08:43:41 -05:00
Wing Lian	f243c2186d	RL/DPO (#935) * ipo-dpo trainer * fix missing abstract method * chatml template, grad checkpointing kwargs support * fix steps calc for RL and add dataloader kwargs * wip to fix dpo and start ppo * more fixes * refactor to generalize map fn * fix dataset loop and handle argilla pref dataset * set training args * load reference model on seperate gpu if more than one device * no auto upload to hub for dpo, don't add lora adapters to ref model for dpo * fixes for rl training * support for ipo from yaml * set dpo training args from the config, add tests * chore: lint * set sequence_len for model in test * add RLHF docs	2024-01-04 18:22:55 -05:00