* ipo-dpo trainer

* fix missing abstract method

* chatml template, grad checkpointing kwargs support

* fix steps calc for RL and add dataloader kwargs

* wip to fix dpo and start ppo

* more fixes

* refactor to generalize map fn

* fix dataset loop and handle argilla pref dataset

* set training args

* load reference model on seperate gpu if more than one device

* no auto upload to hub for dpo, don't add lora adapters to ref model for dpo

* fixes for rl training

* support for ipo from yaml

* set dpo training args from the config, add tests

* chore: lint

* set sequence_len for model in test

* add RLHF docs
This commit is contained in:
Wing Lian
2024-01-04 18:21:25 -05:00
parent 59b2d302c8
commit f243c2186d
11 changed files with 388 additions and 6 deletions

View File

@@ -61,6 +61,12 @@ def train(
msg += " and peft_config..."
LOG.debug(msg)
model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
model_ref = None
if cfg.rl:
# load the model again for model_ref/baseline
model_ref, _ = load_model(
cfg, tokenizer, inference=cli_args.inference, reference_model=True
)
safe_serialization = cfg.save_safetensors is True
@@ -83,7 +89,7 @@ def train(
freeze_parameters_except(model, cfg.unfrozen_parameters)
trainer = setup_trainer(
cfg, train_dataset, eval_dataset, model, tokenizer, total_num_steps
cfg, train_dataset, eval_dataset, (model, model_ref), tokenizer, total_num_steps
)
if hasattr(model, "config"):