RL/DPO (#935)
* ipo-dpo trainer * fix missing abstract method * chatml template, grad checkpointing kwargs support * fix steps calc for RL and add dataloader kwargs * wip to fix dpo and start ppo * more fixes * refactor to generalize map fn * fix dataset loop and handle argilla pref dataset * set training args * load reference model on seperate gpu if more than one device * no auto upload to hub for dpo, don't add lora adapters to ref model for dpo * fixes for rl training * support for ipo from yaml * set dpo training args from the config, add tests * chore: lint * set sequence_len for model in test * add RLHF docs
This commit is contained in:
@@ -61,6 +61,12 @@ def train(
|
||||
msg += " and peft_config..."
|
||||
LOG.debug(msg)
|
||||
model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
|
||||
model_ref = None
|
||||
if cfg.rl:
|
||||
# load the model again for model_ref/baseline
|
||||
model_ref, _ = load_model(
|
||||
cfg, tokenizer, inference=cli_args.inference, reference_model=True
|
||||
)
|
||||
|
||||
safe_serialization = cfg.save_safetensors is True
|
||||
|
||||
@@ -83,7 +89,7 @@ def train(
|
||||
freeze_parameters_except(model, cfg.unfrozen_parameters)
|
||||
|
||||
trainer = setup_trainer(
|
||||
cfg, train_dataset, eval_dataset, model, tokenizer, total_num_steps
|
||||
cfg, train_dataset, eval_dataset, (model, model_ref), tokenizer, total_num_steps
|
||||
)
|
||||
|
||||
if hasattr(model, "config"):
|
||||
|
||||
Reference in New Issue
Block a user