DPO cleanup (#1126)

* cleanup dpo to be a little more extensible, add zephyr/nectar strategy

* fix eos slash

* support for eval split

* fix kwargs

* handle empty evals

* don't load peft model for dpo

* ensure dpo traning args gets bf16 for peft if applicable

* fix duplicate kwargs for bf16

* make sure to respect the configured lr scheduler

* supprt trainer callback to push config to wandb

* set dataloader preload args

* ensure that we are loading the lora when merging

* Update src/axolotl/utils/data.py

Co-authored-by: Agus <agustin.piqueres@gmail.com>

* support local datasets for dpo

Co-authored-by: Agus <agustin.piqueres@gmail.com>

* chore: lint

* dpo/kto/ipo smoke tests w lora, simplify dpo dataset type names

* add split to dpo tests

* fix rebase/merging error

* handle edge case w logging

* use accelerator for dpo datasets so it doesn't break the logger

* missing args

* validate checkpoint is an adapter for now

* log warning when dataset strategy is not loadable

---------

Co-authored-by: Agus <agustin.piqueres@gmail.com>
This commit is contained in:
Wing Lian
2024-01-23 00:40:37 -05:00
committed by GitHub
parent 5439707489
commit 7523d1f557
10 changed files with 440 additions and 106 deletions

View File

@@ -96,7 +96,12 @@ def train(
freeze_parameters_except(model, cfg.unfrozen_parameters)
trainer = setup_trainer(
cfg, train_dataset, eval_dataset, (model, model_ref), tokenizer, total_num_steps
cfg,
train_dataset,
eval_dataset,
(model, model_ref, peft_config),
tokenizer,
total_num_steps,
)
if hasattr(model, "config"):