DPO cleanup (#1126)

* cleanup dpo to be a little more extensible, add zephyr/nectar strategy * fix eos slash * support for eval split * fix kwargs * handle empty evals * don't load peft model for dpo * ensure dpo traning args gets bf16 for peft if applicable * fix duplicate kwargs for bf16 * make sure to respect the configured lr scheduler * supprt trainer callback to push config to wandb * set dataloader preload args * ensure that we are loading the lora when merging * Update src/axolotl/utils/data.py Co-authored-by: Agus <agustin.piqueres@gmail.com> * support local datasets for dpo Co-authored-by: Agus <agustin.piqueres@gmail.com> * chore: lint * dpo/kto/ipo smoke tests w lora, simplify dpo dataset type names * add split to dpo tests * fix rebase/merging error * handle edge case w logging * use accelerator for dpo datasets so it doesn't break the logger * missing args * validate checkpoint is an adapter for now * log warning when dataset strategy is not loadable --------- Co-authored-by: Agus <agustin.piqueres@gmail.com>
2024-01-23 00:40:37 -05:00
parent 5439707489
commit 7523d1f557
10 changed files with 440 additions and 106 deletions
--- a/src/axolotl/train.py
+++ b/src/axolotl/train.py
@@ -96,7 +96,12 @@ def train(
        freeze_parameters_except(model, cfg.unfrozen_parameters)

    trainer = setup_trainer(
-        cfg, train_dataset, eval_dataset, (model, model_ref), tokenizer, total_num_steps
+        cfg,
+        train_dataset,
+        eval_dataset,
+        (model, model_ref, peft_config),
+        tokenizer,
+        total_num_steps,
    )

    if hasattr(model, "config"):