* orpo trainer

* rl handling for orpo

* support for remove_unused_columns

* orpo fixes

* fix loader for orpo

* chore: lint

* fix default for remove_unused_columns

* roll ORPO into the main AxolotlTrainer so it can be compatible with some of the other techniques like relora

* better handling of system message for orpo

* revert system prompt changes for chat templates

* no need for else condition

* split dataset parsing into its own component
Wing Lian
2024-03-18 13:10:00 -04:00
committed by GitHub
parent e8c8ea64b3
commit 2ea70ebbd8
14 changed files with 451 additions and 24 deletions

@@ -85,7 +85,7 @@ def train(
     model.generation_config.do_sample = True
     model_ref = None
-    if cfg.rl:
+    if cfg.rl and cfg.rl != "orpo":
         if cfg.adapter and not cfg.rl_adapter_ref_model:
             # use built-in trl autounwrap
             LOG.debug("Passing model_ref: None to RL trainer")
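
The effect of the changed condition in the hunk above can be sketched in isolation. This is a minimal illustration, not code from the repository: the helper name `enters_ref_model_branch` is hypothetical, and `SimpleNamespace` stands in for the real `cfg` object. It assumes only what the hunk shows, namely that the reference-model branch is entered when `cfg.rl` is set to something other than `"orpo"`, which fits ORPO being a reference-model-free method.

```python
from types import SimpleNamespace


def enters_ref_model_branch(cfg) -> bool:
    """Hypothetical helper mirroring the updated guard.

    Returns True only when an RL method is configured and it is not ORPO,
    i.e. when the reference-model handling branch would run.
    """
    return bool(cfg.rl) and cfg.rl != "orpo"


# Stand-in configs; the real cfg carries many more fields.
for rl in (None, "dpo", "orpo"):
    cfg = SimpleNamespace(rl=rl)
    print(f"rl={rl!r}: enters branch = {enters_ref_model_branch(cfg)}")
```

With the old condition (`if cfg.rl:`), an ORPO config would also have entered the branch; with the new one, `model_ref` simply stays `None` for ORPO.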