axolotl/examples/llama-3/instruct-dpo-lora-8b.yml at e8d3da00814ec7773d33edd5643bb885d85686cb

Files

Keith Stevens 985819d89b Add a chat_template prompt strategy for DPO (#1725 )

* Implementing a basic chat_template strategy for DPO datasets

This mimics the sft chat_template strategy such that users can:
* Specify the messages field
* Specify the per message role and content fields
* speicfy the chosen and rejected fields
* Let the tokenizer construct the raw prompt
* Ensure the chosen and rejected fields don't have any prefix tokens

* Adding additional dpo chat template unittests

* Rename test class

2024-07-21 09:10:42 -04:00

1.4 KiB

Raw Blame History

View Raw

1.4 KiB Raw Blame History

1.4 KiB

Raw Blame History