ORPO Trainer replacement (#1551)
* WIP use trl ORPOTrainer
* fixes to make orpo work with trl
* fix the chat template loading
* make sure to handle the special tokens and add_generation for assistant turn too
@@ -39,6 +39,6 @@ s3fs
 gcsfs
 # adlfs
 
-trl @ git+https://github.com/huggingface/trl.git@0ee349dcd43b0f4b3169449f16751c38ac4a609f
+trl==0.8.5
 zstandard==0.22.0
 fastcore