* FIX: TRL trainer preprocessing step was running in one process
* FIX: max_length and max_prompt_length was not being sent to ORPOTrainer
* FIX: Change ORPO max prompt length to 1/4 of max length, otherwise we get strange behaviour
* FIX: Removed change from a different PR
* FIX: Black fix
* explicitly set max prompt len for orpo config
---------
Co-authored-by: Ali Mosavian <ali.mosavian@kry.se>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* add dpo llama3
* fix dpo bos and eos
* bos token gets added automatically by the tokenizer
* explicit <|end_of_text|> not needed, as eot_id is sufficient
---------
Co-authored-by: Nero10578 <owenarliawan@gmail.com>
* adding llama3 fastchat conversation monkeypatch
* Updated conversation turns to work with PR3259 of FastChat
* fixed bos token
* bump fastchat version
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Gradio Configuration Settings
* Making various Gradio variables configurable instead of hardcoded
* Remove overwriting behavour of 'default tokens' that breaks tokenizer for llama3
* Fix type of gradio_temperature
* revert un-necessary change and lint
---------
Co-authored-by: Marijn Stollenga <stollenga@imfusion.de>
Co-authored-by: Marijn Stollenga <stollenga@imfusion.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Pass weakref to model in the SIGINT handler to free up model post train()
* Fix lint issues
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* FIX: TRL trainer preprocessing step was running in one process
* FIX: Changed so that dataset_num_proc is sent to CPO, KTO and ORPO trainer args and directly to the trainer when DPO
* FIX: Changed back to only support ORPO for now, since KTO is handled in another way
---------
Co-authored-by: Ali Mosavian <ali.mosavian@kry.se>
* PoSE wip
* fixes for pose splitting
* set pose context len so we can pick that up seperately from the usable training context len
* support min sample len and define num chunks
* fix chunk splitting
* support for curriculum/ordered learning with pose
* fix sequence len sort
* add curriculum_sampling to pydantic
* add example for mistral orpo
* sample_packing: false for orpo
* go to load_dataset (since load_rl_datasets require a transfom_fn, which only dpo uses currently)
* Add support for Gemma chat template
* Update fschat version to include its newest support for Gemma chat style
* pin fastchat to current HEAD
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* wrap prepared_ds_path in str() to avoid TypeError in fsspec package
`fsspec` calls `if "::" in path` on `prepared_ds_path`, which will throw an error if it is a `PosixPath` object.
* update test too
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* WIP use trl ORPOTrainer
* fixes to make orpo work with trl
* fix the chat template laoding
* make sure to handle the special tokens and add_generation for assistant turn too
* wip for dbrx finetuning
* add fastcore for parallel loading of sharded weights
* fix dtype for load, use PartialState instead of accelerator to init process group, remove redundant wandb callback
* update to use v2 of the converted model
* more fixes for dbrx loras
* make sure to enable fsdp activation checkpointing
* fix support for 8bit loras too for dbrx
* apply z3 leaf moe fix for DBRX with deepspeed
* don't raise value error since child module searches could fail and be ok
* revert a previous change to fix fsdp
* update mistral/mistral qlora+fsdp yamls
* fix qlora+fsdp quant storage type
* more edge cases for qlora-fsdp
* fixes for fsdp+qlora w optimizer in 8bit
* add bigstral z3 config and make sure to use full_state_dict for fsdp