* re-enable DPO tests in modal CI
* workaround for training args
* don't mix in AxolotlTrainingArguments
* fix mixin order so the MRO doesn't raise `TypeError: non-default argument follows default argument`
* use smaller datasets for dpo tests
The current yml config throws an error: ValueError: Please set lora_modules_to_save to [`embed_tokens`, `lm_head`] when using an adapter and changing the special tokens.
I added the required changes to resolve it.
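For reference, a minimal sketch of the config change this implies; the adapter and special-token values shown are illustrative:

```yaml
adapter: lora
# when special tokens are changed while training an adapter, the embedding
# layers need to be saved alongside the adapter weights
lora_modules_to_save:
  - embed_tokens
  - lm_head
special_tokens:
  pad_token: "<|end_of_text|>"   # illustrative value
```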
The strategy now supports configuring several fields:
* the data field holding message arrays
* the role and content fields for each message
* the role mapping from source to target types
Additionally, this adds a sample llama3-8b instruct config using the chat template.
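A hedged sketch of what a dataset entry using these options might look like; the key names (`field_messages`, `message_field_role`, `message_field_content`, `roles`) and the dataset path are assumptions for illustration:

```yaml
chat_template: llama3
datasets:
  - path: my/chat-dataset           # illustrative
    type: chat_template
    field_messages: conversations   # data field holding the message array (assumed key)
    message_field_role: from        # per-message role field (assumed key)
    message_field_content: value    # per-message content field (assumed key)
    roles:                          # source-to-target role mapping (assumed key)
      user: ["human"]
      assistant: ["gpt"]
      system: ["system"]
```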
* include mlflow installation in the colab notebook
Without explicitly installing mlflow, the `accelerate launch` command fails.
* update the colab notebook to use the latest tinyllama config
* Switch to a parallel FFD (first-fit decreasing) bin packing algorithm.
Add support for packing in a distributed context.
Add the packing efficiency estimate back.
* revert changes to distributed code
* chore: lint
* fix config with new params for packing test
* add sample_packing_group_size and sample_packing_bin_size to cfg schema (see the config sketch after this list)
* fix lambda function
* fix sampler/dataloader calculations for packing
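A sketch of the new packing options in a config; the values are illustrative and the comments reflect my reading of the fields, not documented defaults:

```yaml
sample_packing: true
# how many samples each worker considers at once when grouping sequences into packs
sample_packing_group_size: 100000
# how many bins the parallel FFD packer fills per group
sample_packing_bin_size: 200
```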
---------
Co-authored-by: dsesclei <dave@sescleifer.com>
* Fix llama3 chat_template (the {{eos_token}} leads to an extra <|eot_id|> being added in the last turn). Output now matches the official Llama 3 Instruct model.
* add tests
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* add kto support
* test cleanup
* fix outdated comment
* fix llama3 ultra
* chore: lint
* update to use rl_beta instead of dpo_beta
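A rough sketch of how the KTO settings might be wired up; the `rl: kto` value and the dataset type are assumptions for illustration (the type name is borrowed from the "llama3 ultra" fix above):

```yaml
rl: kto
rl_beta: 0.1               # replaces the earlier dpo_beta setting
datasets:
  - path: my/kto-dataset   # illustrative
    type: llama3.ultra     # hypothetical type name, see lead-in
    split: train
```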
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* WIP for unsloth integrations
* import the unsloth code in the right context
* add unsloth mlp, qkv, o lora optimizations
* apply unsloth mlp and qkv kernels
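A sketch of how these optimizations might be toggled in a config; the flag names are my best guess at the new options and should be treated as illustrative:

```yaml
adapter: lora
# enable unsloth's fused kernels for the LoRA-wrapped MLP, QKV, and O projections
unsloth_lora_mlp: true
unsloth_lora_qkv: true
unsloth_lora_o: true
```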
* FIX: TRL trainer preprocessing step was running in only one process
* FIX: max_length and max_prompt_length were not being sent to ORPOTrainer
* FIX: Change ORPO max prompt length to 1/4 of max length, otherwise we get strange behaviour
* FIX: Removed change from a different PR
* FIX: Black fix
* explicitly set max prompt len for orpo config
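For context, a hedged sketch of the ORPO settings this implies; the `max_prompt_len` key name and the numbers are assumptions for illustration:

```yaml
rl: orpo
sequence_len: 2048
# explicit prompt-length cap, roughly 1/4 of the max length per the fix above
max_prompt_len: 512
```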
---------
Co-authored-by: Ali Mosavian <ali.mosavian@kry.se>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* add dpo llama3
* fix dpo bos and eos
* bos token gets added automatically by the tokenizer
* explicit <|end_of_text|> is not needed, as <|eot_id|> is sufficient
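A rough sketch of a DPO config using the llama3 prompt handling; the dataset path and the `type` value are hypothetical placeholders, not confirmed names:

```yaml
rl: dpo
datasets:
  - path: my/preference-dataset   # illustrative
    split: train
    type: llama3.prompt_pairs     # hypothetical type name for the new llama3 DPO strategy
```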
---------
Co-authored-by: Nero10578 <owenarliawan@gmail.com>
* adding llama3 fastchat conversation monkeypatch
* Updated conversation turns to work with PR3259 of FastChat
* fixed bos token
* bump fastchat version
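A sketch of how the new conversation template might be selected; the `conversation` value follows FastChat's naming but should be treated as an assumption:

```yaml
datasets:
  - path: my/sharegpt-dataset   # illustrative
    type: sharegpt
    conversation: llama-3       # assumed FastChat conversation name
```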
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>