* bump transformers and set roundup_power2_divisions for more VRAM improvements
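The allocator tweak above is driven by an environment variable; a minimal sketch, assuming `16` as an illustrative value (the actual divisions value used here is not stated in the commit):

```python
import os

# PyTorch reads PYTORCH_CUDA_ALLOC_CONF when CUDA is initialized, so set it
# before the first CUDA call. roundup_power2_divisions controls how the caching
# allocator rounds request sizes, which can reduce VRAM fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "roundup_power2_divisions:16"
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```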
* support for low bit optimizers from torch ao
* fix check for alternate optimizers and use nous models on hf for llama3
* add missing check for ao_adamw_fp8
* fix check when using custom optimizers with adamw
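The optimizer commits above suggest a config switch; a hedged sketch of selecting a torch ao low-bit optimizer in an axolotl yml (the `ao_adamw_fp8` value comes from the commit message; treat the exact set of supported names as an assumption):

```yaml
# pick a low-bit optimizer backed by torch ao instead of the default adamw
optimizer: ao_adamw_fp8
```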
* Fix eval_sample_packing in llama-3 lora example
* Update examples/llama-3/lora-8b.yml
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* phi-3 support and perplexity metric
* phi-3 chat template
* metrics updates
* chore: lint
* fix assertion on Tensor
* fix tests since tokenization happens in the metric
* fix perplexity value of shorter passage
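Since several commits above touch the perplexity metric, a minimal sketch of the underlying computation may help; this is the generic definition (perplexity as the exponential of the mean per-token negative log-likelihood), not the callback's actual implementation:

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# a model that is uniform over a 4-token vocab has NLL = ln(4) per token,
# so its perplexity is 4 regardless of passage length
print(perplexity([math.log(4)] * 10))
```

This also shows why a shorter passage can legitimately yield a different value: the mean is taken over however many tokens the passage actually has.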
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
The current yml config throws an error: ValueError: Please set lora_modules_to_save to [`embed_tokens`, `lm_head`] when using an adapter and changing the special tokens.
Added the required changes to resolve it.
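Grounded in the ValueError quoted above, the fix presumably amounts to adding the two named modules to the example yml; a sketch:

```yaml
# required when an adapter is used and the special tokens are changed,
# so the resized embeddings and output head are saved with the adapter
lora_modules_to_save:
  - embed_tokens
  - lm_head
```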
The strategy now supports configuring several fields:
* the data field holding message arrays
* the role and content fields for each message
* role mapping from source to target types

Additionally, this adds a sample llama3-8b instruct template using the chat template.
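A hedged sketch of how those fields might be configured on a dataset entry; the key names (`field_messages`, `message_field_role`, `message_field_content`, `roles`) and the dataset path are assumptions, not confirmed by the commit text:

```yaml
datasets:
  - path: your/sharegpt-style-dataset   # hypothetical dataset path
    type: chat_template
    field_messages: conversations       # the data field holding message arrays
    message_field_role: from            # per-message role field
    message_field_content: value        # per-message content field
    roles:                              # map source role names to target types
      user: ["human"]
      assistant: ["gpt"]
```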
* include mlflow installation in the colab notebook
Without explicitly installing mlflow the `accelerate launch` command fails.
* update the colab notebook to use the latest tinyllama config
* add example for mistral orpo
* sample_packing: false for orpo
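The two commits above can be sketched as a pair of yml settings; using `rl: orpo` as the selector is an assumption based on how other RL methods are named:

```yaml
rl: orpo
sample_packing: false   # ORPO does not work with sample packing, per the commit above
```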
* go to load_dataset (since load_rl_datasets requires a transform_fn, which only dpo uses currently)
* wip for dbrx finetuning
* add fastcore for parallel loading of sharded weights
* fix dtype for load, use PartialState instead of accelerator to init process group, remove redundant wandb callback
* update to use v2 of the converted model
* more fixes for dbrx loras
* make sure to enable fsdp activation checkpointing
* fix support for 8bit loras too for dbrx
* apply z3 leaf moe fix for DBRX with deepspeed
* don't raise value error since child module searches could fail and be ok
* revert a previous change to fix fsdp
* update mistral/mistral qlora+fsdp yamls
* fix qlora+fsdp quant storage type
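A hedged sketch of the quant storage fix: with FSDP sharding the flattened parameters, the 4-bit quantized weights must be stored in the same dtype FSDP uses for the rest of the model. The key name mirrors bitsandbytes' `bnb_4bit_quant_storage` and is an assumption here:

```yaml
adapter: qlora
load_in_4bit: true
bnb_4bit_quant_storage: bfloat16   # match the dtype FSDP shards in
```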
* more edge cases for qlora-fsdp
* fixes for fsdp+qlora with optimizer in 8bit
* add bigstral z3 config and make sure to use full_state_dict for fsdp
* add lisa support
* fix default and fix attribute traversal for layers
* improve lisa callback logging
* fix LISA by ensuring params are not frozen during __init__
* example config for lisa
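A hedged sketch of what the LISA example config might contain; the key names are assumptions inferred from the commits above (layer count, sampling interval, and the attribute path that the fixed traversal walks):

```yaml
lisa_n_layers: 4                      # layers unfrozen at any one time
lisa_step_interval: 20                # re-sample the active layers every N steps
lisa_layers_attribute: model.layers   # attribute path traversed to find the layers
```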
---------
Co-authored-by: Aman Karmani <aman@tmm1.net>
* wip qlora + fsdp fixes
* more fixes
* make sure to load the lora 🤦
* only set up quantized meta on non-zero ranks
* only run setup_quantized_peft_meta_for_training for qlora+fsdp
* more fixes for qlora+fsdp
* chore: lint
* add example yml
* support mistral too
* fix for model_type and add mixtral support too
* set cpu_offload: false to reduce vram, constrain new accelerator logic to qlora + fsdp
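The cpu_offload change above might look like the following in the fsdp section of the yml; the exact key name (`fsdp_offload_params` vs. a bare `cpu_offload` flag) is an assumption:

```yaml
fsdp_config:
  fsdp_offload_params: false   # keep params on GPU, per the commit above
```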
* refactor for duplicate code
* Add CausalLMBenchEvalCallback for measuring seq2seq performance
* Fix code for pre-commit
* Fix typing and improve logging
* eval_sample_packing must be false with CausalLMBenchEvalCallback
* add mps support
* linter stuff
* CI fixes
* install packaging for various tests
* Update setup.py
* Revert "install packaging for various tests"
This reverts commit 980e7aa44d.
* Revert "CI fixes"
This reverts commit 4609e3b166.
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* wip for pretraining/iterable data with arbitrary prompt strategies
* more fixes, wip
* more fixes for custom pretraining
* iterable ds wrapper not needed
* remove extra features
* chore: lint
* update pretraining example yml
* fix order for partials
* fixup for tests
* loftq support for lora
* fix loftq check
* update readme for loftq
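LoftQ initializes the LoRA adapter weights to compensate for quantization error in the base model. A hedged sketch of enabling it, assuming a nested `peft.loftq_config` section (key names are assumptions):

```yaml
adapter: lora
peft:
  loftq_config:
    loftq_bits: 4   # quantization bits LoftQ assumes when computing the init
```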
* readability cleanup
* use peft main for loftq fixes, remove unnecessary special tokens
* remove unused test from older deprecation
* phi2 multipack
* update validation and examples for phi
* more updates to phi examples
* make sure to use the correct collator for phi multipack
* phi needs attention mask now for multipack
* if the special token already exists in the tokenizer, don't require it in lora_modules_to_save
* fix qlora yml for phi, fix phi test validation
* test qlora too
* make sure flash attention is enabled for the test
* don't use remote code for phi anymore
* reduce sequence len for sample packing phi