* fix 405b with lower cpu ram requirements
* make sure to use double quant and only skip output embeddings (see the sketch after this list)
* set model attributes
* more fixes for sharded fsdp loading
* update the base model in example to use pre-quantized nf4-bf16 weights
* upstream fixes for qlora+fsdp
* various batch of fixes
* more tweaks
* fix autoawq requirement for torch flexibility
* simplify conditionals
* multi-node fixes wip
* bump transformers and include 405b qlora+fsdp yaml
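A minimal sketch of the quantization setup these commits describe, assuming the standard bitsandbytes/transformers API; the model id is a placeholder and the skip list and dtypes are inferred from the commit messages, not copied from upstream code:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,          # "make sure to use double quant"
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,   # storage dtype must be uniform for FSDP sharding
    llm_int8_skip_modules=["lm_head"],       # "only skip output embeddings"
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-405B",  # placeholder id; the example uses pre-quantized nf4-bf16 weights
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
```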
* Implementing a basic chat_template strategy for DPO datasets
This mimics the sft chat_template strategy (see the config sketch after this list) such that users can:
* Specify the messages field
* Specify the per message role and content fields
* Specify the chosen and rejected fields
* Let the tokenizer construct the raw prompt
* Ensure the chosen and rejected fields don't have any prefix tokens
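A hedged sketch of the resulting dataset config; the key names are assumptions modeled on the sft chat_template strategy, and the dataset path is a placeholder:

```yaml
rl: dpo
datasets:
  - path: your/dpo-dataset          # placeholder
    type: chat_template.default
    field_messages: messages        # the messages field
    field_chosen: chosen            # the chosen field
    field_rejected: rejected        # the rejected field
    message_field_role: role        # per-message role field
    message_field_content: content  # per-message content field
```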
* Adding additional dpo chat template unittests
* Rename test class
* bump transformers and set roundup_power2_divisions for more VRAM improvements
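For reference, a minimal sketch of how this allocator knob can be set, assuming the standard PYTORCH_CUDA_ALLOC_CONF env var; the division count is illustrative:

```python
import os

# must be set before CUDA is initialized; 16 is an illustrative value
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "roundup_power2_divisions:16"
```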
* support for low bit optimizers from torch ao
* fix check for alternate optimizers and use nous models on hf for llama3
* add missing check for ao_adamw_fp8
* fix check when using custom optimizers w adamw
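A minimal sketch of opting in from the config; ao_adamw_fp8 is the optimizer name checked by these commits:

```yaml
optimizer: ao_adamw_fp8  # low-bit AdamW from torchao
```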
* Fix eval_sample_packing in llama-3 lora example
* Update examples/llama-3/lora-8b.yml
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* phi-3 support and perplexity metric (a minimal metric sketch follows this commit)
* phi-3 chat template
* metrics updates
* chore: lint
* fix assertion on Tensor
* fix tests since tokenization happens in the metric
* fix perplexity value of shorter passage
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
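A minimal sketch of the standard perplexity computation (exp of the mean token-level negative log-likelihood), assuming tokenization happens inside the metric as noted above; this is not the exact upstream class:

```python
import torch

def perplexity(model, tokenizer, text: str) -> float:
    # tokenize inside the metric, per the commit above
    input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        # HF causal LMs return the mean cross-entropy over shifted tokens as .loss
        loss = model(input_ids, labels=input_ids).loss
    return torch.exp(loss).item()
```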
The current yml config throws an error: ValueError: Please set lora_modules_to_save to [`embed_tokens`, `lm_head`] when using an adapter and changing the special tokens.
I added the required changes to resolve it.
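The change amounts to what the error message asks for:

```yaml
lora_modules_to_save:
  - embed_tokens
  - lm_head
```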
The strategy now supports configuring several fields:
* The data field holding message arrays
* The role and content fields for each message
* Role mapping from source to target types
Additionally, this adds a sample llama3-8b instruct template using the chat template.
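A hedged sketch of the resulting config surface; the key names, role-mapping format, and dataset path are assumptions for illustration:

```yaml
chat_template: llama3               # sample llama3-8b instruct template
datasets:
  - path: your/sft-dataset          # placeholder
    type: chat_template
    field_messages: conversations   # data field holding the message arrays
    message_field_role: from        # per-message role field
    message_field_content: value    # per-message content field
    roles:                          # role mapping from source to target types
      user: ["human"]
      assistant: ["gpt"]
```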
* include mlflow installation in the colab notebook
Without explicitly installing mlflow the `accelerate launch` command fails.
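The fix amounts to an explicit install in a notebook cell before launching:

```
!pip install mlflow
```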
* update the colab notebook to use the latest tinyllama config
* add example for mistral orpo
* sample_packing: false for orpo
* fall back to load_dataset (since load_rl_datasets requires a transform_fn, which only dpo uses currently)
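A minimal sketch of the ORPO example settings named above; all other fields are omitted:

```yaml
rl: orpo
sample_packing: false  # packing is disabled for orpo
```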
* wip for dbrx finetuning
* add fastcore for parallel loading of sharded weights
* fix dtype for load, use PartialState instead of accelerator to init process group, remove redundant wandb callback
* update to use v2 of the converted model
* more fixes for dbrx loras
* make sure to enable fsdp activation checkpointing
* fix support for 8bit loras too for dbrx
* apply z3 leaf moe fix for DBRX with deepspeed
* don't raise a ValueError, since child module searches can fail and still be OK
* revert a previous change to fix fsdp
* update mistral/mixtral qlora+fsdp yamls
* fix qlora+fsdp quant storage type
* more edge cases for qlora-fsdp
* fixes for fsdp+qlora w optimizer in 8bit
* add bigstral z3 config and make sure to use full_state_dict for fsdp
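A hedged sketch pulling together the fsdp settings these commits touch; the keys follow the fsdp_config convention used in the example yamls, and the values are illustrative rather than copied from upstream:

```yaml
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_activation_checkpointing: true    # "make sure to enable fsdp activation checkpointing"
  fsdp_state_dict_type: FULL_STATE_DICT  # "use full_state_dict for fsdp"
  fsdp_sync_module_states: true          # broadcast rank-0 weights to the other ranks
  fsdp_offload_params: false
```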
* add lisa support
* fix default and fix attribute traversal for layers
* improve lisa callback logging
* fix LISA by ensuring params are not frozen during __init__
* example config for lisa (sketched after this commit)
---------
Co-authored-by: Aman Karmani <aman@tmm1.net>
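A hedged example config for LISA; the key names and values are assumptions based on the fixes above (a layer-count default, a step interval, and the layers-attribute traversal):

```yaml
lisa_n_layers: 4                     # assumption: layers kept trainable at a time
lisa_step_interval: 20               # assumption: re-sample the active layers every N steps
lisa_layers_attribute: model.layers  # assumption: attribute path traversed to find the layers
```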
* wip qlora + fsdp fixes
* more fixes
* make sure to load the lora 🤦
* only set up quantized meta weights on non-zero ranks (sketched after this list)
* only run setup_quantized_peft_meta_for_training for qlora+fsdp
* more fixes for qlora+fsdp
* chore: lint
* add example yml
* support mistral too
* fix for model_type and add mixtral support too
* set cpu_offload: false to reduce vram, constrain new accelerator logic to qlora + fsdp
* refactor for duplicate code
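A hedged sketch of the loading pattern referenced above: rank 0 materializes the quantized weights while non-zero ranks keep placeholders on the meta device, and FSDP later syncs module states. The model id is a placeholder and this is not the exact upstream code:

```python
import torch
from accelerate import PartialState, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder; the commits also cover mixtral
state = PartialState()  # used instead of Accelerator to init the process group (see above)

if state.is_main_process:
    # rank 0 loads and quantizes the real weights
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True, bnb_4bit_quant_storage=torch.bfloat16
        ),
        torch_dtype=torch.bfloat16,
    )
else:
    # non-zero ranks only materialize empty (meta-device) weights
    with init_empty_weights():
        model = AutoModelForCausalLM.from_config(AutoConfig.from_pretrained(model_id))
# FSDP is then built with sync_module_states=True so rank 0 broadcasts the shards
```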
* Add CausalLMBenchEvalCallback for measuring seq2seq performance
* Fix code for pre-commit
* Fix typing and improve logging
* eval_sample_packing must be false with CausalLMBenchEvalCallback (see the config sketch after this commit)
* add mps support
* linter stuff
* CI fixes
* install packaging for various tests
* Update setup.py
* Revert "install packaging for various tests"
This reverts commit 980e7aa44d.
* Revert "CI fixes"
This reverts commit 4609e3b166.
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
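A minimal sketch of the constraint noted above; enabling the callback itself is elided since its config toggle isn't named here:

```yaml
eval_sample_packing: false  # required while CausalLMBenchEvalCallback measures seq2seq performance
```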
* wip for pretraining/iterable data with arbitrary prompt strategies
* more fixes, wip
* more fixes for custom pretraining
* iterable ds wrapper not needed
* remove extra features
* chore: lint
* update pretraining example yml (sketched below)
* fix order for partials
* fixup for tests
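A hedged sketch of streaming pretraining with a custom prompt strategy; the path is a placeholder and the type value is illustrative:

```yaml
pretraining_dataset:
  - path: your/streaming-corpus  # placeholder; loaded as an iterable/streaming dataset
    type: completion             # illustrative; arbitrary prompt strategies are now supported here
```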