* update sklearn version, torch compile env vars, don't worry about failure on preprocess load model
* There is already a condition check within the function. This outer one is not necessary
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* fix 405b with lower cpu ram requirements
* make sure to use double quant and only skip output embeddings
* set model attributes
* more fixes for sharded fsdp loading
* update the base model in example to use pre-quantized nf4-bf16 weights
* upstream fixes for qlora+fsdp
* various batch of fixes
* more tweaks
* fix autoawq requirement for torch flexibility
* simplify conditionals
* multi-node fixes wip
* bump transformers and include 405b qlora+fsdp yaml
* swaps to use newer sample packing for mistral
* fix multipack patch test
* patch the common fa utils
* update for refactor of flash attn unpad
* remove un-needed drop attn mask for mistral
* bump transformers to main to pick up latest mistral fix for 12b and refactor of fa2
* update test
* bump transformers and set roundup_power2_divisions for more VRAM improvements
* support for low bit optimizers from torch ao
* fix check for alternate optimizers and use nous models on hf for llama3
* add missing check for ao_adamw_fp8
* fix check when using custom optimizers w adamw
* bump flash attention 2.5.8 -> 2.6.1
* use triton implementation of cross entropy from flash attn
* add smoke test for flash attn cross entropy patch
* fix args to xentropy.apply
* handle tuple from triton loss fn
* ensure the patch tests run independently
* use the wrapper already built into flash attn for cross entropy
* mark pytest as forked for patches
* use pytest xdist instead of forked, since cuda doesn't like forking
* limit to 1 process and use dist loadfile for pytest
* change up pytest for fixture to reload transformers w monkeypatch
* Update requirements.txt
Preserve compatibility with torch 2.3.1. [Reference](https://github.com/facebookresearch/xformers/issues/1052)
* fix setup.py to extract the current xformers dep from requirements for replacement
* xformers 0.0.27 wheels not built for torch 2.3.0
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* adding llama3 fastchat conversation monkeypatch
* Updated conversation turns to work with PR3259 of FastChat
* fixed bos token
* bump fastchat version
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Add support for Gemma chat template
* Update fschat version to include its newest support for Gemma chat style
* pin fastchat to current HEAD
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* WIP use trl ORPOTrainer
* fixes to make orpo work with trl
* fix the chat template loading
* make sure to handle the special tokens and add_generation for assistant turn too
* wip for dbrx finetuning
* add fastcore for parallel loading of sharded weights
* fix dtype for load, use PartialState instead of accelerator to init process group, remove redundant wandb callback
* update to use v2 of the converted model
* more fixes for dbrx loras
* make sure to enable fsdp activation checkpointing
* fix support for 8bit loras too for dbrx
* apply z3 leaf moe fix for DBRX with deepspeed
* don't raise value error since child module searches could fail and be ok
* revert a previous change to fix fsdp
* update mistral/mistral qlora+fsdp yamls
* fix qlora+fsdp quant storage type
* more edge cases for qlora-fsdp
* fixes for fsdp+qlora w optimizer in 8bit
* add bigstral z3 config and make sure to use full_state_dict for fsdp
* support galore once upstreamed into transformers
* update module name for llama in readme and fix typing for all linear
* bump trl for deprecation fixes from newer transformers
* include galore as an extra and install in docker image
* fix optim_args type
* fix optim_args
* update dependencies for galore
* add galore to cicd dockerfile
* wip qlora + fsdp fixes
* more fixes
* make sure to load the lora 🤦
* only setup quantized meta on non-zero rank
* only run setup_quantized_peft_meta_for_training for qlora+fsdp
* more fixes for qlora+fsdp
* chore: lint
* add example yml
* support mistral too
* fix for model_type and add mixtral support too
* set cpu_offload: false to reduce vram, constrain new accelerator logic to qlora + fsdp
* refactor for duplicate code
* run tests again on Modal
* make sure to run the full suite of tests on modal
* run cicd steps via shell script
* run tests in different runs
* increase timeout
* split tests into steps on modal
* increase workflow timeout
* retry doing this with only a single script
* fix yml launch for modal ci
* reorder tests to run on modal
* skip dpo tests on modal
* run on L4s, A10G takes too long
* increase CPU and RAM for modal test
* run modal tests on A100s
* skip phi test on modal
* env not arg in modal dockerfile
* upgrade pydantic and fastapi for modal tests
* cleanup stray character
* use A10s instead of A100 for modal
* WIP conversion to use pydantic for config validation
* wip, more fields, add capabilities
* wip
* update pydantic validation to match existing tests
* tweak requirements
* setup deprecated params pydantic model
* more validations
* wrap up rest of the validations
* flesh out the rest of the options from the readme into pydantic
* fix model validators as class methods
remember to return in validator
missing return
add missing relora attributes
fix test for DictDefault change
fix sys template for mistral from fastchat change in PR 2872
fix test for batch size warning
* more missing attributes for cfg
* updates from PR feedback
* fix validation for datasets and pretrain datasets
* fix test for lora check
* make mlflow optional
* fix xformers
don't patch swiglu if xformers not working
fix the check for xformers swiglu
* fix install of xformers with extra index url for docker builds
* fix docker build arg quoting
* Add CausalLMBenchEvalCallback for measuring seq2seq performance
* Fix code for pre-commit
* Fix typing and improve logging
* eval_sample_packing must be false with CausalLMBenchEvalCallback
* import deepspeed integration
* monkeypatch peft adapter with deepspeed for resume from checkpoint
* fix patch
* fix patches attempt 2
* make sure to set lora_model_dir
* skip pylint for deepspeed.utils
* pick up upstream fix in transformers
* remove monkeypatch for deepspeed/peft fix
* no need to set the lora_model_dir on resume
* unset load_in_*bit when using quant config
* guard before del
* better handling of load_in* kwargs
* loftq support for lora
* fix loftq check
* update readme for loftq
* readability cleanup
* use peft main for loftq fixes, remove unnecessary special tokens
* remove unused test from older deprecation
* wip modal for ci
* handle falcon layernorms better
* update
* rebuild the template each time with the pseudo-ARGS
* fix ref
* update tests to use modal
* cleanup ci script
* make sure to install jinja2 also
* kickoff the gh action on gh hosted runners and specify num gpus