* also fix multipack for falcon and add smoke tests
* make sure to handle special tokens and added tokens for lora
* fix reference to model_type
* fix tests for falcon
* fix stray typo
* fixes for smoke tests
* revert order of filter/drop_long step and handle calc for max_input_len only during preprocessing
* revert some changes to preparing for packing to allow more flexibility
* prepare dataset for packing during pre-processing step
* prepare dataset hash based on sample packing too
* enclose none check
* just cast straight to string for ds hash
* set fp16 to false if bf16, update bf16: auto in example YAMLs
* unset fp16 so that it fallsback properly if bf16 isn't available
* Update README.md [skip-ci]
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* test that bf16 disables fp16
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* add a basic notebook for lab users in the root
* update notebook and fix cors for jupyter
* cell is code
* fix eval batch size check
* remove intro notebook
* qwen2 multipack support
* fix qwen derived model check so it doesn't break qwen2
* fixes to ensure qwen2 packing works
* bump requirements for qwen2
* requirements typo
* Add s2_attn to hijack flash code
* Refactor code to account for s2_attn
* Add test for models utils
* Add ``s2_attention`` option to llama configs
* Add ``s2_attention`` option to README config
* Format code to appease linter
* chore: lint
* Remove xpos and llama-landmark [bad merge]
* add e2e smoke tests for shifted sparse attention
* remove stray patch from merge
* update yml with link to paper for s2_attention/longlora
* fix assertion check for full fine tune
* increase sequence len for tests and PR feedback updates
* reduce context len to 16k for tests
* reduce context len to 16k for tests
* reduce batch size for larger context len and udpate test to check message
* fix test for message
---------
Co-authored-by: joecummings <jrcummings@devvm050.nha0.facebook.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* keep gate in fp32 for loras
* add e2e check for lora w/o flash attention for mixtral to check gate
* add checks for gate in fp32 for mixtral, add typehints to train outputs
* mixtral doesn't support basic lora 🤦
add lora tests @ 16bit and fix gate layer check
fix the parameter name, was using the old disco name
don't lora over the gate so we can check that is in fp32
fix dtype check
* ensure we're using fp16/bf16 for 16bit and qlora is always going to be in uint8
* additional logging to get maximum token length of a sequence in the dataset
* fix ordering to properly determine the max_len of tokens before dropping anything longer
* fix: `train_on_inputs: true` ignored for sharegpt
* enable unit test for train_on_inputs for sharegpt
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* fix double eos token for chatml
* isolate fix to chatml conversation
* fix add special tokens to include rstrip
* add test for train_on_inputs for sharegpt
* don't use rstrip for chatml
* Cosine min lr
* Cosine min lr - warn if using deepspeed
* cosine_min_lr_ratio readme
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* restore to current phi modeling code from phi-2
* enable gradient checkpointing
* don't cast everything to float32 all the time
* gradient checkpointing for phi2 ParallelBlock module too
* fix enabling flash attn for phi2
* add comment about import
* fix phi2 example
* fix model type check for tokenizer
* revert float32 -> bf16 casting changes
* support fused dense flash attn
* fix the repo for flash-attn
* add package name for subdir pkg
* fix the data collator when not using sample packing
* install packaging for pytests in ci
* also fix setup to not install flash attn fused dense subdir if not extras
* split out the fused-dense-lib in extra requires
* don't train w group_by_length for phi
* update integration test to use phi2
* set max steps and save steps for phi e2e tests
* try to workaround ssave issue in ci
* skip phi2 e2e test for now
* [Feat] streaming multipack
* WIP make continued pretraining work w multipack
* fix up hadrcoding, lint
* fix dict check
* update test for updated pretraining multipack code
* fix hardcoded data collator fix for multipack pretraining
* fix the collator to be the max length for multipack pretraining
* don't bother with latest tag for test
* cleanup docker build/test
---------
Co-authored-by: jinwonkim93@github.com <jinwonkim>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* fix: improved memory handling when model is bigger than existing VRAM
* feature: add lora_on_cpu flag to do LoRA loading on CPU (RAM)
For big models where the models are taking up the entire GPU VRAM, the LoRA part will fail unless it is loaded on CPU only.
* doc: add README
* fix: enable progress bars in do_merge_lora()
* doc: mention gpu_memory_limit and lora_on_cpu in merge part of README
* Update src/axolotl/utils/models.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* fix: remove deletion of removed model_kwargs key
* fix: validate that gpu_memory_limit and max_memory are not both set
---------
Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* ipo-dpo trainer
* fix missing abstract method
* chatml template, grad checkpointing kwargs support
* fix steps calc for RL and add dataloader kwargs
* wip to fix dpo and start ppo
* more fixes
* refactor to generalize map fn
* fix dataset loop and handle argilla pref dataset
* set training args
* load reference model on seperate gpu if more than one device
* no auto upload to hub for dpo, don't add lora adapters to ref model for dpo
* fixes for rl training
* support for ipo from yaml
* set dpo training args from the config, add tests
* chore: lint
* set sequence_len for model in test
* add RLHF docs
* Added chatgml3 conversation type for training models like TinyLLama
* Added chatgml3 conversation type for training models like TinyLLama with lint
* Added chatgml3 conversation type for training models like TinyLLama with lint
* bump transformers and update attention class map name
* also run the tests in docker
* add mixtral e2e smoke test
* fix base name for docker image in test
* mixtral lora doesn't seem to work, at least check qlora
* add testcase for mixtral w sample packing
* check monkeypatch for flash attn multipack
* also run the e2e tests in docker
* use all gpus to run tests in docker ci
* use privileged mode too for docker w gpus
* rename the docker e2e actions for gh ci
* set privileged mode for docker and update mixtral model self attn check
* use fp16/bf16 for mixtral w fa2
* skip e2e tests on docker w gpus for now
* tests to validate mistral and mixtral patches
* fix rel import