* Make sure test_datasets is used and handle val_set_size accordingly.
* Add test_datasets docs.
* Apply suggestions from code review
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* loftq support for lora
* fix loftq check
* update readme for loftq
* readability cleanup
* use peft main for loftq fixes, remove unnecessary special tokens
* remove unused test from older deprecation
* add system message to template
* readme update
* added code to register new system message
* register chatml template for test
---------
Co-authored-by: Mads Henrichsen <mads@BrbartiendeMads.lan>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Mistral-7b finetune example using axolotl with code,config,data
* Corrected the path for huggingface dataset
* Update data.jsonl
* chore: lint
---------
Co-authored-by: twenty8th <twenty8th@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* set fp16 to false if bf16, update bf16: auto in example YAMLs
* unset fp16 so that it falls back properly if bf16 isn't available
* Update README.md [skip-ci]
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* test that bf16 disables fp16
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
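
The bf16/fp16 handling described in the commits above could look roughly like the sketch below. It is only an illustration: `normalize_precision` and its `bf16_supported` argument are hypothetical names, not functions from the codebase, and the fallback to fp16 when `bf16: auto` finds no bf16 support is an assumption drawn from the commit messages.

```python
def normalize_precision(cfg: dict, bf16_supported: bool) -> dict:
    """Return a copy of cfg with consistent bf16/fp16 flags (hypothetical helper)."""
    out = dict(cfg)

    if out.get("bf16") == "auto":
        # Resolve "auto" to a concrete boolean based on hardware support.
        out["bf16"] = bf16_supported
        # Assumption: with fp16 left unset, fall back to fp16 when bf16 is unavailable.
        if not bf16_supported and out.get("fp16") is None:
            out["fp16"] = True

    if out.get("bf16") is True:
        # bf16 and fp16 are mutually exclusive, so bf16 disables fp16.
        out["fp16"] = False

    return out
```

Leaving `fp16` unset in a config that uses `bf16: auto` is what lets the fallback branch take effect in this sketch.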
* Add s2_attn to hijack flash code
* Refactor code to account for s2_attn
* Add test for models utils
* Add ``s2_attention`` option to llama configs
* Add ``s2_attention`` option to README config
* Format code to appease linter
* chore: lint
* Remove xpos and llama-landmark [bad merge]
* add e2e smoke tests for shifted sparse attention
* remove stray patch from merge
* update yml with link to paper for s2_attention/longlora
* fix assertion check for full fine tune
* increase sequence len for tests and PR feedback updates
* reduce context len to 16k for tests
* reduce context len to 16k for tests
* reduce batch size for larger context len and update test to check message
* fix test for message
---------
Co-authored-by: joecummings <jrcummings@devvm050.nha0.facebook.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Cosine min lr
* Cosine min lr - warn if using deepspeed
* cosine_min_lr_ratio readme
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
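
As an illustration of what a cosine schedule with a minimum learning rate ratio computes, here is a minimal sketch. The function name, the linear warmup and the default values are assumptions made for the example. Only the `cosine_min_lr_ratio` option name comes from the commits above, and the actual implementation may differ.

```python
import math


def cosine_min_lr_multiplier(step, total_steps, warmup_steps=0, min_lr_ratio=0.1):
    """Learning rate multiplier in [min_lr_ratio, 1.0] (hypothetical helper)."""
    if step < warmup_steps:
        # Linear warmup from 0 to 1 (an assumption, not taken from the commits).
        return step / max(1, warmup_steps)

    # Fraction of the post-warmup schedule completed so far, clamped to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)

    # Standard cosine decay from 1 to 0, rescaled so it bottoms out at min_lr_ratio.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr_ratio + (1.0 - min_lr_ratio) * cosine
```

A multiplier like this could be passed to `torch.optim.lr_scheduler.LambdaLR`, which scales the base learning rate by the returned value at each step.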
* fix: improved memory handling when model is bigger than existing VRAM
* feature: add lora_on_cpu flag to do LoRA loading on CPU (RAM)
For big models that take up the entire GPU VRAM, loading the LoRA fails unless it is loaded on the CPU only.
* doc: add README
* fix: enable progress bars in do_merge_lora()
* doc: mention gpu_memory_limit and lora_on_cpu in merge part of README
* Update src/axolotl/utils/models.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* fix: remove deletion of removed model_kwargs key
* fix: validate that gpu_memory_limit and max_memory are not both set
---------
Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
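
The check that `gpu_memory_limit` and `max_memory` are not both set could be sketched as follows. `resolve_max_memory` is a hypothetical helper, and the way an integer limit is expanded into a per-GPU "GiB" mapping is an assumption for illustration, not necessarily how the code in src/axolotl/utils/models.py behaves.

```python
def resolve_max_memory(cfg: dict, gpu_count: int):
    """Return a per-device memory mapping, or None if no limit is configured."""
    gpu_memory_limit = cfg.get("gpu_memory_limit")
    max_memory = cfg.get("max_memory")

    # The two options express the same thing, so setting both is ambiguous.
    if gpu_memory_limit is not None and max_memory is not None:
        raise ValueError("gpu_memory_limit and max_memory are mutually exclusive")

    if max_memory is not None:
        return dict(max_memory)
    if gpu_memory_limit is None:
        return None

    # Assumption: a bare integer is interpreted as a number of GiB.
    if isinstance(gpu_memory_limit, int):
        limit = f"{gpu_memory_limit}GiB"
    else:
        limit = str(gpu_memory_limit)
    return {device: limit for device in range(gpu_count)}
```

A mapping of this shape (device index to a size string) is the form that the `max_memory` argument of Hugging Face model loading accepts.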
* fix: switch to using the HuggingFace Transformers NEFT implementation
* linter
* add support for noisy_embedding_alpha with a warning about it being renamed
* restore pre/posttrain_hooks
* move validation of NEFT noise alpha into validate_config()
* linter
* mixtral multipack
* use mixtral model
* sample yml
* calculate cu_seqlens properly
* use updated flash attention setting
* attn var checks
* force use of flash attention 2 for packing
* lint
* disable future fix for now
* update support table
* Support device_map sequential (and others). Support max_memory in cfg.
* Update documentation in README accordingly.
* Update README.md
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Feat: Update to handle wandb env better
* chore: rename wandb_run_id to wandb_name
* feat: add new recommendation and update config
* fix: indent and pop disabled env if project passed
* feat: test env set for wandb and recommendation
* feat: update to use wandb_name and allow id
* chore: add info to readme
* isolate torch from the requirements.txt
* fix typo for removed line ending
* pin transformers and accelerate to latest releases
* try w auto-gptq==0.5.1
* update README to remove manual peft install
* pin xformers to 0.0.22
* bump flash-attn to 2.3.3
* pin flash attn to exact version
* allow overriding of model_config parameters from the YML
* remove old logging, update readme
* move the updating of model config to the load_model_config function
* add warning for deprecated rope_scaling in the root of the YML config