* phi2 multipack
* update validation and examples for phi
* more updates to phi examples
* make sure to use the correct collator for phi multipack
* phi needs attention mask now for multipack
* if the special token already exists in the tokenizer, don't require it in lora_modules_to_save
* fix qlora yml for phi, fix phi test validation
* test qlora too
* make sure flash attention is enabled for the test
* don't use remote code for phi anymore
* reduce sequence len for sample packing phi
* Mistral-7b finetune example using axolotl with code, config, data
* Corrected the path for huggingface dataset
* Update data.jsonl
* chore: lint
---------
Co-authored-by: twenty8th <twenty8th@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
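A hedged sketch of the phi sample-packing config the commits above converge on; the model id and values are illustrative rather than copied from the repo's examples:

```yaml
# illustrative phi fragment: sample packing needs flash attention (and an
# attention mask), remote code is no longer required, and sequence_len is reduced
base_model: microsoft/phi-2
trust_remote_code: false
sequence_len: 2048
sample_packing: true
pad_to_sequence_len: true
flash_attention: true
adapter: qlora
load_in_4bit: true
```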
* cleanup dpo to be a little more extensible, add zephyr/nectar strategy
* fix eos slash
* support for eval split
* fix kwargs
* handle empty evals
* don't load peft model for dpo
* ensure dpo training args get bf16 for peft if applicable
* fix duplicate kwargs for bf16
* make sure to respect the configured lr scheduler
* support trainer callback to push config to wandb
* set dataloader preload args
* ensure that we are loading the lora when merging
* Update src/axolotl/utils/data.py
Co-authored-by: Agus <agustin.piqueres@gmail.com>
* support local datasets for dpo
Co-authored-by: Agus <agustin.piqueres@gmail.com>
* chore: lint
* dpo/kto/ipo smoke tests w lora, simplify dpo dataset type names
* add split to dpo tests
* fix rebase/merging error
* handle edge case w logging
* use accelerator for dpo datasets so it doesn't break the logger
* missing args
* validate that the checkpoint is an adapter for now
* log warning when dataset strategy is not loadable
---------
Co-authored-by: Agus <agustin.piqueres@gmail.com>
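A rough sketch of the DPO config surface these commits touch, assuming the `rl: dpo` key; the dataset path and type follow the zephyr/nectar strategy named above but are illustrative:

```yaml
# illustrative DPO fragment: local dataset paths and an eval split are supported
rl: dpo
datasets:
  - path: berkeley-nest/Nectar
    type: zephyr.nectar   # strategy added above; dpo/kto/ipo types work similarly
    split: train
adapter: lora
```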
* also fix multipack for falcon and add smoke tests
* make sure to handle special tokens and added tokens for lora
* fix reference to model_type
* fix tests for falcon
* fix stray typo
* fixes for smoke tests
* revert order of the filter/drop_long steps and only calculate max_input_len during preprocessing
* revert some changes to preparing for packing to allow more flexibility
* prepare dataset for packing during pre-processing step
* prepare dataset hash based on sample packing too
* enclose none check
* just cast straight to string for ds hash
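For the special/added-token handling above, a hedged config sketch: newly added tokens resize the embeddings, so the embedding layers go in lora_modules_to_save, while tokens already present in the tokenizer don't force this.

```yaml
# illustrative lora fragment with newly added tokens
adapter: lora
tokens:                   # tokens not already in the tokenizer
  - "<|im_start|>"
special_tokens:
  eos_token: "<|im_end|>"
lora_modules_to_save:     # needed because the new tokens resize the embeddings
  - embed_tokens
  - lm_head
```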
* set fp16 to false if bf16, update bf16: auto in example YAMLs
* unset fp16 so that it falls back properly if bf16 isn't available
* Update README.md [skip-ci]
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* test that bf16 disables fp16
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
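The precision behaviour above, sketched as config: `bf16: auto` enables bf16 when the hardware supports it (and disables fp16), while fp16 is left unset rather than forced so the fallback still works.

```yaml
# illustrative precision settings
bf16: auto   # bf16 when supported; fp16 is turned off automatically in that case
fp16:        # leave unset so training can fall back to fp16 if bf16 isn't available
```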
* add a basic notebook for lab users in the root
* update notebook and fix cors for jupyter
* cell is code
* fix eval batch size check
* remove intro notebook
* qwen2 multipack support
* fix qwen derived model check so it doesn't break qwen2
* fixes to ensure qwen2 packing works
* bump requirements for qwen2
* requirements typo
* Add s2_attn to hijack flash code
* Refactor code to account for s2_attn
* Add test for models utils
* Add ``s2_attention`` option to llama configs
* Add ``s2_attention`` option to README config
* Format code to appease linter
* chore: lint
* Remove xpos and llama-landmark [bad merge]
* add e2e smoke tests for shifted sparse attention
* remove stray patch from merge
* update yml with link to paper for s2_attention/longlora
* fix assertion check for full fine tune
* increase sequence len for tests and PR feedback updates
* reduce context len to 16k for tests
* reduce context len to 16k for tests
* reduce batch size for larger context len and update test to check message
* fix test for message
---------
Co-authored-by: joecummings <jrcummings@devvm050.nha0.facebook.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
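A hedged fragment for the shifted sparse attention (LongLoRA) option added above; the 16k value mirrors what the tests settle on, everything else is illustrative:

```yaml
# illustrative llama-family fragment for s2_attention / LongLoRA
s2_attention: true     # see the paper link added to the example yml
flash_attention: true  # s2_attn is implemented in the flash attention patch
sequence_len: 16384
adapter: lora
```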
* keep gate in fp32 for loras
* add e2e check for lora w/o flash attention for mixtral to check gate
* add checks for gate in fp32 for mixtral, add typehints to train outputs
* mixtral doesn't support basic lora 🤦
add lora tests @ 16bit and fix gate layer check
fix the parameter name, it was using the old disco name
don't lora over the gate so we can check that it is in fp32
fix dtype check
* ensure we're using fp16/bf16 for 16bit, and that qlora will always be in uint8
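A sketch of the mixtral LoRA targeting implied by the last few commits: the MoE router (gate) is deliberately left out of the target modules so it stays in fp32 and can be checked; module names follow the Mixtral layout, values are illustrative.

```yaml
# illustrative mixtral qlora fragment; router `gate` omitted so it stays in fp32
base_model: mistralai/Mixtral-8x7B-v0.1
adapter: qlora
load_in_4bit: true        # qlora weights are quantized (uint8), as noted above
bf16: auto                # 16-bit compute dtype for the non-quantized params
lora_r: 16
lora_alpha: 32
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - w1
  - w2
  - w3
```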