* Add s2_attn to hijack flash code
* Refactor code to account for s2_attn
* Add test for models utils
* Add ``s2_attention`` option to llama configs
* Add ``s2_attention`` option to README config
* Format code to appease linter
* chore: lint
* Remove xpos and llama-landmark [bad merge]
* add e2e smoke tests for shifted sparse attention
* remove stray patch from merge
* update yml with link to paper for s2_attention/longlora
* fix assertion check for full fine tune
* increase sequence len for tests and PR feedback updates
* reduce context len to 16k for tests
* reduce context len to 16k for tests
* reduce batch size for larger context len and udpate test to check message
* fix test for message
---------
Co-authored-by: joecummings <jrcummings@devvm050.nha0.facebook.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* keep gate in fp32 for loras
* add e2e check for lora w/o flash attention for mixtral to check gate
* add checks for gate in fp32 for mixtral, add typehints to train outputs
* mixtral doesn't support basic lora 🤦
add lora tests @ 16bit and fix gate layer check
fix the parameter name, was using the old disco name
don't lora over the gate so we can check that is in fp32
fix dtype check
* ensure we're using fp16/bf16 for 16bit and qlora is always going to be in uint8
* additional logging to get maximum token length of a sequence in the dataset
* fix ordering to properly determine the max_len of tokens before dropping anything longer
* fix: `train_on_inputs: true` ignored for sharegpt
* enable unit test for train_on_inputs for sharegpt
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* attempt to also run e2e tests that needs gpus
* fix stray quote
* checkout specific github ref
* dockerfile for tests with proper checkout
ensure wandb is dissabled for docker pytests
clear wandb env after testing
clear wandb env after testing
make sure to provide a default val for pop
tryin skipping wandb validation tests
explicitly disable wandb in the e2e tests
explicitly report_to None to see if that fixes the docker e2e tests
split gpu from non-gpu unit tests
skip bf16 check in test for now
build docker w/o cache since it uses branch name ref
revert some changes now that caching is fixed
skip bf16 check if on gpu w support
* pytest skip for auto-gptq requirements
* skip mamba tests for now, split multipack and non packed lora llama tests
* split tests that use monkeypatches
* fix relative import for prev commit
* move other tests using monkeypatches to the correct run
* fix double eos token for chatml
* isolate fix to chatml conversation
* fix add special tokens to include rstrip
* add test for train_on_inputs for sharegpt
* don't use rstrip for chatml
* Cosine min lr
* Cosine min lr - warn if using deepspeed
* cosine_min_lr_ratio readme
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* restore to current phi modeling code from phi-2
* enable gradient checkpointing
* don't cast everything to float32 all the time
* gradient checkpointing for phi2 ParallelBlock module too
* fix enabling flash attn for phi2
* add comment about import
* fix phi2 example
* fix model type check for tokenizer
* revert float32 -> bf16 casting changes
* support fused dense flash attn
* fix the repo for flash-attn
* add package name for subdir pkg
* fix the data collator when not using sample packing
* install packaging for pytests in ci
* also fix setup to not install flash attn fused dense subdir if not extras
* split out the fused-dense-lib in extra requires
* don't train w group_by_length for phi
* update integration test to use phi2
* set max steps and save steps for phi e2e tests
* try to workaround ssave issue in ci
* skip phi2 e2e test for now