* add missing evals_per_epoch setting
* more pydantic fixes
* more fixes
* move test from normalization to validation
* increase eval size for sample packing tests
* make mlflow optional
* fix xformers
don't patch swiglu if xformers not working
fix the check for xformers swiglu
* fix install of xformers with extra index url for docker builds
* fix docker build arg quoting
* support for true batches with multipack
* patch the map dataset fetcher to handle batches with packed indexes
* patch 4d mask creation for sdp attention
* better handling for BetterTransformer
* patch general case for 4d mask
* setup forward patch. WIP
* fix patch file
* support for multipack w/o flash attention for llama
* cleanup
* add warning about bf16 vs fp16 for multipack with sdpa
* bugfixes
* add 4d multipack tests, refactor patches
* update tests and add warnings
* fix e2e file check
* skip sdpa test if not at least torch 2.1.1, update docs
* phi2 multipack
* update validation and examples for phi
* more updates to phi examples
* make sure to use the correct collator for phi multipack
* phi needs attention mask now for multipack
* if the special token already exists in the tokenizer, don't require in lora modules to save
* fix qlora yml for phi, fix phi test validation
* test qlora too
* make sure flash attention is enabled for the test
* don't use remote code for phi anymore
* reduce sequence len for sample packing phi
* also fix multipack for falcon and add smoke tests
* make sure to handle special tokens and added tokens for lora
* fix reference to model_type
* fix tests for falcon
* fix stray typo
* fixes for smoke tests
* Add s2_attn to hijack flash code
* Refactor code to account for s2_attn
* Add test for models utils
* Add ``s2_attention`` option to llama configs
* Add ``s2_attention`` option to README config
* Format code to appease linter
* chore: lint
* Remove xpos and llama-landmark [bad merge]
* add e2e smoke tests for shifted sparse attention
* remove stray patch from merge
* update yml with link to paper for s2_attention/longlora
* fix assertion check for full fine tune
* increase sequence len for tests and PR feedback updates
* reduce context len to 16k for tests
* reduce batch size for larger context len and update test to check message
* fix test for message
---------
Co-authored-by: joecummings <jrcummings@devvm050.nha0.facebook.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* attempt to also run e2e tests that need gpus
* fix stray quote
* checkout specific github ref
* dockerfile for tests with proper checkout
ensure wandb is disabled for docker pytests
clear wandb env after testing
make sure to provide a default val for pop
try skipping wandb validation tests
explicitly disable wandb in the e2e tests
explicitly report_to None to see if that fixes the docker e2e tests
split gpu from non-gpu unit tests
skip bf16 check in test for now
build docker w/o cache since it uses branch name ref
revert some changes now that caching is fixed
skip bf16 check if on gpu w support
* pytest skip for auto-gptq requirements
* skip mamba tests for now, split multipack and non packed lora llama tests
* split tests that use monkeypatches
* fix relative import for prev commit
* move other tests using monkeypatches to the correct run