* feat: support new arg num_items_in_batch
* use kwargs to manage extra unknown kwargs for now
* upgrade against upstream transformers main
* make sure trl is on latest too
* fix for upgraded trl
* fix: handle trl and transformer signature change
* feat: update trl to handle transformer signature
* RewardDataCollatorWithPadding no longer has max_length
* handle updated signature for tokenizer vs processor class
* invert logic for tokenizer vs processor class
* processing_class, not processor class
* also handle processing class in dpo
* handle model name w model card creation
* upgrade transformers and add a loss check test
* fix install of tbparse requirements
* make sure to add tbparse to req
* feat: revert kwarg to positional kwarg to be explicit
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* swaps to use newer sample packing for mistral
* fix multipack patch test
* patch the common fa utils
* update for refactor of flash attn unpad
* remove un-needed drop attn mask for mistral
* bump transformers to main to pick up latest mistral fix for 12b and refactor of fa2
* update test
* bump flash attention 2.5.8 -> 2.6.1
* use triton implementation of cross entropy from flash attn
* add smoke test for flash attn cross entropy patch
* fix args to xentropy.apply
* handle tuple from triton loss fn
* ensure the patch tests run independently
* use the wrapper already built into flash attn for cross entropy
* mark pytest as forked for patches
* use pytest xdist instead of forked, since cuda doesn't like forking
* limit to 1 process and use dist loadfile for pytest
* change up pytest for fixture to reload transformers w monkeypathc
* add missing evals_per_epoch setting
* more pydantic fixes
* more fixes
* move test from normalization to validation
* increase eval size for sample packing tests
* make mlflow optional
* fix xformers
don't patch swiglu if xformers not working
fix the check for xformers swiglu
* fix install of xformers with extra index url for docker builds
* fix docker build arg quoting
* support for true batches with multipack
* patch the map dataset fetcher to handle batches with packed indexes
* patch 4d mask creation for sdp attention
* better handling for BetterTransformer
* patch general case for 4d mask
* setup forward patch. WIP
* fix patch file
* support for multipack w/o flash attention for llama
* cleanup
* add warning about bf16 vs fp16 for multipack with sdpa
* bugfixes
* add 4d multipack tests, refactor patches
* update tests and add warnings
* fix e2e file check
* skip sdpa test if not at least torch 2.1.1, update docs
* phi2 multipack
* update validation and examples for phi
* more updates to phi examples
* make sure to use the correct collator for phi multipack
* phi needs attention mask now for multipack
* if the special token already exists in the tokenizer, don't require in lora modules to save
* fix qlora yml for phi, fix phi test validation
* test qlora too
* make sure flash attention is enabled for the test
* don't use remote code for phi anymore
* reduce sequence len for sample packing phi
* also fix multipack for falcon and add smoke tests
* make sure to handle special tokens and added tokens for lora
* fix reference to model_type
* fix tests for falcon
* fix stray typo
* fixes for smoke tests
* Add s2_attn to hijack flash code
* Refactor code to account for s2_attn
* Add test for models utils
* Add ``s2_attention`` option to llama configs
* Add ``s2_attention`` option to README config
* Format code to appease linter
* chore: lint
* Remove xpos and llama-landmark [bad merge]
* add e2e smoke tests for shifted sparse attention
* remove stray patch from merge
* update yml with link to paper for s2_attention/longlora
* fix assertion check for full fine tune
* increase sequence len for tests and PR feedback updates
* reduce context len to 16k for tests
* reduce context len to 16k for tests
* reduce batch size for larger context len and udpate test to check message
* fix test for message
---------
Co-authored-by: joecummings <jrcummings@devvm050.nha0.facebook.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* attempt to also run e2e tests that needs gpus
* fix stray quote
* checkout specific github ref
* dockerfile for tests with proper checkout
ensure wandb is dissabled for docker pytests
clear wandb env after testing
clear wandb env after testing
make sure to provide a default val for pop
tryin skipping wandb validation tests
explicitly disable wandb in the e2e tests
explicitly report_to None to see if that fixes the docker e2e tests
split gpu from non-gpu unit tests
skip bf16 check in test for now
build docker w/o cache since it uses branch name ref
revert some changes now that caching is fixed
skip bf16 check if on gpu w support
* pytest skip for auto-gptq requirements
* skip mamba tests for now, split multipack and non packed lora llama tests
* split tests that use monkeypatches
* fix relative import for prev commit
* move other tests using monkeypatches to the correct run