* add more test cases for gradient accumulation and fix zero3
* swap out for smaller model
* fix missing return
* fix missing pad_token in config
* support concurrency for multigpu testing
* cast empty deepspeed to empty string for zero3 check
* fix temp_dir as fixture so parametrize works properly
* fix test file for multigpu evals
* don't use default
* don't use default for fsdp_state_dict_type
* don't use llama tokenizer w smollm
* also automatically cancel multigpu for concurrency
* update actions version for node16 deprecation
* update pre-commit/action to use 3.0.1 for actions/cache@v4 dep
* update docker/setup-buildx-action too to v3
* add axolotlai docker hub org to publish list
* fix to use latest actions docker metadata version
* fix list in yaml for expected format for action
* missed a change
* feat: support new arg num_items_in_batch
* use kwargs to manage extra unknown kwargs for now
* upgrade against upstream transformers main
* make sure trl is on latest too
* fix for upgraded trl
* fix: handle trl and transformer signature change
* feat: update trl to handle transformer signature
* RewardDataCollatorWithPadding no longer has max_length
* handle updated signature for tokenizer vs processor class
* invert logic for tokenizer vs processor class
* processing_class, not processor class
* also handle processing class in dpo
* handle model name w model card creation
* upgrade transformers and add a loss check test
* fix install of tbparse requirements
* make sure to add tbparse to req
* feat: revert kwarg to positional kwarg to be explicit
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
use a constraint file
use min version of xformers
don't install autoawq with pytorch 2.5.0
debugging for errors
upgrade pip first
fix action yml
add back try/except
retry w/o constraint
use --no-build-isolation
show torch version
install setuptools and wheel
add back try/except
* Attempt to run multigpu in PR CI for now to ensure it works
* fix yaml file
* forgot to include multigpu tests
* fix call to cicd.multigpu
* dump dictdefault to dict for yaml conversion
* use to_dict instead of casting
* 16bit-lora w flash attention, 8bit lora seems problematic
* add llama fsdp test
* more tests
* Add test for qlora + fsdp with prequant
* limit accelerate to 2 processes and disable broken qlora+fsdp+bnb test
* move multigpu tests to biweekly