* feat: add cut_cross_entropy
* fix: add to input
* fix: remove from setup.py
* feat: refactor into an integration
* chore: ignore lint
* feat: add test for cce
* fix: set max_steps for liger test
* chore: Update base model following suggestion
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* chore: update special_tokens following suggestion
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* chore: remove with_temp_dir following comments
* fix: plugins aren't loaded
* chore: update quotes in error message
* chore: lint
* chore: lint
* feat: enable FA on test
* chore: refactor get_pytorch_version
* fix: lock cce commit version
* fix: remove subclassing UT
* fix: downcast even if not using FA and config check
* feat: add test to check different attentions
* feat: add install to CI
* chore: refactor to use parametrize for attention
* fix: pytest not detecting test
* feat: handle torch lower than 2.4
* fix args/kwargs to match docs
* use release version cut-cross-entropy==24.11.4
* fix quotes
* fix: use named params for clarity for modal builder
* fix: handle install from pip
* fix: test check only top level module install
* fix: re-add import check
* uninstall existing version if no transformers submodule in cce
* add more dataset fixtures to the cache
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
* see if unsloth installs cleanly in ci
* check unsloth install on regular tests, not sdist
* fix ampere check exception for ci
* use cached_property instead
* add an e2e test for unsloth qlora
* reduce seq len and mbsz to prevent oom in ci
* add checks for fp16 and sdp_attention
* pin unsloth to a specific release
* add unsloth to docker image too
* fix flash attn xentropy patch
* fix loss, add check for loss when using fa_xentropy
* fix special tokens for test
* typo
* test fa xentropy with and without gradient accum
* pr feedback changes
* add more test cases for gradient accumulation and fix zero3
* swap out for smaller model
* fix missing return
* fix missing pad_token in config
* support concurrency for multigpu testing
* cast empty deepspeed to empty string for zero3 check
* fix temp_dir as fixture so parametrize works properly
* fix test file for multigpu evals
* don't use default
* don't use default for fsdp_state_dict_type
* don't use llama tokenizer w smollm
* also automatically cancel multigpu for concurrency
* update actions version for node16 deprecation
* update pre-commit/action to use 3.0.1 for actions/cache@v4 dep
* update docker/setup-buildx-action too to v3
* add axolotlai docker hub org to publish list
* fix to use latest actions docker metadata version
* fix list in yaml for expected format for action
* missed a change
* feat: support new arg num_items_in_batch
* use kwargs to manage extra unknown kwargs for now
* upgrade against upstream transformers main
* make sure trl is on latest too
* fix for upgraded trl
* fix: handle trl and transformer signature change
* feat: update trl to handle transformer signature
* RewardDataCollatorWithPadding no longer has max_length
* handle updated signature for tokenizer vs processor class
* invert logic for tokenizer vs processor class
* processing_class, not processor class
* also handle processing class in dpo
* handle model name w model card creation
* upgrade transformers and add a loss check test
* fix install of tbparse requirements
* make sure to add tbparse to req
* feat: revert kwarg to positional kwarg to be explicit
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* use a constraint file
* use min version of xformers
* don't install autoawq with pytorch 2.5.0
* debugging for errors
* upgrade pip first
* fix action yml
* add back try/except
* retry w/o constraint
* use --no-build-isolation
* show torch version
* install setuptools and wheel
* add back try/except
* Attempt to run multigpu in PR CI for now to ensure it works
* fix yaml file
* forgot to include multigpu tests
* fix call to cicd.multigpu
* dump dictdefault to dict for yaml conversion
* use to_dict instead of casting
* 16bit-lora w flash attention, 8bit lora seems problematic
* add llama fsdp test
* more tests
* Add test for qlora + fsdp with prequant
* limit accelerate to 2 processes and disable broken qlora+fsdp+bnb test
* move multigpu tests to biweekly