* native support for modal cloud from CLI
* do lm_eval in cloud too
* Fix the sub call to lm-eval
* lm_eval option to not post eval, and append not extend
* cache bust when using branch, grab sha of latest image tag, update lm-eval dep
* allow minimal yaml for lm eval
* include modal in requirements
* update link in README to include utm
* pr feedback
* use chat template
* revision support
* apply chat template as arg
* add wandb name support, allow explicit a100-40gb
* cloud is optional
* handle accidental setting of tasks with a single task str
* document the modal cloud yaml for clarity [skip ci]
* cli docs
* support spawn vs remote for lm-eval
* Add support for additional docker commands in modal image build
* cloud config shouldn't be a dir
* Update README.md
Co-authored-by: Charles Frye <cfrye59@gmail.com>
* fix annotation args
---------
Co-authored-by: Charles Frye <cfrye59@gmail.com>
* transformers 4.47.1
* drop monkeypatches
* can't remove patches yet
* make flash attention forward ignore the loss kwargs
* patch the flash attention in the modeling arch too
* remove fsdp and deepspeed patches
* cleanup PR
* bump accelerate and torchao, also logically reorder/group requirements
* meant to include torchao
* use official patch release
* fix build w pyproject to respect installed torch version
* include in manifest
* disable duplicate code check for now
* move parser so it can be found
* add checks for correct pytorch version so this doesn't slip by again
* prepare plugins needs to happen so registration can occur to build the plugin args
* use yaml.dump
* include dataset and more assertions
* attempt to manually register plugins rather than use fn
* fix fixture
* remove fixture
* move cli test to patched dir
* fix cce validation
* feat: add cut_cross_entropy
* fix: add to input
* fix: remove from setup.py
* feat: refactor into an integration
* chore: ignore lint
* feat: add test for cce
* fix: set max_steps for liger test
* chore: Update base model following suggestion
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* chore: update special_tokens following suggestion
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* chore: remove with_temp_dir following comments
* fix: plugins aren't loaded
* chore: update quotes in error message
* chore: lint
* chore: lint
* feat: enable FA on test
* chore: refactor get_pytorch_version
* fix: lock cce commit version
* fix: remove subclassing UT
* fix: downcast even if not using FA and config check
* feat: add test to check different attentions
* feat: add install to CI
* chore: refactor to use parametrize for attention
* fix: pytest not detecting test
* feat: handle torch lower than 2.4
* fix args/kwargs to match docs
* use release version cut-cross-entropy==24.11.4
* fix quotes
* fix: use named params for clarity for modal builder
* fix: handle install from pip
* fix: test check only top level module install
* fix: re-add import check
* uninstall existing version if no transformers submodule in cce
* more dataset fixtures into the cache
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
* see if unsloth installs cleanly in ci
* check unsloth install on regular tests, not sdist
* fix ampere check exception for ci
* use cached_property instead
* add an e2e test for unsloth qlora
* reduce seq len and mbsz to prevent oom in ci
* add checks for fp16 and sdp_attention
* pin unsloth to a specific release
* add unsloth to docker image too
* fix flash attn xentropy patch
* fix loss, add check for loss when using fa_xentropy
* fix special tokens for test
* typo
* test fa xentropy with and without gradient accum
* pr feedback changes
* add a basic notebook for lab users in the root
* update notebook and fix cors for jupyter
* cell is code
* fix eval batch size check
* remove intro notebook
* fix attention mask with packing
* set position ids and use block diagonal attn mask
* fix expand mask for multiple batch items, make sure we pad position_ids
* don't move masks to cpu
* use multi pack dataloader w random sampler
* add position_ids back
* more fixes for dataloader integration
* estimate total tokens, fix field loop
* more fixes, position_ids seems broken
* more fixes for sample packing
* use distributed sampler, avoid accelerate prepare
* use accelerator prepare for dataloader
* fix for position_ids w packing
* Update src/axolotl/utils/dataloader.py
* validation for sample packing and doc
* more fixes for 4k and optimizations
* optimized expand mask fn
* better handling of variance in multipack dataloader length and trainer hanging when it runs out of data
* fix rounding of len of batches to int
* better handling so that all devices have the same dataloader len
* fix step calc for packing
* pass sample packing efficiency to training args
* add a test for the mask expansion for sequence packing
* only process eval dataset for packing if not None
* don't split batches when packing
* weighted CE losses
* weighted CEL fixes
* limit packing to sequences of max seq len
* seq_len_multiple for packing
* make sure the chunk size is an int
* sample_packing_seq_len_multiplier config
* use cumulative seq len with var len flash attn v2 w packing
* properly calculate max len
* fix flash-attn, xformers, packing, support chatml
* fix chatml system prompt for openorca, legacy tokenizer opts
* add chatml
* add unit tests for cum seq lens, add ability to build cu_seq_lens from position ids, fix prompt test
* fix test and pylint checks
* more packing and dataset optimizations and fixes
* filter w multiple cpus
* more fixes and optimizations
* fixes and go back to distributed sampler since batch sampler won't work
* fix counts by accounting for num devices
* fix steps calculation
* previous accelerate is still most performant
* add numba to requirements
* use custom distributed checks
* fix sampler to prevent overfit w new epochs
* let's not cleanup the cached datasets
* calculate cum seq lens with pos_ids instead of mask, simplify packing params, fix distributed barrier
* speed optimizations and set accelerate fsdp env vars
* optimize dataset concatenation?
* more optimizations for dataset handling
* fix import for annotation
* manual pre-commit fixes
* another sum optimization and bug fix for calc steps
* fix packing estimations
* fix formatting
* pylint problems
* add back flash attention branch for handling unpacked sequences separately
* Address PR feedback
* add optional sample packing config params to readme
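
Note on the sample packing commits above: several of them build cumulative sequence lengths from position ids and use a block-diagonal attention mask so packed sequences do not attend to each other. The following is a minimal pure-Python sketch of that idea, not the code in this repository. The function names are hypothetical, and it assumes a single packed row with no padding, where every sequence restarts at position id 0.

```python
from typing import List


def cu_seqlens_from_position_ids(position_ids: List[int]) -> List[int]:
    """Return cumulative sequence boundaries for one packed row.

    Assumption: a new sequence starts wherever the position id resets to 0,
    and the row contains no padding tokens.
    """
    starts = [i for i, pos in enumerate(position_ids) if pos == 0]
    if not starts or starts[0] != 0:
        raise ValueError("packed row must begin with position id 0")
    # Boundaries are the start offsets plus the total token count.
    return starts + [len(position_ids)]


def block_diagonal_causal_mask(cu_seqlens: List[int]) -> List[List[bool]]:
    """Build a boolean mask where mask[q][k] is True if query token q may
    attend to key token k: same packed sequence, and k not after q."""
    total = cu_seqlens[-1]
    seq_id = [0] * total
    for idx, (start, end) in enumerate(zip(cu_seqlens, cu_seqlens[1:])):
        for t in range(start, end):
            seq_id[t] = idx
    return [
        [seq_id[q] == seq_id[k] and k <= q for k in range(total)]
        for q in range(total)
    ]


# Three sequences of lengths 3, 2 and 4 packed into one row.
position_ids = [0, 1, 2, 0, 1, 0, 1, 2, 3]
cu_seqlens = cu_seqlens_from_position_ids(position_ids)
max_seqlen = max(b - a for a, b in zip(cu_seqlens, cu_seqlens[1:]))
mask = block_diagonal_causal_mask(cu_seqlens)
```

Variable-length attention kernels typically take boundaries of this shape together with the maximum sequence length, while the dense mask is the fallback for attention paths that need an explicit mask.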