* Add support for Dion optimizer
* dion training kwargs
* fix var names
* no dion 8bit for now
* use updated axolotl-contribs-mit for dion optimizer
* add smoke test for dion optimizer
* add docs
* fix typo during edits
* fix test to not remove load in 8bit
* fix: deepcopy lr in RexLR scheduler.
This fixes a problem where, when the lr is a scalar tensor, the base_lrs in the get_lr function end up holding references to the current learning rate rather than the correct initial learning rate.
See also related pytorch PR https://github.com/pytorch/pytorch/pull/127190/
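The deepcopy fix above can be illustrated with a stand-in for a scalar tensor lr (a minimal sketch, not the actual RexLR code — the `ScalarLR` class and `fill_` method only mimic in-place tensor updates):

```python
import copy

class ScalarLR:
    """Stand-in for a scalar tensor learning rate updated in place."""
    def __init__(self, value):
        self.value = value
    def fill_(self, value):  # mimics torch.Tensor.fill_
        self.value = value

lr = ScalarLR(1e-3)
base_lrs_buggy = [lr]                 # reference: tracks every in-place update
base_lrs_fixed = [copy.deepcopy(lr)]  # snapshot: keeps the initial value

lr.fill_(1e-5)                        # a scheduler step mutates lr in place
print(base_lrs_buggy[0].value)        # wrong: follows the current lr
print(base_lrs_fixed[0].value)        # correct: still the initial lr
```

With the reference, `get_lr` would anneal from the already-decayed value; the deepcopy pins the true initial lr.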
* fix: add missing torch.Tensor import
* jagged lr restart scheduler
var name fix
make sure to create scheduler first
* wire things together
* more fixes
* fix for nesting scheduler and first anneal phase
* no need for relora trainer anymore since we've generalized the relora scheduler
* remove redundant relora scheduler and lint
* update relora e2e test for updated params
* need restart steps for relora test
* update quarto docs for dropped relora trainer
* update example yaml
* drop verbose arg
* min lr scale support for jagged lr
* don't let min_lr be NoneType
* cleanup args
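The jagged restart schedule with min-lr scaling can be sketched roughly like this (function and parameter names are assumptions, not the actual scheduler API; the anneal shape within each cycle is illustrative):

```python
import math

def jagged_restart_lr(step, base_lr, restart_every, min_lr_scale=0.0):
    """Hypothetical sketch: lr anneals from base_lr down to
    base_lr * min_lr_scale within each restart cycle, then jumps
    back to base_lr at the next restart."""
    pos = step % restart_every
    frac = pos / max(restart_every - 1, 1)
    min_lr = base_lr * min_lr_scale  # scale defaults to 0.0, so min_lr is never None
    # cosine anneal within the cycle
    return min_lr + (base_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * frac))
```

For example, with `restart_every=10`, step 0 of each cycle returns `base_lr`, the last step of the cycle returns `base_lr * min_lr_scale`, and step 10 restarts at `base_lr`.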
* make TiledMLP work with FSDP
* cleanup/gc at start of train to prevent large VRAM spike
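A typical shape for that start-of-train cleanup (an assumption about the pattern, not the exact axolotl code):

```python
import gc

def free_memory():
    """Run a GC pass and, when CUDA is available, release cached
    allocator blocks so training does not begin with a VRAM spike."""
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass  # CPU-only / no torch: the GC pass alone still helps
```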
* chore: lint
* generic function for non-deepspeed training
* unify patch to fix imports
* update readme for ALST and add examples
* make deepspeed attribute on params check more robust
* update with new info from PR review
* we don't need to call check_dataset_labels when skip_prepare_dataset is set
* Fix actual bug and revert prior fix
* warn and early return instead of raising an error
* use error
* feat: add gemma3n cce
* feat: add sample config
* feat: add gemma3n multimodal mode
* feat: add audio example
* feat: support audio and return pixel values in collator
* feat: support unmask only assistant region (gemma3n for now)
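Assistant-only unmasking amounts to ignoring the loss everywhere outside the assistant turns. A minimal sketch, assuming assistant regions arrive as `(start, end)` token index pairs (the span format and function name are assumptions):

```python
IGNORE_INDEX = -100  # tokens with this label are excluded from the loss

def mask_non_assistant(labels, assistant_spans):
    """Keep labels only inside the assistant spans; every other
    position gets IGNORE_INDEX so loss is computed solely on
    assistant tokens."""
    masked = [IGNORE_INDEX] * len(labels)
    for start, end in assistant_spans:
        masked[start:end] = labels[start:end]
    return masked
```

E.g. `mask_non_assistant([1, 2, 3, 4, 5], [(1, 3)])` keeps only positions 1–2 and masks the rest.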
* feat(doc): add notes for audio loading
* feat: add audio support for gemma3n
* feat: update examples
* feat: add gemma3n to the docs
* fix: add link at top
* feat(doc): clarify additional requirements
* fix: mllama missing aspect ratio
* fix: mllama need attention fixes for fa2
* Partially Revert "fix: mllama need attention fixes for fa2"
This reverts commit a0bfdd1777.
* fix: disable FA2 for mllama in vision mode
* feat: update configs to use proper attention
* fix: support other vision features
* feat(doc): clarify requirements for gemma3n
* make pad_to_sequence_len default to the same value as sample_packing
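The defaulting rule reads roughly as follows (a sketch over a plain dict config — the real config object and helper name are assumptions):

```python
def resolve_pad_to_sequence_len(cfg):
    """When pad_to_sequence_len is unset, inherit the value of
    sample_packing; an explicit setting is left untouched."""
    if cfg.get("pad_to_sequence_len") is None:
        cfg["pad_to_sequence_len"] = bool(cfg.get("sample_packing", False))
    return cfg
```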
* remove duplicate validation
* fix test
* update description meta
Co-authored-by: NanoCode012 <nano@axolotl.ai>
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* limit num_proc when saving datasets to disk
* enforce a minimum of 1 worker in case the division rounds down to 0; a sane divisor gives each worker at least 8 rows to save
* update fixtures with dataset processes since that should never be NoneType
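The worker cap from the two commits above can be sketched as (function and parameter names are assumptions):

```python
def saving_num_proc(num_rows, configured_procs):
    """Cap save workers so each handles at least 8 rows,
    never returning fewer than 1 worker."""
    return max(1, min(configured_procs, num_rows // 8))
```

So a 4-row dataset saves with a single worker even if 16 processes are configured, while a 1000-row dataset uses the full configured count.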
* improve reusability for tests
* make the initial call to tokenizer.pad not spam the console
* add guard from feedback
* make another common console output less verbose
* more logging fixes
* Added a feature to save the prepared dataset in a specified number of shards, removed the limiter on multiprocessing during tokenization, and fixed a bug in the qwen tokenizer
* removed limiters and fixed config variable name
* black lint
* chore: lint
* feat: update handling of dataset_processes
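Saving in a fixed number of shards boils down to splitting the rows into contiguous ranges. A hypothetical helper illustrating the idea (not the `datasets` library's own sharding code):

```python
def shard_bounds(num_rows, num_shards):
    """Split num_rows into num_shards contiguous (start, end) ranges,
    spreading any remainder over the first shards."""
    base, rem = divmod(num_rows, num_shards)
    bounds, start = [], 0
    for i in range(num_shards):
        size = base + (1 if i < rem else 0)
        bounds.append((start, start + size))
        start += size
    return bounds
```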
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* Apply generic fused liger ce for unknown models
* fix deepseek liger modeling
* generic cce; configure tiled mlp to use the original mlp and auto-detect compute params
* fix weight and lint
* update warnings
* address PR feedback
* use lookup for model class prefixes
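The prefix lookup can be sketched like this — the table contents and names are assumptions, purely to illustrate dispatching on model class prefixes with a generic fallback:

```python
# Hypothetical prefix table mapping model class name prefixes to model types.
MODEL_CLASS_PREFIXES = {
    "Llama": "llama",
    "Gemma3n": "gemma3n",
    "DeepseekV3": "deepseek_v3",
}

def model_type_for_class(class_name):
    """Return the model type for a known class name prefix, or None
    for unknown models, which then take the generic fused CE path."""
    for prefix, model_type in MODEL_CLASS_PREFIXES.items():
        if class_name.startswith(prefix):
            return model_type
    return None
```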
* revert inadvertent change to flash attn version
* remove un-needed pylint annotations
* fix import