* Revert "fix deprecate deepspeed stage3_gather_16bit_weights_on_model_save arg…"
This reverts commit e207762928.
* don't revert the values that don't use 'auto'
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* static autocomplete script for axolotl cli
* use list of commands that should autocomplete yaml files
* make sure to chmod the autocomplete script as executable
* shellcheck and fix autocompletion of directory/sub-dirs
* more shellcheck fixes
* feat: add gemma3n cce
* feat: add sample config
* feat: add gemma3n multimodal mode
* feat: add audio example
* feat: support audio and return pixel values in collator
* feat: support unmask only assistant region (gemma3n for now)
* feat(doc): add notes for audio loading
* feat: add audio support for gemma3n
* feat: update examples
* feat: add gemma3n to the docs
* fix: add link at top
* feat(doc): clarify additional requirements
* fix: mllama missing aspect ratio
* fix: mllama need attention fixes for fa2
* Partially Revert "fix: mllama need attention fixes for fa2"
This reverts commit a0bfdd1777.
* fix: disable FA2 for mllama in vision mode
* feat: update configs to use proper attention
* fix: support other vision features
* feat(doc): clarify requirements for gemma3n
* make pad_to_sequence_len default to the same value as sample_packing
* remove duplicate validation
* fix test
* update description meta
Co-authored-by: NanoCode012 <nano@axolotl.ai>
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* limit num_proc when saving datasets to disk
* enforce at least 1 in case it rounds down to 0, and sane divisor is at least 8 rows per worker to save
* update fixtures with dataset processes since that should never be NoneType
* improve reusability for tests
* make the initial call to tokenizer.pad not spam the console
* add guard from feedback
* make another common console output less verbose
* more logging fixes
* Added a feature to save prepared dataset in specified shards, removed limiter on multiprocessing during tokenization, and a bug fix of qwen tokenizer
* removed limiters and fixed config variable name
* black lint
* chore: lint
* feat: update handling of dataset_processes
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* Apply generic fused liger ce for unknown models
* fix deepseek liger modeling
* generic cce and config tiled mlp to use original mlp and auto detect compute params
* fix weight and lint
* update warnings
* address PR feedback
* use lookup for model class prefixes
* revert inadvertent change to flash attn verison
* remove un-needed pylint annotations
* fix import
* checkpoint model on first step callback
* remove debug
* add test cases; update existing tests not to save on first step
* move test out of solo
* delete
* default to False
* typo
* support for deepspeed autotup
* bump to latest deepspeed that supports deepcompile too
* add deepcompile support too
* fix total steps calculation for TP
* setup fixture for tp
* update ds config to ensure weights are gathered for checkpoint
* fix duplicate validation names
* chore: lint
* use cuda streams for activation offloading
* use torch native ops
* update cfg schema for streams
* fix literal constructor for set
* use context for training step so it doesn't affect evals
* disable streams
* auto gc on eval steps
* use activation_offloading config arg
* add docs for gradient checkpointing
* handle validation for gc/ao
* use cuda streams for act offloading
* add more validation for AC w/o GC
* fix docs
* move activation_offloading lower in definition so it doesn't break args/kwargs
* fix kd due to import order
* upgrade peft to 0.16.0
* upgrade datasets to 4.0.0
* refactor dupes from merge/rebase
* fix check for fsdp1 + sharded_state_dict
* use full state dict for ci
* upgrade trl==0.19.1
* add vllm for tests for grpo
* fixes to work with latest trl
* need data_parallel_size config too
* support for vllm_mode for server / colocate
* vllm settings for colocate
* relax vllm version
* bump min hf hub for latest vllm support
* add hints on string literal for vllm mode
* use latest transformers 4.53.2
* tweak acceptable loss on flaky test_ds_zero3_packed test
* don't run flaky vllm/grpo tests for now