* feat: add gemma3n cce
* feat: add sample config
* feat: add gemma3n multimodal mode
* feat: add audio example
* feat: support audio and return pixel values in collator
* feat: support unmask only assistant region (gemma3n for now)
* feat(doc): add notes for audio loading
* feat: add audio support for gemma3n
* feat: update examples
* feat: add gemma3n to the docs
* fix: add link at top
* feat(doc): clarify additional requirements
* fix: mllama missing aspect ratio
* fix: mllama need attention fixes for fa2
* Partially Revert "fix: mllama need attention fixes for fa2"
  This reverts commit a0bfdd1777.
* fix: disable FA2 for mllama in vision mode
* feat: update configs to use proper attention
* fix: support other vision features
* feat(doc): clarify requirements for gemma3n
* checkpoint model on first step callback
* remove debug
* add test cases; update existing tests not to save on first step
* move test out of solo
* delete
* default to False
* typo
* bump hf deps
* upgrade liger-kernel too
* install cce from fork for transformers fix
* fix reference to vocab size in gemma3 patch
* use padding_idx instead of pad_token_id
* remove fixed gemma3 patch
* use updated cce fork
* fix local mllama cce patches with docstring
* add test for multipack with trainer setup and fix trainer for upstream trainer refactor
* bump modal version
* guard for iterable datasets
* mllama model arch layout changed in latest transformers
* fix batch sampler with drop_last
* fix: address upstream vlm changes for lora
* fix: update references to old lora target path
* fix: remove mllama fa2 patch due to upstream fix
* fix: lora kernel patch path for multimodal models
* fix: removed mllama from quarto
* run test for came optim on 2.6.0+
* fix fsdp2 patch and remove deprecated patch
* make sure to set sequence_parallel_degree for grpo
* Add SP test for GRPO
* add sp to grpo config for trainer
* use reward_funcs as kwarg to grpo trainer
* fix the comprehension for reward funcs
* reward funcs already passed in as args
* init sp_group right before training
* fix check for adding models to SP context
* make sure to pass args to super
* upgrade deepspeed
* use updated trl and add reasoning flags for vllm
* patch the worker
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>