* bump hf deps
* upgrade liger-kernel too
* install cce from fork for transformers fix
* fix reference to vocab size in gemma3 patch
* use padding_idx instead of pad_token_id
* remove fixed gemma3 patch
* use updated cce fork
* fix local mllama cce patches w docstring
* add test for multipack with trainer setup and fix trainer for trainer refactor upstream
* bump modal version
* guard for iterable datasetS
* mllama model arch layout changed in latest transformers
* fix batch sampler with drop_last
* fix: address upstream vlm changes for lora
* fix: update references to old lora target path
* fix: remove mllama fa2 patch due to upstream fix
* fix: lora kernel patch path for multimodal models
* fix: removed mllama from quarto
* run test for came optim on 2.6.0+
* fix fsdp2 patch and remove deprecated patch
* make sure to set sequence_parallel_degree for grpo
* Add SP test for GRPO
* add sp to grpo config for trainer
* use reward_funcs as kwarg to grpo trainer
* fix the comprehension for reward funcs
* reward funcs already passed in as args
* init sp_group right before training
* fix check for adding models to SP context
* make sure to pass args to super
* upgrade deepspeed
* use updated trl and add reasoning flags for vllm
* patch the worker
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>