* kd fixes
* fix collator setup
* fix input args
* better handling to drop string fields for kd with raw dataset
* kd trainer has kd temp as part of the init
* drop top_k before softmax
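One common top-k distillation scheme keeps only the teacher's k largest logits and normalizes over that subset; a minimal sketch of the idea (the function name and list-based representation are illustrative, not the project's actual API):

```python
import math

def topk_softmax(logits, k):
    # keep only the k largest logits and softmax over that subset;
    # everything outside the top-k gets zero probability
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}
```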
* simplify and remove zscore
* WIP chunked KD loss with autograd wrapper
* more fixes and liger-type chunked loss
* collator cls for plugins
* remove debugging
* additional plugin collator kwargs, don't scale up kd loss by t^2
* don't need temp arg to distill method
* online kd wip
* add close to comment block
* support sampling params/max new tokens
* handle when no custom collator is used in plugins
* logsumexp trick
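The logsumexp trick referenced above computes log-softmax without overflow by subtracting the maximum logit before exponentiating; a minimal pure-Python illustration:

```python
import math

def log_softmax(logits):
    # log p_i = x_i - logsumexp(x); subtracting max(x) first keeps every
    # exp() argument <= 0, so large logits never overflow
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]
```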
* fix check
* shift off the first empty token
* fix length of padding
* use max not min
* temp scale kd loss at end
* support for dynamic plugin training args mixins and symmetric kl
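A hedged sketch of the temperature-softened KL objective these commits touch (forward, reverse, or symmetric KL between teacher and student distributions); the function names and the beta interpolation are illustrative, not the exact trainer code:

```python
import math

def softmax(logits, t=1.0):
    # temperature-softened softmax, stabilized by subtracting the max
    m = max(logits)
    exps = [math.exp((x - m) / t) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q):
    # KL(p || q) over two discrete distributions
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)

def kd_loss(teacher_logits, student_logits, temperature=1.0, beta=0.5):
    # soften both with the same temperature, then mix forward and
    # reverse KL; beta=0.5 gives the symmetric KL variant
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return (1 - beta) * kl(p, q) + beta * kl(q, p)
```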
* chore: lint
* fix trainer callback base class
* Fix decay
* accept compressed responses for smaller wire payload
* post-rebase lint
* more KD updates
* increase hyperparams_count for gradients for added normalize_topk
* fix to remove attention_mask
* rename vars for consistency
* fix rebase issues
* default to dropping last batch in multipack batch sampler
* improve handling of train len
* init collator_cls_and_kwargs
* explicit drop_last=False when checking for multipack completeness
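The drop_last behavior above can be illustrated with a trivial batch sampler: an incomplete trailing batch is dropped by default so every step sees a full batch, but kept (drop_last=False) when checking that every sample is covered. This is a simplified stand-in, not the actual multipack implementation:

```python
def make_batches(indices, batch_size, drop_last=True):
    batches = [indices[i:i + batch_size]
               for i in range(0, len(indices), batch_size)]
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()  # discard the incomplete trailing batch
    return batches
```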
* use separate v2 loader for kd
* fix kd tests to use subprocess so it picks up kd training args
* default value for kd_beta arg
* use updated dataset for ci
* longer timeout for e2e
* fix: do not pre-patch self attention if lora dropout non-zero
* fix: add test to check patch not applied
* fix: test
* fix: test config check
* fix where we check so that tests don't break
* fix: test
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* bump hf deps
* upgrade liger-kernel too
* install cce from fork for transformers fix
* fix reference to vocab size in gemma3 patch
* use padding_idx instead of pad_token_id
* remove fixed gemma3 patch
* use updated cce fork
* fix local mllama cce patches w docstring
* add test for multipack with trainer setup and fix trainer for trainer refactor upstream
* bump modal version
* guard for iterable datasets
* mllama model arch layout changed in latest transformers
* fix batch sampler with drop_last
* fix: address upstream vlm changes for lora
* fix: update references to old lora target path
* fix: remove mllama fa2 patch due to upstream fix
* fix: lora kernel patch path for multimodal models
* fix: removed mllama from quarto
* run test for came optim on 2.6.0+
* fix fsdp2 patch and remove deprecated patch
* make sure to set sequence_parallel_degree for grpo
* Add SP test for GRPO
* add sp to grpo config for trainer
* use reward_funcs as kwarg to grpo trainer
* fix the comprehension for reward funcs
* reward funcs already passed in as args
* init sp_group right before training
* fix check for adding models to SP context
* make sure to pass args to super
* upgrade deepspeed
* use updated trl and add reasoning flags for vllm
* patch the worker
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* feat: add num_proc and load from cache for rl mapping
* fix: refactor sft and rl trainer to set same base args
* feat: add report_to to set run name
* fix: consolidate handling of fp16, bf16, tf32 kwarg
* chore: consolidate eval_strat, loraplus, lr sched, max_length
* fix: deprecate old types
* fix: adding missing Any
* fix: max_steps incorrectly set
* fix: remove unnecessary datacollator kwarg insert and pop
* fix: update default max_steps
* fix: add missing weight_decay handling
* fix: ignore max_length for grpo
* feat: update CI on trainer_builder
* fix: comments
* improve handling of warmup/logging steps
* use transformers default for logging steps, not None
* fix: remove redundant override
* fix: lint
* feat: allow custom optim for rl methods
* fix: duplicate optim setting
* fix(test): set sequence_parallel_degree default in base cfg
* feat: add handling for seed and SP/ring-attn config
* chore: add back return typing from rebase
* fix(test): use RLType directly to skip needing to validate
* feat: split training builder into sub modules
* fix: remove deprecated clause
* chore: add missing config to doc
* fix: update quarto autodoc
* fix: import path for trainer builder and submodules
* fix: remove redundant configs from rebase mistake
* chore: simplify dynamo check
* fix: optimizer_cls_and_kwargs to be passed into trainer_kwargs
* fix: add missing rex from rebase
* fix: move pop optimizer_cls_and_kwargs
* fix: pop optimizer cls in rl too
* fix: leftover bug from rebase
* fix: update handling of trainer_cls in RL
* fix: address pr feedback
* feat: call hook_pre_create_trainer for rl
* chore: lint
* fix: return notimplemented for ppo
* feat: moved torch compile to base and refactor collator setting
* chore: remove unused importlib.util import
* fix: optimizer cls not being popped
* feat: move epoch setting to base
* fix: catch unhandled custom optimizer
* fix: remove duplicate lora plus setting
* chore: refactor if condition
* chore: refactor set_base_training_args into smaller modules
* fix: convert TrainerBuilderBase class variables to instance variables
* fix: add handling for beta3 and epsilon2
* fix: change to pass dict via arg instead of updating dict
* chore: simplify if condition
* fix: force access to lr & weight decay so missing values raise an early error
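The fail-fast idea in that fix (read required hyperparameters up front so a missing value raises at setup rather than mid-run) can be sketched as follows; the helper name and dict-based config are hypothetical:

```python
def require(cfg, key):
    # raise at setup time instead of failing deep inside the training loop
    if cfg.get(key) is None:
        raise ValueError(f"required config field '{key}' is missing")
    return cfg[key]
```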
* fix: remove log sweep
* chore: refactor if condition
* fix: address renamed cfg
* fix: improve handling of cosine hyperparams
* fix: remove unused params
* chore: refactor
* chore: clarify doc safetensors
* fix: update import path to be unified following comments
* fix: duplicate kwargs passed
* feat: return separate trainer_kwargs
* chore: refactor
* chore: refactor based on comments
* chore: refactor based on comments
* fix: move gpustats callback to base
* chore: create trainer_cls_args first based on comments
* fix: ipo label smoothing passed incorrectly
* feat: add optimizer parity for RL methods with test
* feat: add parity for optimizer in RM/PRM and add test
* fix: remove redundant function override for orpo/cpo batch metrics
* fix: improve handling of dpo_label_smoothing and merge issue
* fix: test fixture returning wrong field
* fix: address avoid direct modify fixture
* chore: minor refactor
* Revert "chore: refactor"
This reverts commit 99c8859eb0.
* feat: rename trainer_builder to builders
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* don't set peft_config on grpo to prevent double peft wrap
* remove overrides needed to support bug
* fix grpo tests
* require more CPU for multigpu to help with torch compile for vllm
* offload activations to disk instead of CPU RAM
* add prefetch
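The activation-offload change swaps CPU-RAM staging for disk files. A minimal pack/unpack sketch of that idea, with plain pickle standing in for tensor serialization (the class and paths are illustrative; the real implementation hooks into autograd's saved-tensor machinery and prefetches the next file before backward needs it):

```python
import os
import pickle
import tempfile

class DiskOffload:
    """Stash intermediates on disk during forward, reload them for backward."""

    def __init__(self):
        self.dir = tempfile.mkdtemp(prefix="act_offload_")
        self._count = 0

    def pack(self, value):
        # write the value to its own file and hand back a lightweight handle
        path = os.path.join(self.dir, f"act_{self._count}.pkl")
        self._count += 1
        with open(path, "wb") as f:
            pickle.dump(value, f)
        return path

    def unpack(self, path):
        # read the value back and delete the file once it is consumed
        with open(path, "rb") as f:
            value = pickle.load(f)
        os.remove(path)
        return value
```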
* Disco :dance:
* include offload_disk in e2e test for AC
* document and make sure to cleanup
* fix annotation to match docs
* fix docs build
* address PR feedback
* update doc and skip brittle grpo test
* fix the path to run the multigpu tests
* increase timeout, use LOC instead of NVL
* typo
* use hf cache from s3 backed cloudfront
* mark grpo as flaky test due to vllm startup
* lean mistral ft tests, remove e2e torch 2.4.1 test
* make sure to pass save_only_model for RL
* more tests to make ci leaner, add cleanup to modal ci
* fix module for import in e2e tests
* use mp spawn to prevent deadlocks with packing
* make sure cleanup shell script is executable when cloned out
* fsdp embeddings should be float32 per comment
* patch peft to not upcast everything
* add tabs back to code check
* fix import
* add configurable option and fix check
* add check for dtypes
* move embeddings test to patch dir
* fix test
* fix comment and logic
* Add: SFTPlugin with llmcompressor
* Update: review comments!
* Add: llmcompressor installable
* pre commit hooks
* Use: warning over warn
* Revert: TODO's
* Update llmcompressor version to latest
* Apply suggestions from @markurtz
Co-authored-by: Mark Kurtz <mark.j.kurtz@gmail.com>
* Address review comments from @markurtz
* Add: llmcompressor installable
* Rename: sft.yaml to sparse-finetuning.yaml
* Use: absolute import
* Update model config
* Move: LLMCompressorPlugin into its own submodule
* Add: `llm_compressor` integration documentation
* Rebase and updates!
* Tests, Style, Updates
* Add: .qmd file
* Address Review Comments:
* deleted redundant docs/llm_compressor.qmd
* incorporated feedback in integration README.md
* added llmcompressor integration to docs/custom_integrations.qmd
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
* Add: line about further optimizations using llmcompressor
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
* Apply patch from @winglian
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
* Fix: Test
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
* additional fixes for docker and saving compressed
* split llmcompressor from vllm checks
* Reset session between tests
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
* move decorator to test method instead of class
* make sure to reset the session after each test
* move import of llmcompressor to reset session inside test
---------
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Co-authored-by: Mark Kurtz <mark.j.kurtz@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
* update trl to 0.17.0
* grpo + vllm no longer supported with 2.5.1 due to vllm constraints
* disable VLLM_USE_V1 for ci
* improve handling of killing off the multiprocessing vllm service
* debug why this doesn't run in CI
* increase vllm wait time
* increase timeout to 5min
* upgrade to vllm 0.8.4
* dump out the vllm log for debugging
* use debug logging
* increase vllm start timeout
* use NVL instead
* disable torch compile cache
* revert some commented checks now that grpo tests are fixed
* increase vllm timeout back to 5min
* add e2e smoke test for using activation/gradient checkpointing with offload
* disable duplicate code check for the test
* fix relative import
* seq len too small to test this dataset with packing
* Fix checkpoint patching for tests
* make sure to validate the config before normalizing so defaults get set
* validation not needed for particular test
* remove duplicate validations
* set qlora correctly
* batch api HF adapter for ring-flash-attn; cleanup and improvements
* update
* adding all batch ring-flash-attn methods via single adapter
* removing pad_to_sequence_len=False for now
* fix
* updating docs to include batch SP
* review comments
* fixes for batch API funcs, simplify
* fixes
* fix
* updates
* add batch_zigzag smoke test
* fixes for delinearization, and make qlora work with fsdp2
* Add back mistakenly removed lm_eval
* typo [skip ci]
* patch evals for torch.compile + fsdp2
* also check torch_compile w fsdp2
* lots of fixes for flex attn with llama4
* fix patch check and patch llama4 too
* attempt to make the patches stick
* use transformers 4.51.2
* update configs and README for llama4
* remove torch.compile for CI test
* cleanup any existing singletons
* set singleton cache to None instead of deleting
* use importlib reload with monkeypatch
* don't worry about transformers version, mark inputs with grads, fix regex
* make sure embeds aren't on cpu
* logging and mem improvements
* vllm version and add to docker, make sure to save processor on conversion
* fix ambiguous tensor bool check
* fix vllm to not use v1, upgrade hf transformers
* fix tests
* make flex_attn_compile_kwargs configurable, since this depends on model params
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>