* async grpo support
* implement data producer
* use fast async
* handle call to create data producer
* fix liger kernel setup
* fix replay buffer
* chore: lint
* make gpus go brrr
* chore: lint
* inplace div_, unwrap model for logits in bf16
* fuse selective softmax and empty cuda cache on each scoring step
* remove waiting for synch time and fix race
* make fp8 work and allow lora kernels w rl
* grpo with lora vllm sync and fixes for sharded distributed
* update docs
* more patches so it works against trl main
* address PR feedback from coderabbit
* upgrade transformers==5.3.0 trl==0.29.0 kernels
* use latest deepspeed fixes
* use correct image for cleanup
* fix test outputs for tokenizer fixes upstream
* fix import
* keep trl at 0.28.0
* handle updated API
* use latest trl since 0.28.0 doesn't work with latest transformers
* use trl experimental for pad to length
* monkeypatch trl with ORPOTrainer so liger doesn't croak
* upgrade accelerate
* more fixes
* move patch for orpotrainer
* load the imports later
* remove use_logits_to_keep
* fix loss_type arg as a list
* fetch hf cache from s3
* just manually download the missing model for now
* lint for pre-commit update
* a few more missing models on disk
* fix: loss_type internally now list
* fix: remove deprecated code and raise deprecate
* fix: remove unneeded blocklist
* fix: remove reliance on transformers api to find package available
* chore: refactor shim for less sideeffect
* fix: silence trl experimental warning
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* upgrade transformers to 5.1.0 and torchao to 0.16.0
* upgrade trl for parity
* handle trl api changes
* orpo doesn't have max_prompt_len to check anymore
* cpoconfig doesn't take max_prompt_length and fix cpu offload
* slow fsdp1 test
* triton min 3.4.0 and liger to 0.7.0
* use transformers main for now for zero3 fix
* handle group_by_length change
* fix changes upstream
* mark skip flaky test
* use transformers latest release 5.2.0
* Prepare for transformers v5 upgrade
* fix hf cli
* update for hf hub changes
* fix tokenizer apply_chat_template args
* remap include_tokens_per_second
* fix tps
* handle migration for warmup
* use latest hf hub
* Fix scan -> ls
* fix import
* fix for renaming of mistral common tokenizer -> backend
* update for fixed tokenization for llama
* Skip phi35 tests for now
* remove mistral patch fixed upstream in huggingface/transformers#41439
* use namespacing for patch
* don't rely on sdist for e2e tests for now
* run modal ci without waiting too
* Fix dep for ci
* fix imports
* Fix fp8 check
* fsdp2 fixes
* fix version handling
* update fsdp version tests for new v5 behavior
* Fail multigpu tests after 3 failures
* skip known v5 broken tests for now and cleanup
* bump deps
* unmark skipped test
* re-enable test_fsdp_qlora_prequant_packed test
* increase multigpu ci timeout
* skip broken gemma3 test
* reduce timeout back to original 120min now that the hanging test is skipped
* fix for unnecessary collator for pretraining with bsz=1
* fix: safe_serialization deprecated in transformers v5 rc01 (#3318)
* torch_dtype deprecated
* load model in float32 for consistency with tests
* revert some test fixtures back
* use hf cache ls instead of scan
* don't strip fsdp_version
more fsdp_version fixes for v5
fix version in fsdp_config
fix aliasing
fix fsdp_version check
check fsdp_version is 2 in both places
* Transformers v5 rc2 (#3347)
* bump dep
* use latest fbgemm, grab model config as part of fixture, un-skip test
* import AutoConfig
* don't need more problematic autoconfig when specifying config.json manually
* add fixtures for argilla ultrafeedback datasets
* download phi4-reasoning
* fix arg
* update tests for phi fast tokenizer changes
* use explicit model types for gemma3
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* fix: AutoModelForVision2Seq -> AutoModelForImageTextToText
* chore: remove duplicate
* fix: attempt to fix gemma3 text mode
* chore: lint
* ga release of v5
* need property setter for name_or_path for mistral tokenizer
* vllm not compatible with transformers v5
* setter for chat_template w mistral too
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: salman <salman.mohammadi@outlook.com>
* limit num_proc when saving datasets to disk
* enforce a minimum of 1 in case it rounds down to 0, and use a sane divisor of at least 8 rows per worker when saving
* update fixtures to set dataset processes, since that should never be NoneType
* improve reusability for tests
* support for deepspeed autotp
* bump to latest deepspeed that supports deepcompile too
* add deepcompile support too
* fix total steps calculation for TP
* setup fixture for tp
* update ds config to ensure weights are gathered for checkpoint
* fix duplicate validation names
* chore: lint
* feat: add num_proc and load from cache for rl mapping
* fix: refactor sft and rl trainer to set same base args
* feat: add report_to to set run name
* fix: consolidate handling of fp16, bf16, tf32 kwarg
* chore: consolidate eval_strat, loraplus, lr sched, max_length
* fix: deprecate old types
* fix: adding missing Any
* fix: max_steps incorrectly set
* fix: remove unnecessary datacollator kwarg insert and pop
* fix: update default max_steps
* fix: add missing weight_decay handling
* fix: ignore max_length for grpo
* feat: update CI on trainer_builder
* fix: comments
* improve handling of warmup/logging steps
* use transformers default for logging steps, not None
* fix: remove redundant override
* fix: lint
* feat: allow custom optim for rl methods
* fix: duplicate optim setting
* fix(test): set sequence_parallel_degree default in base cfg
* feat: add handling for seed and SP/ring-attn config
* chore: add back return typing from rebase
* fix(test): use RLType directly to skip needing to validate
* feat: split training builder into sub modules
* fix: remove deprecated clause
* chore: add missing config to doc
* fix: update quarto autodoc
* fix: import path for trainer builder and submodules
* fix: remove redundant configs from rebase mistake
* chore: simplify dynamo check
* fix: optimizer_cls_and_kwargs to be passed into trainer_kwargs
* fix: add missing rex from rebase
* fix: move pop optimizer_cls_and_kwargs
* fix: pop optimizer cls in rl too
* fix: leftover bug from rebase
* fix: update handling of trainer_cls in RL
* fix: address pr feedback
* feat: call hook_pre_create_trainer for rl
* chore: lint
* fix: return notimplemented for ppo
* feat: moved torch compile to base and refactor collator setting
* chore: remove unused importlib.util import
* fix: optimizer cls not being popped
* feat: move epoch setting to base
* fix: catch unhandled custom optimizer
* fix: remove duplicate lora plus setting
* chore: refactor if condition
* chore: refactor set_base_training_args into smaller modules
* fix: address TrainerBuilderBase class variables to instance var
* fix: add handling for beta3 and epsilon2
* fix: change to pass dict via arg instead of updating dict
* chore: simplify if condition
* fix: force access to lr & weight decay so it errors early if not provided
* fix: remove log sweep
* chore: refactor if condition
* fix: address renamed cfg
* fix: improve handling of cosine hyp
* fix: remove unused params
* chore: refactor
* chore: clarify doc safetensors
* fix: update import path to be unified following comments
* fix: duplicate kwargs passed
* feat: return separate trainer_kwargs
* chore: refactor
* chore: refactor based on comments
* chore: refactor based on comments
* fix: move gpustats callback to base
* chore: create trainer_cls_args first based on comments
* fix: ipo label smoothing passed incorrectly
* feat: add optimizer parity for RL methods with test
* feat: add parity for optimizer in RM/PRM and add test
* fix: remove redundant function override for orpo/cpo batch metrics
* fix: improve handling of dpo_label_smoothing and merge issue
* fix: test fixture returning wrong field
* fix: avoid directly modifying fixture
* chore: minor refactor
* Revert "chore: refactor"
This reverts commit 99c8859eb0.
* feat: rename trainer_builder to builders
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* fix: update chat_template
* fix: handle gemma3 showing a lot of no content for turn 0
* fix: remove unknown config from examples
* fix: test
* fix: temporary disable gemma2 test
* fix: stop overwriting config.text_config unnecessarily
* fix: handling of set cache to the text_config section
* feat: add liger gemma support and bump liger to 0.5.5
* fix: add double use_cache setting
* fix: add support for final_logit_softcap in CCE for gemma2/3
* fix: set use_cache before model load
* feat: add missing layernorm override
* fix: handle gemma3 rmsnorm
* fix: use wrapper to pass dim as hidden_size
* fix: change dim to positional
* fix: patch with wrong mlp
* chore: refactor use_cache handling
* fix import issues
* fix tests.e2e.utils import
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* hf offline decorator for tests to workaround rate limits
* fail quicker so we can see logs
* try new cache name
* limit files downloaded
* phi mini predownload
* offline decorator for phi tokenizer
* handle meta llama 8b offline too
* make sure to return fixtures if they are wrapped too
* more fixes
* more things offline
* more offline things
* fix the env var
* fix the model name
* handle gemma also
* force reload of modules to recheck offline status
* prefetch mistral too
* use reset_sessions so hub picks up offline mode
* more fixes
* rename so it doesn't seem like a context manager
* fix backoff
* switch out tinyshakespeare dataset since it runs a py script to fetch data and doesn't work offline
* include additional dataset
* more fixes
* more fixes
* replace tiny shakespeare dataset
* skip some tests for now
* use more robust check using snapshot download to determine if a dataset name is on the hub
* typo for skip reason
* use local_files_only
* more fixtures
* remove local only
* use tiny shakespeare as pretrain dataset, since streaming can't be offline even if precached
* make sure fixtures aren't offline
improve the offline reset
try bumping version of datasets
reorder reloading and setting
prime a new cache
run the tests now with fresh cache
try with a static cache
* now run all the ci again with hopefully a correct cache
* skip wonky tests for now
* skip wonky tests for now
* handle offline mode for model card creation
* add mhenrichsen/alpaca_2k_test with revision dataset download fixture for flaky tests
* log slowest tests
* pin pynvml==11.5.3
* fix load local hub path
* optimize for speed w smaller models and val_set_size
* replace pynvml
* make the resume from checkpoint e2e faster
* make tests smaller
* wip add new proposed message structure
* tokenization
* wip
* wip transform builder
* wip make the chat dataset loadable
* wip chatml + llama 3 new chat objects
* chore: lint
* chore: lint
* fix tokenization
* remove dacite dependency since we're using pydantic now
* fix handling when already correctly split in messages
* make sure to remove chat features from tokenized ds
* move chat to be an input transform for messages
* make sure llama3 has the bos token
* remove non-working special token code
* fix messages strat loader
* WIP use trl ORPOTrainer
* fixes to make orpo work with trl
* fix the chat template loading
* make sure to handle the special tokens and add_generation for assistant turn too
* ipo-dpo trainer
* fix missing abstract method
* chatml template, grad checkpointing kwargs support
* fix steps calc for RL and add dataloader kwargs
* wip to fix dpo and start ppo
* more fixes
* refactor to generalize map fn
* fix dataset loop and handle argilla pref dataset
* set training args
* load reference model on separate gpu if more than one device
* no auto upload to hub for dpo, don't add lora adapters to ref model for dpo
* fixes for rl training
* support for ipo from yaml
* set dpo training args from the config, add tests
* chore: lint
* set sequence_len for model in test
* add RLHF docs