* extend pytest-sdist timeout to 30 min for slow/flaky tests
* Also preload the CDN cache so it doesn't get stampeded
* fix YAML syntax
* add missing fields
* can't pipe to /dev/null
* Fix nightlies and add 2.10.0 to multi-gpu suite
* mxfp4 axo
* import lint
* test for qat mxfp4
* config for mxfp4
* add qat
* pass base config
* MXFakeQuantizeConfig
* lint
* tune config so it fits in 32GB VRAM
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* Fix fsdp2 sharding. Fix validation of ao version for lr groups
* remove validation since axolotl requires ao>0.13.0 already
* Move fully_shard of entire module for lora_embedding_A/B out of loop
* chore: lint
---------
Co-authored-by: bekk02 <ID+bekk02@users.noreply.github.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
* chore: rename without period
* feat: add glm45 air
* feat: add doc on expert quantization
* feat: update base readme with new changes
* chore: cleanup
* chore: cleanup
* chore: cleanup
* fix: disable quantize_moe_expert on merge per comment
* chore: add kernel info to optimizations doc
* fix: run deduplication before saving dataset during preprocessing
Move deduplicate_and_log_datasets call before save_preprocessed_dataset
in both SFT and RL data loading pipelines. This ensures the saved
preprocessed dataset is already de-duplicated, so subsequent loads
from cache don't contain duplicates.
Fixes #2719
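The ordering fix above can be sketched in miniature: deduplication must run before the preprocessed dataset is saved, so that later cache loads never see duplicates. The names below (`deduplicate_rows`, `save_preprocessed`, `prepare`, the `cache` dict) are illustrative stand-ins, not axolotl's real API.

```python
def deduplicate_rows(rows):
    """Drop exact duplicates while preserving first-seen order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


cache = {}


def save_preprocessed(name, rows):
    # Stand-in for save_preprocessed_dataset: persist the rows.
    cache[name] = list(rows)


def prepare(name, rows, dedup=True):
    # The fix: deduplicate BEFORE saving, so the cached copy is
    # already clean and reloads don't contain duplicates.
    if dedup:
        rows = deduplicate_rows(rows)
    save_preprocessed(name, rows)
    return rows
```

With the old ordering (save first, dedup after), only the in-memory copy was deduplicated while the cache kept the duplicates.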
* fix: include deduplication flag in dataset hash and warn on skip_prepare_dataset+dedup
- Add dataset_exact_deduplication to the hash string in
generate_dataset_hash_from_config so cached datasets are invalidated
when the dedup setting changes.
- Log a warning when skip_prepare_dataset=True and
dataset_exact_deduplication=True, since dedup will be silently
skipped in that configuration (both SFT and RL paths).
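Folding the dedup flag into the dataset hash can be sketched as follows; `generate_dataset_hash` is a simplified stand-in for `generate_dataset_hash_from_config`, and the other hashed fields shown are assumptions for illustration.

```python
import hashlib


def generate_dataset_hash(cfg: dict) -> str:
    parts = [
        str(cfg.get("datasets")),
        str(cfg.get("sequence_len")),
        # The fix: include the dedup setting so cached datasets are
        # invalidated whenever it changes.
        str(cfg.get("dataset_exact_deduplication", False)),
    ]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()
```

Because the flag is part of the hashed string, toggling `dataset_exact_deduplication` produces a different hash and forces a fresh preprocessing pass instead of reusing a stale cache.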
* fix: add ValueError for skip_prepare+dedup, fix test mock target and formatting
- Add config validator (check_deduplication_with_skip_prepare) that raises
ValueError when skip_prepare_dataset=True and dataset_exact_deduplication=True
- Replace runtime warnings in sft.py/rl.py with the validator check
- Fix RL test: patch axolotl.utils.data.rl.load_tokenizer instead of
axolotl.loaders.load_tokenizer to properly mock the imported reference
- Fix ruff lint (remove unused imports) and formatting issues
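The validator described above can be sketched as a plain function (the real one lives in axolotl's pydantic config validation; this simplified version only shows the incompatibility check):

```python
def check_deduplication_with_skip_prepare(cfg: dict) -> dict:
    # Dedup runs during dataset preparation, so it would be silently
    # skipped when preparation is skipped; fail fast at config time
    # instead of surprising the user at runtime.
    if cfg.get("skip_prepare_dataset") and cfg.get("dataset_exact_deduplication"):
        raise ValueError(
            "dataset_exact_deduplication is not supported with "
            "skip_prepare_dataset; remove one of the two options"
        )
    return cfg
```

Raising at validation time replaces the earlier runtime warnings in the SFT and RL paths with a single, early, unmissable error.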
* refactor: inline deduplicate function per review feedback
* fix test fixture, lint
---------
Co-authored-by: ManasVardhan <manasvardhan@users.noreply.github.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
* bug-fix: use self.optimizer if optimizer not passed to SchedulerMixin.create_scheduler()
* nit: raise if self.optimizer is also unset
* optimizer properly optional in create_scheduler()
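The fallback logic in those three commits can be sketched with a toy stand-in (the real `SchedulerMixin.create_scheduler()` builds a transformers LR scheduler; here we only record which optimizer was chosen):

```python
class SchedulerMixin:
    def __init__(self, optimizer=None):
        self.optimizer = optimizer

    def create_scheduler(self, num_training_steps, optimizer=None):
        # Bug fix: the optimizer argument is properly optional, falling
        # back to self.optimizer when it isn't passed.
        optimizer = optimizer if optimizer is not None else self.optimizer
        if optimizer is None:
            # Nit from the follow-up commit: raise if self.optimizer
            # is also unset rather than failing later.
            raise ValueError("no optimizer passed and self.optimizer is unset")
        return ("scheduler", optimizer, num_training_steps)
```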
* Add test cases to verify that the problem exists in the underlying implementation
* Update the handle_long_sequences function to correctly use map instead of filter for the truncation strategy. Also remove the minimal-length filtering from the truncate_long_samples function and run it separately, beforehand.
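The map-vs-filter distinction matters because filter can only drop rows, while truncation must transform them. A minimal list-based sketch of the corrected flow (field names and the `min_len` default are illustrative, not axolotl's exact signature):

```python
def handle_long_sequences(rows, sequence_len, min_len=2):
    # Step 1 (filter, run separately and first): drop samples that are
    # too short to be useful.
    rows = [r for r in rows if len(r["input_ids"]) >= min_len]
    # Step 2 (map, not filter): truncate every over-long sample instead
    # of dropping it, applying the same cut to all per-token fields.
    return [{k: v[:sequence_len] for k, v in r.items()} for r in rows]
```

Using filter for the truncation strategy would have discarded over-long samples entirely; the map pass keeps them, shortened to `sequence_len`.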
* fix: refactor and add test truncate for non-input id fields
* fix: refactor long seq handling fn
* fix: refactor duplicate fn and simplify route
* add additional tests and make them work on mac
* handle logging exception on empty datasets
---------
Co-authored-by: 2ndset bot <bot@2ndset.ai>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
* scattermoe lora support
* fsdp, bf16, dim fixes
* expert weights don't need to be saved for backward since they are frozen
* use sonicmoe optim options
* update save model from upstream
* fixes per code review feedback and add tests
* revert removal of CP fix
* misc fixes
* feat: support dot-notation CLI args for nested config options
Add support for overriding nested config fields (like TRL config) via
CLI using dot-notation, e.g.:
axolotl train grpo.yaml --trl.vllm-server-host=10.0.0.1 --trl.beta=0.1
Changes:
- args.py: Detect BaseModel subclass fields and generate dot-notation
CLI options (--parent.child) that map to double-underscore kwargs
(parent__child). Also fix _strip_optional_type for Python 3.10+
union syntax (X | None).
- config.py: Handle double-underscore kwargs in load_cfg by setting
nested dict values on the config.
- Add tests for nested option handling.
Fixes #2702
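The double-underscore mapping described above can be sketched as follows: a CLI flag like `--trl.vllm-server-host` arrives as the kwarg `trl__vllm_server_host` and is folded back into a nested dict on the config. This is a simplified stand-in for load_cfg's actual handling.

```python
def apply_nested_kwargs(cfg: dict, kwargs: dict) -> dict:
    for key, value in kwargs.items():
        if "__" in key:
            # parent__child came from a dot-notation flag
            # (--parent.child); set it on the nested sub-config.
            parent, child = key.split("__", 1)
            cfg.setdefault(parent, {})[child] = value
        else:
            # Flat keys override top-level config fields as before.
            cfg[key] = value
    return cfg
```

So `axolotl train grpo.yaml --trl.beta=0.1` ends up setting `cfg["trl"]["beta"]` without the user editing the YAML.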
* Address CodeRabbit review: fix string parent bug, add type hints and docstring
Signed-off-by: Manas Vardhan <manasvardhan@gmail.com>
* Add type coercion for CLI kwargs and fix pre-commit issues
- Add _coerce_value() for YAML-style type inference on string CLI args
- When existing config value has a type (int/float/bool), cast to match
- When no existing value, infer type from string (true/false, ints, floats, null)
- Apply coercion to both flat and nested (dot-notation) kwargs
- Fix unused pytest import (pre-commit/ruff)
- Update tests to pass string values (matching real CLI behavior)
- Add dedicated TestCoerceValue test class
Addresses maintainer feedback on type casting for nested kwargs.
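The coercion rules in that commit can be sketched roughly as below; `coerce_value` is a simplified stand-in for the PR's `_coerce_value`, and the exact accepted spellings are assumptions.

```python
def coerce_value(raw: str, existing=None):
    # When the existing config value has a concrete type, cast the
    # string CLI value to match it (bool checked first, since bool is
    # a subclass of int in Python).
    if existing is not None and not isinstance(existing, str):
        if isinstance(existing, bool):
            return raw.lower() in ("true", "1", "yes")
        return type(existing)(raw)
    # No existing value: YAML-style inference from the string itself.
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered in ("null", "none"):
        return None
    try:
        return int(raw)
    except ValueError:
        pass
    try:
        return float(raw)
    except ValueError:
        return raw
```

This keeps `--trl.beta=0.1` a float and `--some-flag=true` a bool even though every CLI value arrives as a string.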
---------
Signed-off-by: Manas Vardhan <manasvardhan@gmail.com>
* upgrade transformers to 5.1.0 and torchao to 0.16.0
* upgrade trl for parity
* handle trl api changes
* ORPO doesn't have max_prompt_len to check anymore
* CPOConfig doesn't take max_prompt_length; fix CPU offload
* slow fsdp1 test
* triton min 3.4.0 and liger to 0.7.0
* use transformers main for now for zero3 fix
* handle group_by_length change
* fix changes upstream
* mark skip flaky test
* use transformers latest release 5.2.0
* fix: redact trackio and data_files
* fix: add new orgs to whitelist
* feat: add run id to logs for users to easily share
* fix: update to add more metrics
* fix: add missed experiment tracker
* chore: formatting in main
* feat: add sageattention
* feat: call path on pre model load
* fix: patch to use register to correct var
* fix: add strict check import at start
* chore: fix comments
* chore: refactor
* feat: add capability check
* fix: missed underscore
* fix: let sageattention use FA backend in transformers
* feat: update sage attention for attention mask and position ids
* feat: allow sample packing but add warning without packing
* fix: loss hitting 0 with packing and attention mask note
* feat: downcast embeds if sage attention too
* feat: add config validation
* feat: add attention docs
* chore: docs