axolotl

Author	SHA1	Message	Date
Wing Lian	56162f71db	monkeypatch fix for fsdp with cpu ram efficient loading (#3464 ) [skip ci]	2026-03-06 09:10:58 -05:00
github-actions[bot]	6c44afaea1	chore: update pre-commit hooks (#3381 ) [skip ci] Co-authored-by: SalmanMohammadi <25081738+SalmanMohammadi@users.noreply.github.com>	2026-03-05 21:39:34 -05:00
Wing Lian	234931d512	extend pytest-sdist timeout to 30 min for slow/flaky tests (#3456 ) [skip ci] * extend pytest-sdist timeout to 30 min for slow/flaky tests * Also preload the cdn cache so it doesn't get stampeded * fix yaml syntax * missing fields * can't pipe to dev/null * Fix nightlies and add 2.10.0 to multi-gpu suite	2026-03-05 15:04:38 -05:00
NanoCode012	6a8baf8fa7	feat: add sonicmoe (#3411 ) * feat: add sonicmoe * feat: add torch compile for routing * feat: add routing smoke test * feat: add qwen3_5_moe, qwen3_vl_moe, qwen3_omni_moe * fix: disable mlp kernel for sonicmoe too * feat: update to sonicmoe release * chore: update import following new sonicmoe changes * feat: update handling for blackwell * feat: add sonicmoe e2e test * fix: installation for updated sonicmoe * fix: git commit * fix: ignore py req and fix metadata * fix: increase min hidden size to match sonicmoe kernel min * fix: attempt properly interleave and handle unpatch mid-test * chore: refactor teardown better * chore: refactor to re-use rearrange * fix: add idempotency guard * fix: address comments on CI memory and interleave * fix: tests grad, param doublewrapped	2026-03-05 13:43:31 -05:00
VED	1eaf4d7418	add: support mxfp4 axo (#3375 ) * mxfp4 axo * import lint * test for qat mxfp4 * config for mxfp4 * add qat: * pass base config * MXFakeQuantizeConfig * lint * tune config so it fits in 32GB VRAM --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2026-03-05 13:40:45 -05:00
Gilles Turpin	4b8bc52424	fix: correct total_num_steps and batch_size calculation with context parallelism (#3444 ) * fix: correct total_num_steps and batch_size calculation with context parallelism * feat: add test for CP batch size --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2026-03-05 12:33:28 -05:00
Wing Lian	28cc085283	include number of params and rounded est of params so we can easily group in posthog (#3455 ) * include number of params and rounded est of params so we can easily group in posthog * fix typing	2026-03-05 12:31:17 -05:00
bekk02	8e2a102cca	Fix FSDP2 sharding and validate AO version for LR groups (#3403 ) * Fix fsdp2 sharding. Fix validation of ao version for lr groups * remove validation since axolotl requires ao>0.13.0 already * Move fully_shard of entire module for lora_embedding_A/B out of loop * chore: lint --------- Co-authored-by: bekk02 <ID+bekk02@users.noreply.github.com> Co-authored-by: Wing Lian <wing@axolotl.ai>	2026-03-05 09:59:32 -05:00
NanoCode012	753906cfc7	feat: add doc for expert quantization, glm45 air example configs, and update readme for release (#3452 ) [skip ci] * chore: rename without period * feat: add glm45 air * feat: add doc on expert quantization * feat: update base readme with new changes * chore: cleanup * chore: cleanup * chore: cleanup * fix: disable quantize_moe_expert on merge per comment * chore: add kernel info to optimizations doc	2026-03-05 09:58:09 -05:00
Wing Lian	b6b8db805a	fix python version typo for building 3.11 (#3454 )	2026-03-04 09:53:35 -05:00
Wing Lian	653f90be25	Add torch 2.10.0 to unit tests and use python 3.14 (#3450 ) * Add torch 2.10.0 to unit tests and use python 3.14 * hold on python 3.14 checks due to mistral common * add base option to matrix	2026-03-03 13:01:52 -05:00
NanoCode012	945c8aeb10	Fix: quantize and target moe layers in transformers v5 for adapters and many misc fixes (#3439 ) * fix: saving clones state dict * fix: apply fix for only CP mode * fix: add dropout check when using lora target param * fix: re-add patch from transformers PR #39866 * feat: add moe quant to test by ved * fix: try match target param properly end with * fix: clear cache per param quant * fix: attempt on-load quantize experts instead of post-load * fix: attempt disable async load * chore: add log * chore: adjust log * fix: remove cuda alloc for moe and enable async load * chore: remove leftover logs * chore: add extra empty cache * fix(doc): clarify support * fix: handle fsdp2 for paramwrapper dtensor * feat: attempt to quant experts in 8bit mode too * feat: attempt to release bf16 experts from vram * feat: upgrade cce * fix: fsdp2 init_sharded_param load int8/uint4 dtensor as require_grad=true on init * fix: remove unnecessary gc and empty cache * Revert "fix: remove unnecessary gc and empty cache" This reverts commit `1d54518990`. * fix: do not call full_tensor on non-dtensors * fix: attempt to address fsdp2 with quant exp high loss * fix: attempt lora quant experts wrong dim * fix: ensure require_grad patch applied for lora 8bit * fix: attempt lora 8bit fsdp2 * fix: attribute access on save for lora 8bit fsdp2 * fix: wrong weight attrib access * chore(refactor): add config, re-arrange position of patches, clean comments * feat: add example docs * chore: cherry pick trinity fixes from PR 3399 * chore: comments refactor; add guards * fix: guard using wrong key * fix: mamba save does not accept main process param * fix: guard prevent double hook * fix: move gc to upper scope * chore: add comment on proxy forward patch * fix: add comment to clarify * feat: add test idempotency * fix: AttributeError: `e_score_correction_bias` is not an nn.Parameter * fix: AttributeError: 'NoneType' object has no attribute 'to' * fix: update docs on cpu_ram_efficient_loading	2026-03-03 10:06:23 -05:00
NanoCode012	e672d37f33	fix: qwen3-next to use fla causal-conv1d to support packing (#3437 ) * fix: qwen3-next to use fla causal-conv1d to support packing * fix: causal import and update doc for v5 * fix: hard fail for packing without fla	2026-03-03 09:26:46 -05:00
Wing Lian	77828d3559	uv cloud image should use uv w pip (#3449 )	2026-03-02 16:39:26 -05:00
Wing Lian	4272817109	don't install torch ao on arm64 (#3448 )	2026-03-02 14:24:54 -05:00
Manas Vardhan	474208b794	fix: Save de-duplicated dataset during pre-processing (#3427 ) * fix: run deduplication before saving dataset during preprocessing Move deduplicate_and_log_datasets call before save_preprocessed_dataset in both SFT and RL data loading pipelines. This ensures the saved preprocessed dataset is already de-duplicated, so subsequent loads from cache don't contain duplicates. Fixes #2719 * fix: include deduplication flag in dataset hash and warn on skip_prepare_dataset+dedup - Add dataset_exact_deduplication to the hash string in generate_dataset_hash_from_config so cached datasets are invalidated when the dedup setting changes. - Log a warning when skip_prepare_dataset=True and dataset_exact_deduplication=True, since dedup will be silently skipped in that configuration (both SFT and RL paths). * fix: add ValueError for skip_prepare+dedup, fix test mock target and formatting - Add config validator (check_deduplication_with_skip_prepare) that raises ValueError when skip_prepare_dataset=True and dataset_exact_deduplication=True - Replace runtime warnings in sft.py/rl.py with the validator check - Fix RL test: patch axolotl.utils.data.rl.load_tokenizer instead of axolotl.loaders.load_tokenizer to properly mock the imported reference - Fix ruff lint (remove unused imports) and formatting issues * refactor: inline deduplicate function per review feedback * fix test fixture, lint --------- Co-authored-by: ManasVardhan <manasvardhan@users.noreply.github.com> Co-authored-by: Wing Lian <wing@axolotl.ai>	2026-03-02 12:55:59 -05:00
Wing Lian	444020b332	mark slow tests that are timing out in CI (#3428 ) [skip ci]	2026-03-02 12:26:30 -05:00
Wing Lian	aa88c2e30b	fix uv cache subcommand (#3447 )	2026-03-02 12:26:08 -05:00
NanoCode012	f447bce1db	fix: do not push telemetry on non-master rank (#3438 )	2026-03-02 15:31:20 +07:00
kallewoof	7f23b302d1	bug-fix: use self.optimizer if optimizer not passed to SchedulerMixin.create_scheduler() (#3435 ) [skip ci] * bug-fix: use self.optimizer if optimizer not passed to SchedulerMixin.create_scheduler() * nit: raise if self.optimizer is also unset * optimizer properly optional in create_scheduler()	2026-03-02 15:30:07 +07:00
Wing Lian	18f26c19ef	add uv axolotl builds (#3431 )	2026-02-25 14:46:02 -05:00
Robert Ronan	2b6f4a6c9b	Fix: excess_length_strategy truncation method (#3401 ) * Add test cases to verify that the problem exists in the underlying * Update the handle_long_sequences function to correctly use Map instead of filter for the truncation strategy. Also remove the minimal length filtering from the truncate_long_samples function, and run it separately and before. * fix: refactor and add test truncate for non-input id fields * fix: refactor long seq handling fn * fix: refactor duplicate fn and simplify route * add additional tests and make them work on mac * handle logging exception on empty datasets --------- Co-authored-by: 2ndset bot <bot@2ndset.ai> Co-authored-by: NanoCode012 <nano@axolotl.ai> Co-authored-by: Wing Lian <wing@axolotl.ai>	2026-02-25 11:31:11 +07:00
madScientist10	8f54b4eb25	fix: pass revision parameter to tokenizer and processor loaders (#3388 ) [skip ci] * fix: pass revision parameter to tokenizer and processor loaders * fix: address revision=None passed to .from_pretrained * add tests and address review feedback for revision parameter - Reformat modify_tokenizer_files signature and from_pretrained call - Use kwargs pattern for modify_tokenizer_files call to avoid passing None revision - Add 6 unit tests for revision parameter in tokenizer/processor loaders --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2026-02-25 11:11:20 +07:00
VED	a131e4d0e5	sample gen support sft (#3240 ) [skip ci] * add:parameters + callback * sft core + logging * indentation fix * logger fix * loger fix in sft * gen sample on eval * lint * deprecation	2026-02-25 11:10:57 +07:00
Wing Lian	1791d87b6f	build axolotl images with torch 2.10.0 (#3430 )	2026-02-24 22:35:25 -05:00
Wing Lian	b40803da51	build base images for torch 2.10.0 (#3429 )	2026-02-24 20:32:34 -05:00
Wing Lian	68f1b7004c	ScatterMoE LoRA support (#3410 ) * scattermoe lora support * fsdp, bf16, dim fixes * expert weights aren't needed in save for bwd since they are frozen * use sonicmoe optim options * update save model from upstream * fixes per code review feedback and add tests * revert removal of CP fix * misc fixes	2026-02-24 14:59:55 -05:00
NanoCode012	08441fed17	fix: set allowed values for `adapter` config (#3415 )	2026-02-23 11:39:53 -05:00
NanoCode012	86ca1e27c0	fix: update MistralProcessor to be v5 compat (#3423 ) * fix: update MistralProcessor to be v5 compat * feat: add test for mistral3 processor * chore: comment	2026-02-23 11:39:13 -05:00
Manas Vardhan	5ed455715e	feat: support dot-notation CLI args for nested config options (#3419 ) * feat: support dot-notation CLI args for nested config options Add support for overriding nested config fields (like TRL config) via CLI using dot-notation, e.g.: axolotl train grpo.yaml --trl.vllm-server-host=10.0.0.1 --trl.beta=0.1 Changes: - args.py: Detect BaseModel subclass fields and generate dot-notation CLI options (--parent.child) that map to double-underscore kwargs (parent__child). Also fix _strip_optional_type for Python 3.10+ union syntax (X \| None). - config.py: Handle double-underscore kwargs in load_cfg by setting nested dict values on the config. - Add tests for nested option handling. Fixes #2702 * Address CodeRabbit review: fix string parent bug, add type hints and docstring Signed-off-by: Manas Vardhan <manasvardhan@gmail.com> * Add type coercion for CLI kwargs and fix pre-commit issues - Add _coerce_value() for YAML-style type inference on string CLI args - When existing config value has a type (int/float/bool), cast to match - When no existing value, infer type from string (true/false, ints, floats, null) - Apply coercion to both flat and nested (dot-notation) kwargs - Fix unused pytest import (pre-commit/ruff) - Update tests to pass string values (matching real CLI behavior) - Add dedicated TestCoerceValue test class Addresses maintainer feedback on type casting for nested kwargs. --------- Signed-off-by: Manas Vardhan <manasvardhan@gmail.com>	2026-02-23 10:10:06 -05:00
Lorenzo Baraldi	3f30572d4a	Fix typo in dataset_processes field (#3426 ) * Fix typo in dataset_processes field * fix: use updated config name --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2026-02-23 14:18:37 +07:00
NanoCode012	43d60c7439	bump cut-cross-entropy to 58d6572 (#3424 )	2026-02-20 14:24:51 -05:00
Wing Lian	0ea252d392	update to trackio 0.16.1 (#3425 ) [skip ci]	2026-02-20 14:24:33 -05:00
Wing Lian	29722dec60	use bunnycdn for CI assets (#3422 ) [skip ci]	2026-02-20 00:09:25 -05:00
NanoCode012	7fbedbd300	fix(doc): add limitation for unfrozen_parameters (#3416 )	2026-02-19 18:32:26 -05:00
Wing Lian	145ffc9be1	upgrade transformers to 5.2.0 and torchao to 0.16.0 (#3407 ) * upgrade transformers to 5.1.0 and torchao to 0.16.0 * upgrade trl for parity * handle trl api changes * orpo doesn't have max_prompt_len to check anymore * cpoconfig doesn't take max_prompt_length and fix cpu offload * slow fsdp1 test * triton min 3.4.0 and liger to 0.7.0 * use transformers main for now for zero3 fix * handle group_by_length change * fix changes upstream * mark skip flaky test * use transformers latest release 5.2.0	2026-02-19 18:27:27 -05:00
NanoCode012	4f1b5ad29f	fix: clarify how to use lm_eval plugin (#3404 ) [skip ci]	2026-02-15 07:52:30 -05:00
NanoCode012	d6a2532dd7	feat(doc): clarify how to use scattermoe (#3408 ) [skip ci] * feat(doc): clarify how to use scattermoe * chore: fix wording	2026-02-15 07:51:28 -05:00
Wing Lian	5eb265513c	fix generic patch for cce (#3405 )	2026-02-12 08:58:04 -05:00
NanoCode012	06ac407b92	feat: improve telemetry log (#3398 ) * fix: redact trackio and data_files * fix: add new orgs to whitelist * feat: add run id to logs for users to easily share * fix: update to add more metrics * fix: add missed experiment tracker * chore: formatting in main	2026-02-10 23:01:34 +07:00
NanoCode012	4e22cf0651	fix: remove telemetry warning (#3397 ) [skip ci]	2026-02-10 23:01:16 +07:00
VED	a4ee56c315	fix: set rollout in GRPO training_kwargs (#3392 )	2026-02-10 18:06:15 +07:00
NanoCode012	c67cbcb0f5	fix: ignore add_special_tokens and use test mode for generation for mistral tokenizer (#3396 ) [skip ci] * fix: ignore add_special_tokens and use test mode for generation * fix: incorrectly setting kwarg	2026-02-10 18:03:26 +07:00
NanoCode012	a2da852576	fix: improve lora kernels failure message and handle trust_remote_code (#3378 ) [skip ci] * fix: improve lora kernels failure message and handle trust_remote_code * chore: re-order model guides	2026-02-10 17:58:40 +07:00
madScientist10	37e9da7a53	add hub_revision support for specifying branch when pushing checkpoints (#3387 ) [skip ci]	2026-02-10 17:53:09 +07:00
NanoCode012	ed7105dba7	fix: GRPO config not accept max_prompt_length (#3390 ) [skip ci]	2026-02-10 17:52:09 +07:00
NanoCode012	b6d3653f74	feat: add step3p5 for cce (#3384 ) [skip ci] * feat: add step3p5 for cce * chore: reorder model	2026-02-10 17:51:43 +07:00
NanoCode012	fcc4cfdb63	feat: add sageattention (#2823 ) [skip ci] * feat: add sageattention * feat: call path on pre model load * fix: patch to use register to correct var * fix: add strict check import at start * chore: fix comments * chore: refactor * feat: add capability check * fix: missed underscore * fix: let sageattention use FA backend in transformers * feat: update sage attention for attention mask and position ids * feat: allow sample packing but add warning without packing * fix: loss hitting 0 with packing and attention mask note * feat: downcast embeds if sage attention too * feat: add config validation * feat: add attention docs * chore: docs	2026-02-10 17:49:21 +07:00
VED	97a4f28511	fix: saving state dict and eval for Context Parallel (#3382 ) [skip ci] * clone state_dict if none * patch calculating eval loss for cp	2026-02-10 17:47:26 +07:00
VED	86a5803212	train_per_sec_per_gpu metric (#3364 ) [skip ci] * fix token count * guard for none n zero	2026-02-10 17:44:55 +07:00
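
Commit 5ed455715e above describes dot-notation CLI options (`--parent.child`) that map to double-underscore kwargs (`parent__child`) and are set as nested config values, with YAML-style type inference for string arguments. The following is a minimal sketch of that idea, not the code from the commit: the names `coerce_value` and `apply_nested_overrides` are hypothetical, the config is assumed to be a plain dict, and the commit's "cast to match the existing value's type" step is omitted.

```python
from typing import Any


def coerce_value(raw: str) -> Any:
    """Infer a YAML-style type from a CLI string (hypothetical helper)."""
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered == "null":
        return None
    # Try the narrower numeric type first, then fall back to float.
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    # Anything else (e.g. an IP address) stays a string.
    return raw


def apply_nested_overrides(cfg: dict, kwargs: dict) -> dict:
    """Set parent__child kwargs as cfg[parent][child]; flat keys are set directly."""
    for key, raw in kwargs.items():
        value = coerce_value(raw) if isinstance(raw, str) else raw
        if "__" in key:
            parent, child = key.split("__", 1)
            nested = cfg.get(parent)
            if not isinstance(nested, dict):
                # Assumption: a missing or non-dict parent is replaced by a new dict.
                nested = {}
                cfg[parent] = nested
            nested[child] = value
        else:
            cfg[key] = value
    return cfg


cfg = apply_nested_overrides(
    {"trl": {"beta": 0.04}},
    {"trl__vllm_server_host": "10.0.0.1", "trl__beta": "0.1"},
)
print(cfg)
```

In this sketch, `"0.1"` is coerced to a float while `"10.0.0.1"` fails both numeric casts and remains a string, which mirrors the example invocation quoted in the commit message.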

2600 Commits