axolotl

Author	SHA1	Message	Date
Wing Lian	2c05847a5f	reduce autotune search space (#3525 ) [skip ci] * reduce autotune search space * consistent docstrings	2026-03-21 18:30:15 -04:00
Wing Lian	b0294b3427	handle qwen3.5 moe loading (#3523 ) [skip ci]	2026-03-20 09:25:16 -04:00
Avaya Aggarwal	1bcfc08c90	feat: add support and end-to-end tests for multiple custom optimizers… (#3457 ) [skip ci] * feat: add support and end-to-end tests for multiple custom optimizers including Optimi AdamW, ADOPT AdamW, Muon, Dion, Schedule-Free AdamW, CAME PyTorch, and Flash AdamW. * feat: Add standalone flashoptim integration test and E2E tests for various custom optimizers including FlashAdamW, FlashAdam, FlashSGD, FlashSGDW, FlashLion, optimi_adamw, adopt_adamw, muon, dion, and schedule_free_adamw. * feat: introduce Pydantic schema validation for dataset, attention, and training configurations. * feat: add e2e tests for custom optimizers including optimi_adamw, adopt_adamw, muon, dion, schedule_free_adamw, came_pytorch, and flash optimizers. * test: add e2e tests for custom optimizers including optimi_adamw, adopt_adamw, muon, dion, schedule_free_adamw, came_pytorch, and flash optimizers. * test: fix assertion in flash optimizers test to compare class names directly * fix: address PR review - reuse require_torch_2_7_0 decorator, remove fsdp_config.version check, extract shared FSDP version helper, remove unused imports and optim_args * chore: lint --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2026-03-20 08:24:44 -04:00
NanoCode012	5a5cf30b26	fix: add dequant bf16 repo (#3507 ) [skip ci]	2026-03-20 17:11:46 +07:00
Avaya Aggarwal	7ddfb2d8a0	cleanup: remove dead SDPA patches (#3488 ) [skip ci] Transformers 5.x routes attention through sdpa_attention.py and no longer calls the _prepare_4d_causal_attention_mask* or _expand_mask functions that these patches targeted. This makes the following patches dead code: - llama_patch_multipack.py (patched _prepare_4d_causal_attention_mask*) - llama_expand_mask.py (patched _expand_mask, never called) - Related utility functions in monkeypatch/utils.py Closes axolotl-ai-cloud/axolotl#3331	2026-03-20 17:10:41 +07:00
Owen Arliawan	c57acef2c7	Qwen3.5-MoE example config with lora_target_modules regex (#3515 ) [skip ci] * lora target modules with regex * updates * fsdp for non moe * update wording * chore: cleanup and lint * chore: cleanup docs from merge --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2026-03-20 16:52:46 +07:00
Lorenzo Baraldi	038ffe3f26	fix: solved double sequence partition from SequenceParallelContextManager and Accelerate's native CP (#3498 )	2026-03-20 16:27:24 +07:00
VED	c13cb7c853	feat: add nemotron config (#3506 ) * nemotron config exp * Update examples/nemotron/nemotron-mini-4b-qlora.yaml Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2026-03-20 16:23:42 +07:00
VED	b3823cc6b0	fix: gemma3 configs (#3500 ) [skip ci] * gemma fft , text fix * good lint	2026-03-20 16:14:06 +07:00
VED	113d275bd9	qwen docs + new config (#3499 ) [skip ci] * qwen docs + new config * docss lint * simplify comments * read me * lint comments * Update docs/multimodal.qmd * Update docs/multimodal.qmd * Update examples/qwen3.5/9b-fft-vision.yaml * chore: fix link and incorrect points --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> Co-authored-by: NanoCode012 <nano@axolotl.ai>	2026-03-20 16:13:34 +07:00
VED	7920fe74ec	fix num_labels= 1 test fail (#3493 ) [skip ci] * trl_num_lables=1 * casual num_lables=1,rwd model * lint	2026-03-20 16:12:23 +07:00
Wing Lian	1fc86d5295	Scattermoe LoRA optimizations (#3513 ) * optimize moe + lora * more scattermoe optims * selective dequant * add correctness unit tests and benchmarks for scattermoe + lora * handle base+lora split kernel for older moe models * chore: lint * fix casting for H200 and B200 * register pressure estimation and pruning for h200/b200 * use soft limit for pruning * qkv patch for qwen3.5moe * support text_model for qwen3.5 moe * nesting of qwen3 * use udpated cce with zero3 support * Fix decomposed backward for QKV and O projections eliminates B @ A materialization in LoRA attention backward, replacing full [out, in] matmuls with two small [T, R] matmuls.	2026-03-19 23:07:42 -04:00
Wing Lian	bb483ad4c4	make the CI fail GitHub Actions on test failures (#3517 ) * make the CI fail GitHub Actions on test failures * use model bundle * install zstd for compressed model artifact	2026-03-19 08:29:24 -04:00
Wing Lian	163bd4dd5a	use custom triton kernels for entropy from logits and selective softmax (#3510 ) * use custom triton kernels for entropy from logits and selective softmax * PR comments fixes * fix out of bounds, include tests, include benchmarks * chore: lint	2026-03-19 02:02:43 -04:00
Wing Lian	f291ac029c	fix for flaky tests in lora ops kernels w autotune (#3511 ) [skip ci] * fix for flaky tests in lora ops kernels w autotune * attempt 2 to fix	2026-03-19 01:18:47 -04:00
Wing Lian	5ef3f28340	Support for Async GRPO (#3486 ) * async grpo support * implement data producer * use fast async * handle call to create data producer * fix liger kernel setup * fix replay buffer * chore: lint * make gpus go brrr * chore: lint * inplace div_, unwrap model for logits in bf16 * fuse selective softmax and empty cuda cache on each scoring step * remove waiting for synch time and fix race * make fp8 work and allow lora kernels w rl * grpo with lora vllm sync and fixes for sharded distributed * update docs * more patches so it works against trl main * address PR feedback for corerabbit	2026-03-17 11:42:47 -04:00
Aarush	999b3fec2e	fix: replace shell=True subprocess with argument list in modal CLI (#3487 ) * fix: replace shell=True subprocess with argument list in modal CLI Using shell=True with a formatted string containing docker_image (a user-controlled value) is a command injection risk (Bandit B602). Replace with an argument list, which passes args directly to the process without shell interpretation, removing the nosec annotation. * fix: add nosec annotation to suppress bandit B603/B607 warnings Removing shell=True (B602) surfaces B603 (subprocess without shell) and B607 (partial executable path for 'docker'). Use bare # nosec to suppress both, consistent with other nosec usages in the codebase.	2026-03-17 08:53:13 -04:00
Wing Lian	8f3fb517b3	consolidate behavioud of routing in scattermoe kernels (#3475 ) * consolidate behavioud of routing in scattermoe kernels * collect telemetry on best chosen autotuned kernel * properly collect data * Fix property name and get smem too * handle issues raised by coderabbit * add tests for parity before refactoring	2026-03-16 23:47:40 -04:00
Wing Lian	830e9f7eaf	automatically enable tf32 if supported (#3473 ) [skip ci] * automatically enable tf32 if supported * update fixtures * handle only when True * Address CR comments * address readability from pr comment * simplify	2026-03-16 23:47:00 -04:00
NanoCode012	d230cbbde3	chore(doc): update readme (#3503 ) [skip ci]	2026-03-17 09:43:24 +07:00
NanoCode012	a098df527b	feat: add Mistral Small 4 (#3502 ) * feat: add mistral small 4 * fix: update mistral common * fix: deepcopy when passing in tokenizer * feat: add doc on reasoning and thinking section * fix: don't use custom tokenizer and quantize experts * chore: update docs and configs * chore: update doc to follow official name * feat: update cce to include mistral4 * chore: move * fix: naming * fix: test mock breaking get_text_config check * fix: enable CCE and add expert block targetting to configs * chore: docs * fix: use act checkpointing * chore: doc * chore: docs * chore: docs	2026-03-17 09:39:05 +07:00
NanoCode012	7da5f94379	feat: add FA4 (#3481 ) * feat: add FA4 * chore: update docs * fix: recommend FA4 for those with compatible devices * fix: adjust import check and add head_dim check * chore: add limitation to doc * fix: log warning and quit if cannot import validator * chore: simplify * fix: add caveat with FA2 shadow dir	2026-03-16 00:13:18 -04:00
NanoCode012	4a5876df7a	fix: explicit set workflow permission and move secrets to necessary (#3484 ) [skip ci] * fix: explicit set workflow permission and move secrets to necessary steps only * fix: comment * fix: more permission restrict * chore: add read for pypi	2026-03-16 00:13:05 -04:00
Aarush	defee62d99	fix: fix CONTRIBUTING.md placeholders, bare except clauses, and add convert.py tests (#3485 ) [skip ci] * docs: fix codestyle placeholders in CONTRIBUTING.md Replace unresolved {codestyle} and {URLofCodestyle} template variables with Ruff, the project's actual linter/formatter as configured in .pre-commit-config.yaml. * fix: replace bare except clauses with specific exception types - quantization.py: use except ImportError for optional torchao imports (consistent with line 48 which already uses ImportError correctly) - cli/config.py: use except (RuntimeError, AssertionError) for CUDA device property query Prevents masking unrelated errors like KeyboardInterrupt or SystemExit. * test: add unit tests for convert.py JSON/JSONL utilities Cover FileReader, FileWriter, StdoutWriter, JsonParser, JsonlSerializer, and JsonToJsonlConverter with 8 test cases including roundtrip and edge case (empty list) scenarios. Previously this module had zero test coverage. * fix: address CodeRabbit review feedback - quantization.py: catch (ImportError, RuntimeError) for optional torchao imports; CUDA wheel/GPU mismatches raise RuntimeError, not ImportError - convert.py: remove unused output_file_path parameter from JsonToJsonlConverter.convert() — FileWriter already holds the output path from construction - tests/test_convert.py: update call site to match new signature	2026-03-16 00:12:40 -04:00
VED	f56efdb4ab	fix: high eval loss w/ sample packing (#3478 ) [skip ci] * check if eval_sp * radable condition	2026-03-15 22:11:23 -04:00
NanoCode012	d8a646c80d	chore: logging cleanup (#3482 ) [skip ci]	2026-03-15 22:10:57 -04:00
VED	a806704e94	moe quant patch for merge miss match (#3483 ) * moe quant patch for merge miss match * lint * revert test + fix moe patch * comment fixxes * e2e tests * mismatch fixx tested * mis match fix wwith vllm compatablity + test * comment lint * fix: missing os import, duplicate no op * chore: simplify comments --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2026-03-15 22:10:30 -04:00
Wing Lian	d8a05744d7	Reverts commits `79908b3c6`, `083c5a042`, `e1ff75624`, `ff77fa248`. (#3496 ) The non-root user approach had multiple issues with RunPod compatibility, sudo PATH handling, and tmux in exec sessions. Restoring root as the default user for now.	2026-03-13 11:54:09 -04:00
Wing Lian	ff77fa2488	preserve env for root -> ubuntu user (#3495 )	2026-03-13 10:19:34 -04:00
Wing Lian	e1ff756245	become the ubuntu user when root logs in (#3494 )	2026-03-13 09:06:54 -04:00
Wing Lian	083c5a0421	check ubuntu user and set uv python dir (#3492 )	2026-03-12 23:20:54 -04:00
Wing Lian	79908b3c6e	use ubuntu user instead of root for uv docker images (#3491 )	2026-03-12 20:41:13 -04:00
Wing Lian	819b157c7b	swap around what we're building for docker (#3490 ) * remove cloud configuration we don't base image for * but we do want it for uv	2026-03-11 21:45:13 -04:00
Wing Lian	fccc712dae	builds for py312-cu128-torch2.9.1 (#3489 )	2026-03-11 20:09:03 -04:00
NanoCode012	23ad40bdd5	fix: disable async load when loading quantized bnb	2026-03-11 13:18:27 +07:00
NanoCode012	cf4d550c88	fix: reduce permissions for preview docs CI (#3480 ) [skip ci]	2026-03-09 08:04:31 -04:00
Wing Lian	43b1c80aa6	load weights synchronously so they can be converted and not OOM: (#3477 )	2026-03-07 07:09:24 -05:00
Wing Lian	a36aaa70ce	add gpu tests for scattermoe (#3474 ) [skip ci]	2026-03-07 00:00:48 -05:00
Wing Lian	80f7088ad1	update setuptools so trl can be installed from main for nightlies (#3471 ) * update setuptools so trl can be installed from main for nightlies * run the nightly in the PR CI on change * use range request, don't use cu129 in CI since it's not supported with AO * run multigpu ci if CCE install script changes	2026-03-06 14:59:25 -05:00
Wing Lian	46b9f40f2a	bump dev version to 0.16.0.dev0 (#3472 ) [skip ci]	2026-03-06 14:59:00 -05:00
Wing Lian	8f19169eb0	tag for v0.15.0 release (#3470 ) v0.15.0	2026-03-06 12:55:11 -05:00
Wing Lian	876941ffd0	install flash-linear-attention (#3466 ) * install flash-linear-attention * handle prequant weights for fsdp2 and ensure loss is not zero * fix type for cu_seqlen, uninstall causal_conv1d * chore: lint * uv pip uninstall doesn't need confirmation	2026-03-06 12:40:57 -05:00
NanoCode012	d65e1b960c	fix: add guard for _initialize_missing_keys patch (#3469 ) [skip ci]	2026-03-06 11:45:03 -05:00
NanoCode012	0a23ae08f7	fix: position_ids casted to int64 for qwen35 patch (#3468 ) [skip ci] * fix: position_ids casted to int64 for qwen35 patch * fix: to use view instead of reshape to ensure noncontiguous error explicitly * chore: lint	2026-03-06 11:44:00 -05:00
Wing Lian	fc2d63ee5f	use new tf32 APIs for torch 2.9+ (#3467 ) [skip ci] * use new tf32 APIs for torch 2.9+ * also upgrade cce for tf32 fixes and lint	2026-03-06 11:40:32 -05:00
VED	c119382337	add: qwen 3.5 (#3442 ) * add: qwen 3.5 * test for qwen , patch * lint * qwen3 fix on main * Apply suggestions from code review Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * moe config * config moe * configs and chore * Update examples/qwen3.5/122b-a10b-moe-qlora.yaml Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * Update examples/qwen3.5/35b-a3b-moe-qlora.yaml Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * chore for qwen + vlm patch * chore lint * qwen lint * 3_5_moe * Update examples/qwen3.5/README.md --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2026-03-06 09:31:00 -05:00
NanoCode012	6c8c73e5a4	fix(validation): add validation for lora target linear with quantize experts (#3461 ) * fix: add validation for lora target linear with quantize experts * chore: fix lint * chore: comment * fix: missing link on readme	2026-03-06 09:19:05 -05:00
Wing Lian	a260d330ed	add info about linting that was removed at some point (#3458 ) [skip ci]	2026-03-06 09:18:38 -05:00
Gilles Turpin	da17c7c0d9	fix: use dp_world_size instead of world_size for batch_size with tensor parallelism (#3462 ) [skip ci]	2026-03-06 09:18:13 -05:00
Wing Lian	cada93cee5	upgrade transformers==5.3.0 trl==0.29.0 kernels (#3459 ) * upgrade transformers==5.3.0 trl==0.29.0 kernels * use latest deepspeed fixes * use corect image for cleanup * fix test outputs for tokenizer fixes upstream * fix import: * keep trl at 0.28.0 * handle updated API * use latest trl since 0.28.0 doesn't work with latest transformers * use trl experimental for pad to length * monkeypatch trl with ORPOTrainer so liger doesn't croak * upgrade accelerate * more fixes * move patch for orpotrainer * load the imports later * remove use_logits_to_keep * fix loss_type arg as a list * fetch hf cache from s3 * just manually download the missing model for now * lint for pre-commit update * a few more missing models on disk * fix: loss_type internally now list * fix: remove deprecated code and raise deprecate * fix: remove unneeded blocklist * fix: remove reliance on transformers api to find package available * chore: refactor shim for less sideeffect * fix: silent trl experimental warning --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2026-03-06 09:11:20 -05:00


2650 Commits
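
Note on commit 999b3fec2e (#3487): the message describes replacing a shell=True subprocess call, built from a formatted string containing docker_image, with an argument list. The sketch below illustrates that general pattern only; it is not the code from the modal CLI. The helper name run_with_arg_list and the sample value are hypothetical, and a harmless Python child process stands in for the real docker command, which the commit message does not show.

```python
import subprocess
import sys


def run_with_arg_list(user_value: str) -> str:
    """Pass user_value to a child process as one argv element (no shell).

    Hypothetical helper for illustration. The child simply echoes back the
    argument it received, so we can see it arrived unmodified.
    """
    cmd = [
        sys.executable,
        "-c",
        "import sys; print(sys.argv[1])",
        user_value,
    ]
    # With an argument list and the default shell=False, no shell parses the
    # string, so ';', '&&', '$(...)' and similar have no special meaning.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Risky pattern, shown as a comment only (Bandit B602):
#   subprocess.run(f"docker pull {docker_image}", shell=True)
# A value such as "img; echo injected" would be split by the shell into two
# commands. The argument-list form treats it as a single literal argument.
hostile_value = "img; echo injected"
echoed = run_with_arg_list(hostile_value)
print(echoed == hostile_value)
```

As the commit message itself notes, dropping shell=True can surface Bandit B603 and B607 instead, which that commit suppresses with a nosec annotation.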