axolotl

Author	SHA1	Message	Date
Wing Lian	5191e4eb53	More minor RL fixes (#3551 ) * fix: handle get_open_port import across TRL versions TRL 0.29+ removed get_open_port from exports; fall back to importing directly from vllm.utils or vllm.utils.network_utils. * support DP with vllm and make generation_batch_size configurable	2026-03-25 18:17:49 -04:00
Wing Lian	74b959e035	dispatch scored rollouts to plugins, extend path for external plugins, better handle errors with vllm /reset_prefix_cache (#3549 ) * dispatch scored rollouts to plugins, extend path for external plugins, better handle errors with vllm /reset_prefix_cache * address PR comments, lint	2026-03-25 11:19:15 -04:00
VED	b55706b9f6	feat:merge-lora iterate through bins without loading (#3095 ) * merge_method added * merge_efficient core implement * Update src/axolotl/cli/merge_lora.py Co-authored-by: Wing Lian <wing.lian@gmail.com> * Update src/axolotl/utils/lora_merge_efficient.py Co-authored-by: Wing Lian <wing.lian@gmail.com> * standard to leagcy + rstrip + try/except for do_merge_lora_efficient(cfg=cfg) * fix: 'dict' object has no attribute 'lora_alpha' * into -> debug * lint * lint2 * moved everythign to cpu + peformance improvments * lint * Update src/axolotl/cli/merge_lora.py Co-authored-by: Dan Saunders <danjsaund@gmail.com> * Update src/axolotl/cli/merge_lora.py Co-authored-by: Dan Saunders <danjsaund@gmail.com> * string handeling + try except remove * merge_method -> merge_lora_methods * remove duplicate cal + safetensor + move to lora_merge.py * lint * handle quant-dequant, handle experts * fix parameter merging and prefer peft's native merge logic per module --------- Co-authored-by: Wing Lian <wing.lian@gmail.com> Co-authored-by: Dan Saunders <danjsaund@gmail.com>	2026-03-25 08:41:32 -04:00
Avaya Aggarwal	ff0f67c730	feat: add custom routing support for ernie4_5_moe, and hunyuan_v1_moe (#3526 ) * feat: add Ernie 4.5 and subsequently custom routing support * Update routing.py * chore: lint * fix minor nits * removed deepseek v2 * remove unneeded change --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2026-03-25 08:40:31 -04:00
Matthew Hambrecht	678ebb1bb2	Fix Ray train crashing after succeeding (#3542 ) [skip ci]	2026-03-25 07:38:28 -04:00
Wing Lian	c2bd75aff6	Nemo gym integration (#3516 ) [skip ci] * nemo gym integration with grpo wip * mostly working * cleanup * simplify * update docs * nemo gym support wip * cleanup * chore: lint * address PR review and add more tests * chore: lint * post merge lora fixes for CI (#3536) [skip ci] * post merge lora fixes for CI * handle lora kernel auto-enable for moe without grouped_mm * prefer not to import torch in schema validation * address pr comments, add timeout, add tests * roundup_power2_divisions not needed with newer pytorch versions (#3540) * roundup_power2_divisions not needed with newer pytorch versions * remove typo * update qwen3.5 moe 35b-a3b yaml for 5090 * more bug fixes * fix tests to match updated trainer * don't use fa2 for hooks test * reset plugins on the instance * retry download * fix references to renamed axolotl_cfg property on trainer * Fix ref to trainer cfg * fix: robust handling of race condition on patching check (#3543) [skip ci] * EBFT: Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models (#3527) [skip ci] * EBFT wip * fixes * more fixeS * add missing strided module * ebft fixes for multi-turn * make ebft work with async * add example for ebft w qwen3.5 * fix for split thinking and update yaml for lora over linear attention only * enforce_eager for vllm arg in schema * fix sync weights * fix multi-gpu * handle updated sig for mm * ddp fixes * improve multi-gpu handling, don't calculate logits, adaptive completion length * chore: lint * chore: lint * support completion_mean * Address corereview feedback * clamp min IS ratio * Address PR code review * more fixes identified * address code review * Fix property from rebase conflict * fix for ebft sync and update docs * make trainer loss patch check a solo test --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-25 07:38:06 -04:00
NanoCode012	2fb72798e0	Revert "feat: move to uv first" (#3544 ) This reverts commit `1f1ebb8237`.	2026-03-25 16:12:36 +07:00
NanoCode012	1f1ebb8237	feat: move to uv first	2026-03-25 16:06:37 +07:00
Wing Lian	c50c4acbf4	EBFT: Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models (#3527 ) [skip ci] * EBFT wip * fixes * more fixeS * add missing strided module * ebft fixes for multi-turn * make ebft work with async * add example for ebft w qwen3.5 * fix for split thinking and update yaml for lora over linear attention only * enforce_eager for vllm arg in schema * fix sync weights * fix multi-gpu * handle updated sig for mm * ddp fixes * improve multi-gpu handling, don't calculate logits, adaptive completion length * chore: lint * chore: lint * support completion_mean * Address corereview feedback * clamp min IS ratio * Address PR code review * more fixes identified * address code review * Fix property from rebase conflict	2026-03-24 18:43:46 -04:00
Wing Lian	e9883c91d4	fix: robust handling of race condition on patching check (#3543 ) [skip ci]	2026-03-24 16:43:43 -04:00
Wing Lian	e412370877	roundup_power2_divisions not needed with newer pytorch versions (#3540 ) * roundup_power2_divisions not needed with newer pytorch versions * remove typo * update qwen3.5 moe 35b-a3b yaml for 5090 * more bug fixes * fix tests to match updated trainer * don't use fa2 for hooks test * reset plugins on the instance * retry download * fix references to renamed axolotl_cfg property on trainer * Fix ref to trainer cfg	2026-03-24 15:40:05 -04:00
Wing Lian	86be9f329e	post merge lora fixes for CI (#3536 ) [skip ci] * post merge lora fixes for CI * handle lora kernel auto-enable for moe without grouped_mm * prefer not to import torch in schema validation	2026-03-23 02:26:10 -04:00
Wing Lian	b3289fd190	feat: LoRA kernel support for bias, dropout, dora, embeddings (#3528 ) [skip ci] * feat: LoRA kernel support for bias, dropout, dora, embeddings * chore: lint * chore: lint * address PR feedback, add regression tests, add fsdp2 tests for lora kernels * update tests for new sigs * update tests now that bias and dropout are supported	2026-03-22 13:53:19 -04:00
Wing Lian	a67392c427	liger support for qwen 3.5 and fused rmsnorm+gated (#3531 ) [skip ci] * liger support for qwen 3.5 and fused rmsnorm+gated * support for qwen 3.5 moe * fix version ref * fixups for PR code review	2026-03-22 13:19:21 -04:00
Wing Lian	5b2e3f00ce	fix: handle connection errors when checking user whoami (#3529 )	2026-03-22 09:11:17 -04:00
Wing Lian	fc3b3d1d4e	synthetic datasets for benchmarking and testing (#3518 ) [skip ci] * synthetic datasets for benchmarking and testing * fix synthetic dataset parse from config and add tests * use type=_synthetic	2026-03-21 22:47:26 -04:00
Wing Lian	c9df6efdc2	support offloading layers to CPU (#3512 ) [skip ci] * support offloading layers to CPU * chore: lint * revert change * update docs	2026-03-21 22:47:02 -04:00
Wing Lian	0ee98a0309	fix token state json and mistral tokenizer issue (#3522 ) [skip ci] * fix token state json and mistral tokenizer issue * centralize constants * forgot to commit constants file * Fix weakref in pickling relora state dict * make curl a bit quieter so it doesn't log 2K lines * fix path traversal for olmoe test * more test fixes that weren't flagged previously * chore: lint * skip tests that fail b/c of OutOfResources * scattermoe as slow tests * update fbgemm-genai for torch 2.10	2026-03-21 22:46:10 -04:00
Wing Lian	2c05847a5f	reduce autotune search space (#3525 ) [skip ci] * reduce autotune search space * consistent docstrings	2026-03-21 18:30:15 -04:00
Wing Lian	b0294b3427	handle qwen3.5 moe loading (#3523 ) [skip ci]	2026-03-20 09:25:16 -04:00
Avaya Aggarwal	1bcfc08c90	feat: add support and end-to-end tests for multiple custom optimizers… (#3457 ) [skip ci] * feat: add support and end-to-end tests for multiple custom optimizers including Optimi AdamW, ADOPT AdamW, Muon, Dion, Schedule-Free AdamW, CAME PyTorch, and Flash AdamW. * feat: Add standalone flashoptim integration test and E2E tests for various custom optimizers including FlashAdamW, FlashAdam, FlashSGD, FlashSGDW, FlashLion, optimi_adamw, adopt_adamw, muon, dion, and schedule_free_adamw. * feat: introduce Pydantic schema validation for dataset, attention, and training configurations. * feat: add e2e tests for custom optimizers including optimi_adamw, adopt_adamw, muon, dion, schedule_free_adamw, came_pytorch, and flash optimizers. * test: add e2e tests for custom optimizers including optimi_adamw, adopt_adamw, muon, dion, schedule_free_adamw, came_pytorch, and flash optimizers. * test: fix assertion in flash optimizers test to compare class names directly * fix: address PR review - reuse require_torch_2_7_0 decorator, remove fsdp_config.version check, extract shared FSDP version helper, remove unused imports and optim_args * chore: lint --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2026-03-20 08:24:44 -04:00
Avaya Aggarwal	7ddfb2d8a0	cleanup: remove dead SDPA patches (#3488 ) [skip ci] Transformers 5.x routes attention through sdpa_attention.py and no longer calls the _prepare_4d_causal_attention_mask* or _expand_mask functions that these patches targeted. This makes the following patches dead code: - llama_patch_multipack.py (patched _prepare_4d_causal_attention_mask*) - llama_expand_mask.py (patched _expand_mask, never called) - Related utility functions in monkeypatch/utils.py Closes axolotl-ai-cloud/axolotl#3331	2026-03-20 17:10:41 +07:00
Lorenzo Baraldi	038ffe3f26	fix: solved double sequence partition from SequenceParallelContextManager and Accelerate's native CP (#3498 )	2026-03-20 16:27:24 +07:00
VED	c13cb7c853	feat: add nemotron config (#3506 ) * nemotron config exp * Update examples/nemotron/nemotron-mini-4b-qlora.yaml Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2026-03-20 16:23:42 +07:00
VED	7920fe74ec	fix num_labels= 1 test fail (#3493 ) [skip ci] * trl_num_labels=1 * causal num_labels=1,rwd model * lint	2026-03-20 16:12:23 +07:00
Wing Lian	1fc86d5295	Scattermoe LoRA optimizations (#3513 ) * optimize moe + lora * more scattermoe optims * selective dequant * add correctness unit tests and benchmarks for scattermoe + lora * handle base+lora split kernel for older moe models * chore: lint * fix casting for H200 and B200 * register pressure estimation and pruning for h200/b200 * use soft limit for pruning * qkv patch for qwen3.5moe * support text_model for qwen3.5 moe * nesting of qwen3 * use udpated cce with zero3 support * Fix decomposed backward for QKV and O projections eliminates B @ A materialization in LoRA attention backward, replacing full [out, in] matmuls with two small [T, R] matmuls.	2026-03-19 23:07:42 -04:00
Wing Lian	163bd4dd5a	use custom triton kernels for entropy from logits and selective softmax (#3510 ) * use custom triton kernels for entropy from logits and selective softmax * PR comments fixes * fix out of bounds, include tests, include benchmarks * chore: lint	2026-03-19 02:02:43 -04:00
Wing Lian	f291ac029c	fix for flaky tests in lora ops kernels w autotune (#3511 ) [skip ci] * fix for flaky tests in lora ops kernels w autotune * attempt 2 to fix	2026-03-19 01:18:47 -04:00
Wing Lian	5ef3f28340	Support for Async GRPO (#3486 ) * async grpo support * implement data producer * use fast async * handle call to create data producer * fix liger kernel setup * fix replay buffer * chore: lint * make gpus go brrr * chore: lint * inplace div_, unwrap model for logits in bf16 * fuse selective softmax and empty cuda cache on each scoring step * remove waiting for synch time and fix race * make fp8 work and allow lora kernels w rl * grpo with lora vllm sync and fixes for sharded distributed * update docs * more patches so it works against trl main * address PR feedback for corerabbit	2026-03-17 11:42:47 -04:00
Aarush	999b3fec2e	fix: replace shell=True subprocess with argument list in modal CLI (#3487 ) * fix: replace shell=True subprocess with argument list in modal CLI Using shell=True with a formatted string containing docker_image (a user-controlled value) is a command injection risk (Bandit B602). Replace with an argument list, which passes args directly to the process without shell interpretation, removing the nosec annotation. * fix: add nosec annotation to suppress bandit B603/B607 warnings Removing shell=True (B602) surfaces B603 (subprocess without shell) and B607 (partial executable path for 'docker'). Use bare # nosec to suppress both, consistent with other nosec usages in the codebase.	2026-03-17 08:53:13 -04:00
Wing Lian	8f3fb517b3	consolidate behaviour of routing in scattermoe kernels (#3475 ) * consolidate behaviour of routing in scattermoe kernels * collect telemetry on best chosen autotuned kernel * properly collect data * Fix property name and get smem too * handle issues raised by coderabbit * add tests for parity before refactoring	2026-03-16 23:47:40 -04:00
Wing Lian	830e9f7eaf	automatically enable tf32 if supported (#3473 ) [skip ci] * automatically enable tf32 if supported * update fixtures * handle only when True * Address CR comments * address readability from pr comment * simplify	2026-03-16 23:47:00 -04:00
NanoCode012	a098df527b	feat: add Mistral Small 4 (#3502 ) * feat: add mistral small 4 * fix: update mistral common * fix: deepcopy when passing in tokenizer * feat: add doc on reasoning and thinking section * fix: don't use custom tokenizer and quantize experts * chore: update docs and configs * chore: update doc to follow official name * feat: update cce to include mistral4 * chore: move * fix: naming * fix: test mock breaking get_text_config check * fix: enable CCE and add expert block targetting to configs * chore: docs * fix: use act checkpointing * chore: doc * chore: docs * chore: docs	2026-03-17 09:39:05 +07:00
NanoCode012	7da5f94379	feat: add FA4 (#3481 ) * feat: add FA4 * chore: update docs * fix: recommend FA4 for those with compatible devices * fix: adjust import check and add head_dim check * chore: add limitation to doc * fix: log warning and quit if cannot import validator * chore: simplify * fix: add caveat with FA2 shadow dir	2026-03-16 00:13:18 -04:00
Aarush	defee62d99	fix: fix CONTRIBUTING.md placeholders, bare except clauses, and add convert.py tests (#3485 ) [skip ci] * docs: fix codestyle placeholders in CONTRIBUTING.md Replace unresolved {codestyle} and {URLofCodestyle} template variables with Ruff, the project's actual linter/formatter as configured in .pre-commit-config.yaml. * fix: replace bare except clauses with specific exception types - quantization.py: use except ImportError for optional torchao imports (consistent with line 48 which already uses ImportError correctly) - cli/config.py: use except (RuntimeError, AssertionError) for CUDA device property query Prevents masking unrelated errors like KeyboardInterrupt or SystemExit. * test: add unit tests for convert.py JSON/JSONL utilities Cover FileReader, FileWriter, StdoutWriter, JsonParser, JsonlSerializer, and JsonToJsonlConverter with 8 test cases including roundtrip and edge case (empty list) scenarios. Previously this module had zero test coverage. * fix: address CodeRabbit review feedback - quantization.py: catch (ImportError, RuntimeError) for optional torchao imports; CUDA wheel/GPU mismatches raise RuntimeError, not ImportError - convert.py: remove unused output_file_path parameter from JsonToJsonlConverter.convert() — FileWriter already holds the output path from construction - tests/test_convert.py: update call site to match new signature	2026-03-16 00:12:40 -04:00
VED	f56efdb4ab	fix: high eval loss w/ sample packing (#3478 ) [skip ci] * check if eval_sp * readable condition	2026-03-15 22:11:23 -04:00
NanoCode012	d8a646c80d	chore: logging cleanup (#3482 ) [skip ci]	2026-03-15 22:10:57 -04:00
VED	a806704e94	moe quant patch for merge mismatch (#3483 ) * moe quant patch for merge mismatch * lint * revert test + fix moe patch * comment fixes * e2e tests * mismatch fix tested * mismatch fix with vllm compatibility + test * comment lint * fix: missing os import, duplicate no op * chore: simplify comments --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2026-03-15 22:10:30 -04:00
NanoCode012	23ad40bdd5	fix: disable async load when loading quantized bnb	2026-03-11 13:18:27 +07:00
Wing Lian	43b1c80aa6	load weights synchronously so they can be converted and not OOM: (#3477 )	2026-03-07 07:09:24 -05:00
Wing Lian	876941ffd0	install flash-linear-attention (#3466 ) * install flash-linear-attention * handle prequant weights for fsdp2 and ensure loss is not zero * fix type for cu_seqlen, uninstall causal_conv1d * chore: lint * uv pip uninstall doesn't need confirmation	2026-03-06 12:40:57 -05:00
NanoCode012	d65e1b960c	fix: add guard for _initialize_missing_keys patch (#3469 ) [skip ci]	2026-03-06 11:45:03 -05:00
NanoCode012	0a23ae08f7	fix: position_ids casted to int64 for qwen35 patch (#3468 ) [skip ci] * fix: position_ids casted to int64 for qwen35 patch * fix: to use view instead of reshape to ensure noncontiguous error explicitly * chore: lint	2026-03-06 11:44:00 -05:00
Wing Lian	fc2d63ee5f	use new tf32 APIs for torch 2.9+ (#3467 ) [skip ci] * use new tf32 APIs for torch 2.9+ * also upgrade cce for tf32 fixes and lint	2026-03-06 11:40:32 -05:00
VED	c119382337	add: qwen 3.5 (#3442 ) * add: qwen 3.5 * test for qwen , patch * lint * qwen3 fix on main * Apply suggestions from code review Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * moe config * config moe * configs and chore * Update examples/qwen3.5/122b-a10b-moe-qlora.yaml Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * Update examples/qwen3.5/35b-a3b-moe-qlora.yaml Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * chore for qwen + vlm patch * chore lint * qwen lint * 3_5_moe * Update examples/qwen3.5/README.md --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2026-03-06 09:31:00 -05:00
NanoCode012	6c8c73e5a4	fix(validation): add validation for lora target linear with quantize experts (#3461 ) * fix: add validation for lora target linear with quantize experts * chore: fix lint * chore: comment * fix: missing link on readme	2026-03-06 09:19:05 -05:00
Gilles Turpin	da17c7c0d9	fix: use dp_world_size instead of world_size for batch_size with tensor parallelism (#3462 ) [skip ci]	2026-03-06 09:18:13 -05:00
Wing Lian	cada93cee5	upgrade transformers==5.3.0 trl==0.29.0 kernels (#3459 ) * upgrade transformers==5.3.0 trl==0.29.0 kernels * use latest deepspeed fixes * use corect image for cleanup * fix test outputs for tokenizer fixes upstream * fix import: * keep trl at 0.28.0 * handle updated API * use latest trl since 0.28.0 doesn't work with latest transformers * use trl experimental for pad to length * monkeypatch trl with ORPOTrainer so liger doesn't croak * upgrade accelerate * more fixes * move patch for orpotrainer * load the imports later * remove use_logits_to_keep * fix loss_type arg as a list * fetch hf cache from s3 * just manually download the missing model for now * lint for pre-commit update * a few more missing models on disk * fix: loss_type internally now list * fix: remove deprecated code and raise deprecate * fix: remove unneeded blocklist * fix: remove reliance on transformers api to find package available * chore: refactor shim for less sideeffect * fix: silent trl experimental warning --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2026-03-06 09:11:20 -05:00
Wing Lian	56162f71db	monkeypatch fix for fsdp with cpu ram efficient loading (#3464 ) [skip ci]	2026-03-06 09:10:58 -05:00
NanoCode012	6a8baf8fa7	feat: add sonicmoe (#3411 ) * feat: add sonicmoe * feat: add torch compile for routing * feat: add routing smoke test * feat: add qwen3_5_moe, qwen3_vl_moe, qwen3_omni_moe * fix: disable mlp kernel for sonicmoe too * feat: update to sonicmoe release * chore: update import following new sonicmoe changes * feat: update handling for blackwell * feat: add sonicmoe e2e test * fix: installation for updated sonicmoe * fix: git commit * fix: ignore py req and fix metadata * fix: increase min hidden size to match sonicmoe kernel min * fix: attempt properly interleave and handle unpatch mid-test * chore: refactor teardown better * chore: refactor to re-use rearrange * fix: add idempotency guard * fix: address comments on CI memory and interleave * fix: tests grad, param doublewrapped	2026-03-05 13:43:31 -05:00
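
Note on 999b3fec2e (#3487): the commit message describes replacing a shell=True subprocess call, built from a formatted string, with an argument list. The sketch below illustrates that general pattern only; it is not the code from the modal CLI. The helper name run_with_args, the echoing child process, and the sample image string are all hypothetical stand-ins chosen so the example runs anywhere Python does.

```python
import subprocess
import sys


def run_with_args(untrusted_value: str) -> str:
    """Pass an untrusted value to a child process as a single argv entry.

    Hypothetical helper for illustration: the child is a tiny Python
    program that writes its first argument back to stdout.
    """
    cmd = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.argv[1])",
        untrusted_value,
    ]
    # No shell=True: the list is handed to the OS as-is, so shell
    # metacharacters inside untrusted_value are never interpreted.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # With shell=True and string formatting, "; echo injected" would run
    # as a second command. Here it is just part of one argument.
    print(run_with_args("some/image:tag; echo injected"))
```

Because the value travels as one list element, the child receives it unchanged, which is the property the commit message relies on when it drops shell=True.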

1491 Commits