Commit Graph

1491 Commits

Author SHA1 Message Date
Wing Lian
5191e4eb53 More minor RL fixes (#3551)
* fix: handle get_open_port import across TRL versions

TRL 0.29+ removed get_open_port from exports; fall back to importing
directly from vllm.utils or vllm.utils.network_utils.

* support DP with vllm and make generation_batch_size confifurable
2026-03-25 18:17:49 -04:00
Wing Lian
74b959e035 dispatch scored rollouts to plugins, extend path for external plugins, better handle errors with vllm /reset_prefix_cache (#3549)
* dispatch scored rollouts to plugins, extend path for external plugins, better handle errors with vllm /reset_prefix_cache

* address PR comments, lint
2026-03-25 11:19:15 -04:00
VED
b55706b9f6 feat:merge-lora iterate through bins without loading (#3095)
* merge_method added

* merge_efficient core implement

* Update src/axolotl/cli/merge_lora.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* Update src/axolotl/utils/lora_merge_efficient.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* standard to leagcy + rstrip + try/except for do_merge_lora_efficient(cfg=cfg)

* fix: 'dict' object has no attribute 'lora_alpha'

* into -> debug

* lint

* lint2

* moved everythign to cpu + peformance improvments

* lint

* Update src/axolotl/cli/merge_lora.py

Co-authored-by: Dan Saunders <danjsaund@gmail.com>

* Update src/axolotl/cli/merge_lora.py

Co-authored-by: Dan Saunders <danjsaund@gmail.com>

* string handeling +  try except remove

* merge_method -> merge_lora_methods

* remove duplicate cal + safetensor + move to lora_merge.py

* lint

* handle quant-dequant, handle experts

* fix parameter merging and prefer peft's native merge logic per module

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Dan Saunders <danjsaund@gmail.com>
2026-03-25 08:41:32 -04:00
Avaya Aggarwal
ff0f67c730 feat: add custom routing support for ernie4_5_moe, and hunyuan_v1_moe (#3526)
* feat: add Ernie 4.5 and subsequently custom routing support

* Update routing.py

* chore: lint

* fix minor nits

* removed deepseek v2

* remove unneeded change

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
2026-03-25 08:40:31 -04:00
Matthew Hambrecht
678ebb1bb2 Fix Ray train crashing after succeeding (#3542) [skip ci] 2026-03-25 07:38:28 -04:00
Wing Lian
c2bd75aff6 Nemo gym integration (#3516) [skip ci]
* nemo gym integration with grpo wip

* mostly working

* cleanup

* simplify

* update docs

* nemo gym support wip

* cleanup

* chore: lint

* address PR review and add more tests

* chore: lint

* post merge lora fixes for CI (#3536) [skip ci]

* post merge lora fixes for CI

* handle lora kernel auto-enable for moe without grouped_mm

* prefer not to import torch in schema validation

* address pr comments, add timeout, add tests

* roundup_power2_divisions not needed with newer pytorch versions (#3540)

* roundup_power2_divisions not needed with newer pytorch versions

* remove typo

* update qwen3.5 moe 35b-a3b yaml for 5090

* more bug fixes

* fix tests to match updated trainer

* don't use fa2 for hooks test

* reset plugins on the instance

* retry download

* fix references to renamed axolotl_cfg property on trainer

* Fix ref to trainer cfg

* fix: robust handling of race condition on patching check (#3543) [skip ci]

* EBFT: Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models (#3527) [skip ci]

* EBFT wip

* fixes

* more fixeS

* add missing strided module

* ebft fixes for multi-turn

* make ebft work with async

* add example for ebft w qwen3.5

* fix for split thinking and update yaml for lora over linear attention only

* enforce_eager for vllm arg in schema

* fix sync weights

* fix multi-gpu

* handle updated sig for mm

* ddp fixes

* improve multi-gpu handling, don't calculate logits, adaptive completion length

* chore: lint

* chore: lint

* support completion_mean

* Address corereview feedback

* clamp min IS ratio

* Address PR code review

* more fixes identified

* address code review

* Fix property from rebase conflict

* fix for ebft sync and update docs

* make trainer loss patch check a solo test

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 07:38:06 -04:00
NanoCode012
2fb72798e0 Revert "feat: move to uv first" (#3544)
This reverts commit 1f1ebb8237.
2026-03-25 16:12:36 +07:00
NanoCode012
1f1ebb8237 feat: move to uv first 2026-03-25 16:06:37 +07:00
Wing Lian
c50c4acbf4 EBFT: Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models (#3527) [skip ci]
* EBFT wip

* fixes

* more fixeS

* add missing strided module

* ebft fixes for multi-turn

* make ebft work with async

* add example for ebft w qwen3.5

* fix for split thinking and update yaml for lora over linear attention only

* enforce_eager for vllm arg in schema

* fix sync weights

* fix multi-gpu

* handle updated sig for mm

* ddp fixes

* improve multi-gpu handling, don't calculate logits, adaptive completion length

* chore: lint

* chore: lint

* support completion_mean

* Address corereview feedback

* clamp min IS ratio

* Address PR code review

* more fixes identified

* address code review

* Fix property from rebase conflict
2026-03-24 18:43:46 -04:00
Wing Lian
e9883c91d4 fix: robust handling of race condition on patching check (#3543) [skip ci] 2026-03-24 16:43:43 -04:00
Wing Lian
e412370877 roundup_power2_divisions not needed with newer pytorch versions (#3540)
* roundup_power2_divisions not needed with newer pytorch versions

* remove typo

* update qwen3.5 moe 35b-a3b yaml for 5090

* more bug fixes

* fix tests to match updated trainer

* don't use fa2 for hooks test

* reset plugins on the instance

* retry download

* fix references to renamed axolotl_cfg property on trainer

* Fix ref to trainer cfg
2026-03-24 15:40:05 -04:00
Wing Lian
86be9f329e post merge lora fixes for CI (#3536) [skip ci]
* post merge lora fixes for CI

* handle lora kernel auto-enable for moe without grouped_mm

* prefer not to import torch in schema validation
2026-03-23 02:26:10 -04:00
Wing Lian
b3289fd190 feat: LoRA kernel support for bias, dropout, dora, embeddings (#3528) [skip ci]
* feat: LoRA kernel support for bias, dropout, dora, embeddings

* chore: lint

* chore: lint

* address PR feedback, add regression tests, add fsdp2 tests for lora kernels

* update tests for new sigs

* update tests now that bias and dropout are supported
2026-03-22 13:53:19 -04:00
Wing Lian
a67392c427 liger support for qwen 3.5 and fused rmsnorm+gated (#3531) [skip ci]
* liger support for qwen 3.5 and fused rmsnorm+gated

* support for qwen 3.5 moe

* fix version ref

* fixups for PR code review
2026-03-22 13:19:21 -04:00
Wing Lian
5b2e3f00ce fix: handle connection errors when checking user whoami (#3529) 2026-03-22 09:11:17 -04:00
Wing Lian
fc3b3d1d4e synthetic datasets for benchmarking and testing (#3518) [skip ci]
* synthetic datasets for benchmarking and testing

* fix synthetic dataset parse from config and add tests

* use type=_synthetic
2026-03-21 22:47:26 -04:00
Wing Lian
c9df6efdc2 support offloading layers to CPU (#3512) [skip ci]
* support offloading layers to CPU

* chore: lint

* revert change

* update docs
2026-03-21 22:47:02 -04:00
Wing Lian
0ee98a0309 fix token state json and mistral tokenizer issue (#3522) [skip ci]
* fix token state json and mistral tokenizer issue

* centralize constants

* forgot to commit constants file

* Fix weakref in pickling relora state dict

* make curl a bit quieter so it doesn't log 2K lines

* fix path traversal for olmoe test

* more test fixes that weren't flagged previously

* chore: lint

* skip tests that fail b/c of OutOfResources

* scattermoe as slow tests

* update fbgemm-genai for torch 2.10
2026-03-21 22:46:10 -04:00
Wing Lian
2c05847a5f reduce autotune search space (#3525) [skip ci]
* reduce autotune search space

* consistent docstrings
2026-03-21 18:30:15 -04:00
Wing Lian
b0294b3427 handle qwen3.5 moe loading (#3523) [skip ci] 2026-03-20 09:25:16 -04:00
Avaya Aggarwal
1bcfc08c90 feat: add support and end-to-end tests for multiple custom optimizers… (#3457) [skip ci]
* feat: add support and end-to-end tests for multiple custom optimizers including Optimi AdamW, ADOPT AdamW, Muon, Dion, Schedule-Free AdamW, CAME PyTorch, and Flash AdamW.

* feat: Add standalone flashoptim integration test and E2E tests for various custom optimizers including FlashAdamW, FlashAdam, FlashSGD, FlashSGDW, FlashLion, optimi_adamw, adopt_adamw, muon, dion, and schedule_free_adamw.

* feat: introduce Pydantic schema validation for dataset, attention, and training configurations.

* feat: add e2e tests for custom optimizers including optimi_adamw, adopt_adamw, muon, dion, schedule_free_adamw, came_pytorch, and flash optimizers.

* test: add e2e tests for custom optimizers including optimi_adamw, adopt_adamw, muon, dion, schedule_free_adamw, came_pytorch, and flash optimizers.

* test: fix assertion in flash optimizers test to compare class names directly

* fix: address PR review - reuse require_torch_2_7_0 decorator, remove fsdp_config.version check, extract shared FSDP version helper, remove unused imports and optim_args

* chore: lint

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2026-03-20 08:24:44 -04:00
Avaya Aggarwal
7ddfb2d8a0 cleanup: remove dead SDPA patches (#3488) [skip ci]
Transformers 5.x routes attention through sdpa_attention.py and no longer
calls the _prepare_4d_causal_attention_mask* or _expand_mask functions that
these patches targeted. This makes the following patches dead code:

- llama_patch_multipack.py (patched _prepare_4d_causal_attention_mask*)
- llama_expand_mask.py (patched _expand_mask, never called)
- Related utility functions in monkeypatch/utils.py

Closes axolotl-ai-cloud/axolotl#3331
2026-03-20 17:10:41 +07:00
Lorenzo Baraldi
038ffe3f26 fix: solved double sequence partition from SequenceParallelContextManager and Accelerate's native CP (#3498) 2026-03-20 16:27:24 +07:00
VED
c13cb7c853 feat: add nemotron config (#3506)
* nemotron config exp

* Update examples/nemotron/nemotron-mini-4b-qlora.yaml

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2026-03-20 16:23:42 +07:00
VED
7920fe74ec fix num_labels= 1 test fail (#3493) [skip ci]
* trl_num_lables=1

* casual num_lables=1,rwd model

* lint
2026-03-20 16:12:23 +07:00
Wing Lian
1fc86d5295 Scattermoe LoRA optimizations (#3513)
* optimize moe + lora

* more scattermoe optims

* selective dequant

* add correctness unit tests and benchmarks for scattermoe + lora

* handle base+lora split kernel for older moe models

* chore: lint

* fix casting for H200 and B200

* register pressure estimation and pruning for h200/b200

* use soft limit for pruning

* qkv patch for qwen3.5moe

* support text_model for qwen3.5 moe

* nesting of qwen3

* use udpated cce with zero3 support

* Fix decomposed backward for QKV and O projections

eliminates B @ A materialization in LoRA attention backward, replacing full [out, in] matmuls with two small [T, R] matmuls.
2026-03-19 23:07:42 -04:00
Wing Lian
163bd4dd5a use custom triton kernels for entropy from logits and selective softmax (#3510)
* use custom triton kernels for entropy from logits and selective softmax

* PR comments fixes

* fix out of bounds, include tests, include benchmarks

* chore: lint
2026-03-19 02:02:43 -04:00
Wing Lian
f291ac029c fix for flaky tests in lora ops kernels w autotune (#3511) [skip ci]
* fix for flaky tests in lora ops kernels w autotune

* attempt 2 to fix
2026-03-19 01:18:47 -04:00
Wing Lian
5ef3f28340 Support for Async GRPO (#3486)
* async grpo support

* implement data producer

* use fast async

* handle call to create data producer

* fix liger kernel setup

* fix replay buffer

* chore: lint

* make gpus go brrr

* chore: lint

* inplace div_, unwrap model for logits in bf16

* fuse selective softmax and empty cuda cache on each scoring step

* remove waiting for synch time and fix race

* make fp8 work and allow lora kernels w rl

* grpo with lora vllm sync and fixes for sharded distributed

* update docs

* more patches so it works against trl main

* address PR feedback for corerabbit
2026-03-17 11:42:47 -04:00
Aarush
999b3fec2e fix: replace shell=True subprocess with argument list in modal CLI (#3487)
* fix: replace shell=True subprocess with argument list in modal CLI

Using shell=True with a formatted string containing docker_image
(a user-controlled value) is a command injection risk (Bandit B602).
Replace with an argument list, which passes args directly to the
process without shell interpretation, removing the nosec annotation.

* fix: add nosec annotation to suppress bandit B603/B607 warnings

Removing shell=True (B602) surfaces B603 (subprocess without shell)
and B607 (partial executable path for 'docker'). Use bare # nosec
to suppress both, consistent with other nosec usages in the codebase.
2026-03-17 08:53:13 -04:00
Wing Lian
8f3fb517b3 consolidate behavioud of routing in scattermoe kernels (#3475)
* consolidate behavioud of routing in scattermoe kernels

* collect telemetry on best chosen autotuned kernel

* properly collect data

* Fix property name and get smem too

* handle issues raised by coderabbit

* add tests for parity before refactoring
2026-03-16 23:47:40 -04:00
Wing Lian
830e9f7eaf automatically enable tf32 if supported (#3473) [skip ci]
* automatically enable tf32 if supported

* update fixtures

* handle only when True

* Address CR comments

* address readability from pr comment

* simplify
2026-03-16 23:47:00 -04:00
NanoCode012
a098df527b feat: add Mistral Small 4 (#3502)
* feat: add mistral small 4

* fix: update mistral common

* fix: deepcopy when passing in tokenizer

* feat: add doc on reasoning and thinking section

* fix: don't use custom tokenizer and quantize experts

* chore: update docs and configs

* chore: update doc to follow official name

* feat: update cce to include mistral4

* chore: move

* fix: naming

* fix: test mock breaking get_text_config check

* fix: enable CCE and add expert block targetting to configs

* chore: docs

* fix: use act checkpointing

* chore: doc

* chore: docs

* chore: docs
2026-03-17 09:39:05 +07:00
NanoCode012
7da5f94379 feat: add FA4 (#3481)
* feat: add FA4

* chore: update docs

* fix: recommend FA4 for those with compatible devices

* fix: adjust import check and add head_dim check

* chore: add limitation to doc

* fix: log warning and quit if cannot import validator

* chore: simplify

* fix: add caveat with FA2 shadow dir
2026-03-16 00:13:18 -04:00
Aarush
defee62d99 fix: fix CONTRIBUTING.md placeholders, bare except clauses, and add convert.py tests (#3485) [skip ci]
* docs: fix codestyle placeholders in CONTRIBUTING.md

Replace unresolved {codestyle} and {URLofCodestyle} template
variables with Ruff, the project's actual linter/formatter
as configured in .pre-commit-config.yaml.

* fix: replace bare except clauses with specific exception types

- quantization.py: use except ImportError for optional torchao imports
  (consistent with line 48 which already uses ImportError correctly)
- cli/config.py: use except (RuntimeError, AssertionError) for CUDA
  device property query

Prevents masking unrelated errors like KeyboardInterrupt or SystemExit.

* test: add unit tests for convert.py JSON/JSONL utilities

Cover FileReader, FileWriter, StdoutWriter, JsonParser,
JsonlSerializer, and JsonToJsonlConverter with 8 test cases
including roundtrip and edge case (empty list) scenarios.

Previously this module had zero test coverage.

* fix: address CodeRabbit review feedback

- quantization.py: catch (ImportError, RuntimeError) for optional
  torchao imports; CUDA wheel/GPU mismatches raise RuntimeError,
  not ImportError
- convert.py: remove unused output_file_path parameter from
  JsonToJsonlConverter.convert() — FileWriter already holds the
  output path from construction
- tests/test_convert.py: update call site to match new signature
2026-03-16 00:12:40 -04:00
VED
f56efdb4ab fix: high eval loss w/ sample packing (#3478) [skip ci]
* check if eval_sp

* radable condition
2026-03-15 22:11:23 -04:00
NanoCode012
d8a646c80d chore: logging cleanup (#3482) [skip ci] 2026-03-15 22:10:57 -04:00
VED
a806704e94 moe quant patch for merge miss match (#3483)
* moe quant patch for merge miss match

* lint

* revert test + fix moe patch

* comment fixxes

* e2e tests

* mismatch fixx tested

* mis match fix wwith vllm compatablity + test

* comment lint

* fix: missing os import, duplicate no op

* chore: simplify comments

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2026-03-15 22:10:30 -04:00
NanoCode012
23ad40bdd5 fix: disable async load when loading quantized bnb 2026-03-11 13:18:27 +07:00
Wing Lian
43b1c80aa6 load weights synchronously so they can be converted and not OOM: (#3477) 2026-03-07 07:09:24 -05:00
Wing Lian
876941ffd0 install flash-linear-attention (#3466)
* install flash-linear-attention

* handle prequant weights for fsdp2 and ensure loss is not zero

* fix type for cu_seqlen, uninstall causal_conv1d

* chore: lint

* uv pip uninstall doesn't need confirmation
2026-03-06 12:40:57 -05:00
NanoCode012
d65e1b960c fix: add guard for _initialize_missing_keys patch (#3469) [skip ci] 2026-03-06 11:45:03 -05:00
NanoCode012
0a23ae08f7 fix: position_ids casted to int64 for qwen35 patch (#3468) [skip ci]
* fix: position_ids casted to int64 for qwen35 patch

* fix: to use view instead of reshape to ensure noncontiguous error explicitly

* chore: lint
2026-03-06 11:44:00 -05:00
Wing Lian
fc2d63ee5f use new tf32 APIs for torch 2.9+ (#3467) [skip ci]
* use new tf32 APIs for torch 2.9+

* also upgrade cce for tf32 fixes and lint
2026-03-06 11:40:32 -05:00
VED
c119382337 add: qwen 3.5 (#3442)
* add: qwen 3.5

* test for qwen , patch

* lint

* qwen3 fix on main

* Apply suggestions from code review

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* moe config

* config moe

* configs and chore

* Update examples/qwen3.5/122b-a10b-moe-qlora.yaml

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* Update examples/qwen3.5/35b-a3b-moe-qlora.yaml

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* chore for qwen + vlm patch

* chore lint

* qwen lint

* 3_5_moe

* Update examples/qwen3.5/README.md

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2026-03-06 09:31:00 -05:00
NanoCode012
6c8c73e5a4 fix(validation): add validation for lora target linear with quantize experts (#3461)
* fix: add validation for lora target linear with quantize experts

* chore: fix lint

* chore: comment

* fix: missing link on readme
2026-03-06 09:19:05 -05:00
Gilles Turpin
da17c7c0d9 fix: use dp_world_size instead of world_size for batch_size with tensor parallelism (#3462) [skip ci] 2026-03-06 09:18:13 -05:00
Wing Lian
cada93cee5 upgrade transformers==5.3.0 trl==0.29.0 kernels (#3459)
* upgrade transformers==5.3.0 trl==0.29.0 kernels

* use latest deepspeed fixes

* use corect image for cleanup

* fix test outputs for tokenizer fixes upstream

* fix import:

* keep trl at 0.28.0

* handle updated API

* use latest trl since 0.28.0 doesn't work with latest transformers

* use trl experimental for pad to length

* monkeypatch trl with ORPOTrainer so liger doesn't croak

* upgrade accelerate

* more fixes

* move patch for orpotrainer

* load the imports later

* remove use_logits_to_keep

* fix loss_type arg as a list

* fetch hf cache from s3

* just manually download the missing model for now

* lint for pre-commit update

* a few more missing models on disk

* fix: loss_type internally now list

* fix: remove deprecated code and raise deprecate

* fix: remove unneeded blocklist

* fix: remove reliance on transformers api to find package available

* chore: refactor shim for less sideeffect

* fix: silent trl experimental warning

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2026-03-06 09:11:20 -05:00
Wing Lian
56162f71db monkeypatch fix for fsdp with cpu ram efficient loading (#3464) [skip ci] 2026-03-06 09:10:58 -05:00
NanoCode012
6a8baf8fa7 feat: add sonicmoe (#3411)
* feat: add sonicmoe

* feat: add torch compile for routing

* feat: add routing smoke test

* feat: add qwen3_5_moe, qwen3_vl_moe, qwen3_omni_moe

* fix: disable mlp kernel for sonicmoe too

* feat: update to sonicmoe release

* chore: update import following new sonicmoe changes

* feat: update handling for blackwell

* feat: add sonicmoe e2e test

* fix: installation for updated sonicmoe

* fix: git commit

* fix: ignore py req and fix metadata

* fix: increase min hidden size to match sonicmoe kernel min

* fix: attempt properly interleave and handle unpatch mid-test

* chore: refactor teardown better

* chore: refactor to re-use rearrange

* fix: add idempotency guard

* fix: address comments on CI memory and interleave

* fix: tests grad, param doublewrapped
2026-03-05 13:43:31 -05:00