NanoCode012
5a5cf30b26
fix: add dequant bf16 repo (#3507) [skip ci]
2026-03-20 17:11:46 +07:00
Owen Arliawan
c57acef2c7
Qwen3.5-MoE example config with lora_target_modules regex (#3515) [skip ci]
* lora target modules with regex
* updates
* fsdp for non moe
* update wording
* chore: cleanup and lint
* chore: cleanup docs from merge
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2026-03-20 16:52:46 +07:00
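A string `lora_target_modules` is treated by PEFT as a regex that must full-match each module name. A minimal sketch of that matching (the module names and pattern here are illustrative, not copied from the shipped example config):

```python
import re

# Hypothetical module names as they might appear in a MoE checkpoint.
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.experts.3.gate_proj",
    "model.layers.0.mlp.gate",  # MoE router gate -- intentionally not matched
]

# An assumed regex in the spirit of the example config: target attention and
# expert projections, but skip the router ("gate" alone is not an alternative).
pattern = r".*\.(q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)"

# PEFT applies re.fullmatch when target_modules is a string.
matched = [n for n in names if re.fullmatch(pattern, n)]
```

This lets one pattern cover every expert index without enumerating them.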
VED
c13cb7c853
feat: add nemotron config (#3506)
* nemotron config exp
* Update examples/nemotron/nemotron-mini-4b-qlora.yaml
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2026-03-20 16:23:42 +07:00
VED
b3823cc6b0
fix: gemma3 configs (#3500) [skip ci]
* gemma fft, text fix
* good lint
2026-03-20 16:14:06 +07:00
VED
113d275bd9
qwen docs + new config (#3499) [skip ci]
* qwen docs + new config
* docs lint
* simplify comments
* read me
* lint comments
* Update docs/multimodal.qmd
* Update docs/multimodal.qmd
* Update examples/qwen3.5/9b-fft-vision.yaml
* chore: fix link and incorrect points
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2026-03-20 16:13:34 +07:00
Wing Lian
1fc86d5295
Scattermoe LoRA optimizations (#3513)
* optimize moe + lora
* more scattermoe optims
* selective dequant
* add correctness unit tests and benchmarks for scattermoe + lora
* handle base+lora split kernel for older moe models
* chore: lint
* fix casting for H200 and B200
* register pressure estimation and pruning for h200/b200
* use soft limit for pruning
* qkv patch for qwen3.5moe
* support text_model for qwen3.5 moe
* nesting of qwen3
* use updated cce with zero3 support
* Fix decomposed backward for QKV and O projections
eliminates B @ A materialization in LoRA attention backward, replacing full [out, in] matmuls with two small [T, R] matmuls.
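The decomposition described above follows from associativity: `dy @ (B @ A)` equals `(dy @ B) @ A`, so the full `[out, in]` product `B @ A` never needs to be materialized and only a small `[T, R]` intermediate is formed. A toy NumPy sketch of the identity (illustrative sizes, not the actual kernel code):

```python
import numpy as np

T, IN, OUT, R = 8, 16, 32, 4  # tokens, in/out features, LoRA rank (toy sizes)
rng = np.random.default_rng(0)
x = rng.standard_normal((T, IN))
A = rng.standard_normal((R, IN))    # LoRA down-projection
B = rng.standard_normal((OUT, R))   # LoRA up-projection
dy = rng.standard_normal((T, OUT))  # upstream gradient

# Naive: materialize the full [OUT, IN] delta weight.
dW = B @ A                 # [OUT, IN] -- expensive for large layers
dx_naive = dy @ dW         # [T, IN]

# Decomposed: one small [T, R] intermediate, never forming B @ A.
t = dy @ B                 # [T, R]
dx_decomposed = t @ A      # [T, IN]

assert np.allclose(dx_naive, dx_decomposed)
```

The decomposed path replaces one `[T, OUT] x [OUT, IN]` matmul through a dense delta weight with two rank-sized matmuls, which is the memory win the commit message describes.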
2026-03-19 23:07:42 -04:00
NanoCode012
a098df527b
feat: add Mistral Small 4 (#3502)
* feat: add mistral small 4
* fix: update mistral common
* fix: deepcopy when passing in tokenizer
* feat: add doc on reasoning and thinking section
* fix: don't use custom tokenizer and quantize experts
* chore: update docs and configs
* chore: update doc to follow official name
* feat: update cce to include mistral4
* chore: move
* fix: naming
* fix: test mock breaking get_text_config check
* fix: enable CCE and add expert block targeting to configs
* chore: docs
* fix: use act checkpointing
* chore: doc
* chore: docs
* chore: docs
2026-03-17 09:39:05 +07:00
Wing Lian
fc2d63ee5f
use new tf32 APIs for torch 2.9+ (#3467) [skip ci]
* use new tf32 APIs for torch 2.9+
* also upgrade cce for tf32 fixes and lint
2026-03-06 11:40:32 -05:00
VED
c119382337
add: qwen 3.5 (#3442)
* add: qwen 3.5
* test for qwen, patch
* lint
* qwen3 fix on main
* Apply suggestions from code review
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* moe config
* config moe
* configs and chore
* Update examples/qwen3.5/122b-a10b-moe-qlora.yaml
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update examples/qwen3.5/35b-a3b-moe-qlora.yaml
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* chore for qwen + vlm patch
* chore lint
* qwen lint
* 3_5_moe
* Update examples/qwen3.5/README.md
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2026-03-06 09:31:00 -05:00
VED
1eaf4d7418
add: support mxfp4 axo (#3375)
* mxfp4 axo
* import lint
* test for qat mxfp4
* config for mxfp4
* add qat:
* pass base config
* MXFakeQuantizeConfig
* lint
* tune config so it fits in 32GB VRAM
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2026-03-05 13:40:45 -05:00
NanoCode012
753906cfc7
feat: add doc for expert quantization, glm45 air example configs, and update readme for release (#3452) [skip ci]
* chore: rename without period
* feat: add glm45 air
* feat: add doc on expert quantization
* feat: update base readme with new changes
* chore: cleanup
* chore: cleanup
* chore: cleanup
* fix: disable quantize_moe_expert on merge per comment
* chore: add kernel info to optimizations doc
2026-03-05 09:58:09 -05:00
NanoCode012
945c8aeb10
Fix: quantize and target moe layers in transformers v5 for adapters and many misc fixes (#3439)
* fix: saving clones state dict
* fix: apply fix for only CP mode
* fix: add dropout check when using lora target param
* fix: re-add patch from transformers PR #39866
* feat: add moe quant to test by ved
* fix: try match target param properly end with
* fix: clear cache per param quant
* fix: attempt on-load quantize experts instead of post-load
* fix: attempt disable async load
* chore: add log
* chore: adjust log
* fix: remove cuda alloc for moe and enable async load
* chore: remove leftover logs
* chore: add extra empty cache
* fix(doc): clarify support
* fix: handle fsdp2 for paramwrapper dtensor
* feat: attempt to quant experts in 8bit mode too
* feat: attempt to release bf16 experts from vram
* feat: upgrade cce
* fix: fsdp2 init_sharded_param load int8/uint4 dtensor as
require_grad=true on init
* fix: remove unnecessary gc and empty cache
* Revert "fix: remove unnecessary gc and empty cache"
This reverts commit 1d54518990.
* fix: do not call full_tensor on non-dtensors
* fix: attempt to address fsdp2 with quant exp high loss
* fix: attempt lora quant experts wrong dim
* fix: ensure require_grad patch applied for lora 8bit
* fix: attempt lora 8bit fsdp2
* fix: attribute access on save for lora 8bit fsdp2
* fix: wrong weight attrib access
* chore(refactor): add config, re-arrange position of patches, clean
comments
* feat: add example docs
* chore: cherry pick trinity fixes from PR 3399
* chore: comments refactor; add guards
* fix: guard using wrong key
* fix: mamba save does not accept main process param
* fix: guard prevent double hook
* fix: move gc to upper scope
* chore: add comment on proxy forward patch
* fix: add comment to clarify
* feat: add test idempotency
* fix: AttributeError: `e_score_correction_bias` is not an nn.Parameter
* fix: AttributeError: 'NoneType' object has no attribute 'to'
* fix: update docs on cpu_ram_efficient_loading
2026-03-03 10:06:23 -05:00
NanoCode012
e672d37f33
fix: qwen3-next to use fla causal-conv1d to support packing (#3437)
* fix: qwen3-next to use fla causal-conv1d to support packing
* fix: causal import and update doc for v5
* fix: hard fail for packing without fla
2026-03-03 09:26:46 -05:00
NanoCode012
43d60c7439
bump cut-cross-entropy to 58d6572 (#3424)
2026-02-20 14:24:51 -05:00
NanoCode012
b6d3653f74
feat: add step3p5 for cce (#3384) [skip ci]
* feat: add step3p5 for cce
* chore: reorder model
2026-02-10 17:51:43 +07:00
VED
0343a72cc9
add glm support + patch (#3329) [skip ci]
* add glm support + patch
* lint
* lint
* Update examples/glm4/glm-4-6v-flash-qlora.yaml
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update examples/glm4/glm-4-6v-flash-qlora.yaml
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update src/axolotl/processing_strategies.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* patch removed
* lint
* lint2
* docs + rename
* rmv moe
* docs
* removed processor
* sdpa T_T"
* ddp_find_unused_parameters: true
* multi gpu yaml tested both
* multi gpu yaml tested both
* Update examples/glm46v/README.md
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update examples/glm46v/README.md
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update examples/glm46v/README.md
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* rmv text only section + v5 comments
* rename
---------
Co-authored-by: Ved <ved.work2024@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2026-02-10 17:43:53 +07:00
NanoCode012
3dd86d35b8
feat: add new cce support for glm series and exaone4 (#3373) [skip ci]
2026-01-28 06:44:44 -05:00
salman
dd9ebaeba1
EAFT (#3366) [skip ci]
* wip eaft
* fix eaft loss fn
* adding ref
---------
Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>
2026-01-28 06:44:15 -05:00
Wing Lian
fc4e37920b
transformers v5 upgrade (#3272)
* Prepare for transformers v5 upgrade
* fix hf cli
* update for hf hub changes
* fix tokenizer apply_chat_template args
* remap include_tokens_per_second
* fix tps
* handle migration for warmup
* use latest hf hub
* Fix scan -> ls
* fix import
* fix for renaming of mistral common tokenizer -> backend
* update for fixed tokenization for llama
* Skip phi35 tests for now
* remove mistral patch fixed upstream in huggingface/transformers#41439
* use namespacing for patch
* don't rely on sdist for e2e tests for now
* run modal ci without waiting too
* Fix dep for ci
* fix imports
* Fix fp8 check
* fsdp2 fixes
* fix version handling
* update fsdp version tests for new v5 behavior
* Fail multigpu tests after 3 failures
* skip known v5 broken tests for now and cleanup
* bump deps
* unmark skipped test
* re-enable test_fsdp_qlora_prequant_packed test
* increase multigpu ci timeout
* skip broken gemma3 test
* reduce timeout back to original 120min now that the hanging test is skipped
* fix for unnecessary collator for pretraining with bsz=1
* fix: safe_serialization deprecated in transformers v5 rc01 (#3318)
* torch_dtype deprecated
* load model in float32 for consistency with tests
* revert some test fixtures back
* use hf cache ls instead of scan
* don't strip fsdp_version
more fsdp_version fixes for v5
fix version in fsdp_config
fix aliasing
fix fsdp_version check
check fsdp_version is 2 in both places
* Transformers v5 rc2 (#3347)
* bump dep
* use latest fbgemm, grab model config as part of fixture, un-skip test
* import AutoConfig
* don't need more problematic autoconfig when specifying config.json manually
* add fixtures for argilla ultrafeedback datasets
* download phi4-reasoning
* fix arg
* update tests for phi fast tokenizer changes
* use explicit model types for gemma3
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* fix: AutoModelForVision2Seq -> AutoModelForImageTextToText
* chore: remove duplicate
* fix: attempt fix gemma3 text mode
* chore: lint
* ga release of v5
* need property setter for name_or_path for mistral tokenizer
* vllm not compatible with transformers v5
* setter for chat_template w mistral too
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: salman <salman.mohammadi@outlook.com>
2026-01-27 17:08:24 -05:00
Wing Lian
a531e9d946
upgrade vllm to v0.14.0 (#3345)
2026-01-21 20:00:18 -05:00
VED
d0d26d5064
feat: Add GDPO Support (#3353)
* gdpo support - test left
* lint
* fixes for vllm serve
* test advantages
* docs
* lint
* lint
* gdpo simple + lint
* lint nit
* example
* lint
* trl 0.27.0
* blocklist
* test assert rmv
* add validation check for GDPO + sum_then_normalize
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2026-01-21 17:22:45 -05:00
NanoCode012
359b7ad85e
fix: gemma3_text model loading vision config (#3354)
* fix: gemma3-text mode loading vision config
* fix: improve defaults to use lora kernels
2026-01-13 09:49:23 -05:00
VED
258ce8d4fa
feat: scaled softmax support (#3338)
* scaled softmax
* comment
* lint
* remove eager
* validation for flash
* lint
* val improve + neat
* fix correct softmax scale val (learned)
* learned scale val 4 ssm
* lint
* fix model_type rmv
* sdpa_atten
* test fix + lint
* test fix
* sdp_a val rmv
* flex fix
* main flash
* lint
* flex attn
* lint comment
* fix score_mod
* Update src/axolotl/utils/schemas/validation.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: Ved <ved.work2024@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2026-01-13 14:33:11 +07:00
PraMamba
8aab807e67
feat: Add SwanLab integration for experiment tracking (#3334)
* feat(swanlab): add SwanLab integration for experiment tracking
SwanLab integration provides comprehensive experiment tracking and monitoring for Axolotl training.
Features:
- Hyperparameter logging
- Training metrics tracking
- RLHF completion logging
- Performance profiling
- Configuration validation and conflict detection
Includes:
- Plugin in src/axolotl/integrations/swanlab/
- Callback in src/axolotl/utils/callbacks/swanlab.py
- Tests in tests/integrations/test_swanlab.py
- Examples in examples/swanlab/
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(swanlab): address PR #3334 review feedback from winglian and CodeRabbit
- Change use_swanlab default to True (winglian)
- Clear buffer after periodic logging to prevent duplicates (CodeRabbit Major)
- Add safe exception handling in config fallback (CodeRabbit)
- Use context managers for file operations (CodeRabbit)
- Replace LOG.error with LOG.exception for better debugging (CodeRabbit)
- Sort __all__ alphabetically (CodeRabbit)
- Add language specifiers to README code blocks (CodeRabbit)
- Fix end-of-file newline in README (pre-commit)
Resolves actionable comments and nitpicks from CodeRabbit review.
Addresses reviewer feedback from @winglian.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* only run swanlab integration tests if package is available
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2026-01-06 09:19:18 -05:00
Wing Lian
66a3de3629
build examples readmes with quarto (#3046)
* build examples readmes with quarto
* chore: formatting
* feat: dynamic build docs
* feat: add more model guides
* chore: format
* fix: collapse sidebar completely to have space for model guides
* fix: security protection for generated qmd
* fix: adjust collapse level, add new models, update links
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-12-25 19:17:25 +07:00
NanoCode012
4f5e8a328a
Feat: add MiMo and Plano (#3332) [skip-ci]
* feat: add xiaomi's mimo 7b
* fix: pin revision
* fix: update trinity docs and pin revision
* fix: wrong config name
* feat: add vram usage
* feat: add plano
* feat: update plano vram usage
* chore: comments
2025-12-25 18:09:03 +07:00
NanoCode012
418933f0d1
feat: add internvl3_5 (#3141) [skip-ci]
* feat: add internvl3_5
* fix: add timm instructions
* chore: add kimi-linear to cce doc
* feat: update internvl example
* chore: pin revision
* chore: remove from multipack
* fix: add to multimodal array
* fix: internvl use hf version
* feat: update cce
* chore: lint
* fix: list for image_size
* chore: add docs vram usage
* feat: enable cce
* fix: no need trust remote code
* fix: inconsistent timm version
2025-12-25 18:07:59 +07:00
NanoCode012
372f664c63
feat: cleanup old flex mask patch, suppress Matmul bnb warn, and misc (#3330) [skip-ci]
* feat: add pos id to flex attention for packing part 1
* feat: update to include sliding window mask patch
* fix: suppress "MatMul8bitLt: inputs will be cast" warnings
* fix: remove redundant flex attention patch
* chore: update olmo docs
* feat: add validator patch for cross entropy
2025-12-25 17:56:20 +07:00
NanoCode012
97f1b1758d
Feat: add kimi linear support (#3257)
* feat: add custom kimi linear patch [skip ci]
* feat: add configuration file and fix import [skip ci]
* fix: hijack tokenizer temporarily [skip ci]
* chore: remove accidental commit
* fix: attempt patch kimi remote
* fix: kwargs passed
* fix: device for tensor
* fix: aux loss calculation
* feat: cleaned up patches order
* fix: remove duplicate tokenizer patch
* chore: add debug logs
* chore: add debug logs
* chore: debug
* Revert "chore: add debug logs"
This reverts commit da372a5f67.
* Revert "chore: add debug logs"
This reverts commit 97d1de1d7c.
* fix: KeyError: 'tokenization_kimi'
* fix: support remote_model_id in cce patch
* feat: add config preload patch
* fix: use standard aux loss calc and updated modeling
* fix: import
* feat: add kimi-linear docs and example
* chore: add note about moe kernels
* feat: update cce to include kimi-linear
* chore: lint
* chore: update main readme
* fix: patch mechanism to address comments
* chore: lint
* fix: tests
* chore: cleanup comment
2025-12-25 17:53:52 +07:00
Abubakar Abid
f2155eaf79
feat: add trackio as experiment tracking integration (#3253)
* feat: add trackio as experiment tracking integration
- Add TrackioConfig to integrations schema with project_name, run_name, and space_id
- Create trackio_.py module for environment setup
- Add is_trackio_available() utility function
- Integrate trackio with report_to in trainer builder
- Add trackio callback for experiment tracking
- Add trackio config keys to gpt-oss example YAMLs
- Trackio runs locally by default, syncs to HF Space if space_id provided
* changes
* changes
* changes
* changes
* changes
* changes
* changes
* Update requirements.txt
* don't allow pydantic 2.12 for now
---------
Co-authored-by: Abubakar Abid <aaabid93@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-12-23 08:49:07 -05:00
Wing Lian
efeb5a4e41
fix check for fp8 capability (#3324)
* fix check for fp8 capability
* handle non-cuda compute
* reduce concurrency of tests
2025-12-22 13:58:25 -05:00
salman
bbd3486f57
Distributed Muon Optimizer (#3264)
* init
* working
* updating configs
* removing unneeded files
* lint
* comments
* lint
* fix regex match
* bump contribs version
* comments
* fixing tests and imports
* muon imports in test v2
* test cleanup
* bump contribs version
---------
Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>
2025-12-19 10:43:47 -05:00
salman
83d4d97dcc
Add QAT NVFP4 configs for blogpost (#3280) [skip ci]
* add configs for blogpost
* fix configs
* fixing baseline configs
2025-12-17 09:35:22 -05:00
NanoCode012
a1d07f42e4
Fix(misc): address PYTORCH_CUDA_ALLOC_CONF deprecation (#3313)
* fix: leftover ministral docs changes
* fix: pytorch_cuda_alloc_conf deprecation
* fix: set old PYTORCH_CUDA_ALLOC_CONF env too
* handle 2.9 separately
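For context on the bullets above: newer torch releases deprecate the `PYTORCH_CUDA_ALLOC_CONF` environment variable in favor of the device-generic `PYTORCH_ALLOC_CONF`, and the commit sets both so older versions still pick the value up. A minimal sketch of that pattern (the helper name is made up, not the axolotl code):

```python
import os

def set_alloc_conf(value: str) -> None:
    """Set the torch allocator config under both env var names.

    Hypothetical helper: PYTORCH_ALLOC_CONF is the newer, device-generic
    name; the deprecated PYTORCH_CUDA_ALLOC_CONF is kept for torch
    versions that still read it.
    """
    os.environ["PYTORCH_ALLOC_CONF"] = value       # newer torch
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value  # older torch, deprecated name

set_alloc_conf("expandable_segments:True")
```

Setting the variable before `import torch` runs is what makes it take effect, since the allocator reads it at initialization.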
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-12-17 09:12:18 -05:00
NanoCode012
4ac78aa562
fix: update qwen3 jinja tokenization that was off by a few tokens (#3295)
* fix: update qwen3 jinja tokenization off a few tokens
* fix: add note on tokenization issue
* fix: pop last index for mistral tokenizer
2025-12-09 14:31:03 +07:00
NanoCode012
5992e607a2
fix: improve ministral3 docs to be clearer (#3300)
* fix: improve ministral3 docs to be clearer
* fix: title
* chore: wording
2025-12-04 21:44:44 +07:00
NanoCode012
2b66ee189c
Feat: add ministral3 (#3297)
* feat: add ministral and mistral3
* chore: lint
* feat: update cce for ministral
* fix: add vram usage
* feat: update for release
* fix: save_pretrained issue in v5
* fix: add instructions to use v5 branch
* fix: add to multipack
* fix: improve instructions
* fix: add model to readme
2025-12-04 08:32:08 -05:00
NanoCode012
86d8cca149
Feat: add trinity by ArceeAI (#3292)
2025-12-02 13:12:55 -05:00
NanoCode012
006f226270
Feat: add Olmo3 (BC with Olmo and Olmo2) (#3275)
* feat: update cce to include olmo family
* chore: update docs following feedback
* feat: add olmo3 config
* fix: clarify 3 methods
* chore: add olmo to readme
2025-11-24 10:21:31 +07:00
NanoCode012
49b8107989
feat: add granite4 examples (#3256) [skip ci]
2025-11-13 10:19:16 -05:00
NanoCode012
9901ee5602
fix: voxtralprocessor broken (#3255) [skip ci]
* fix: voxtralprocessor broken
* chore: add todo
* chore: wording
2025-11-13 10:18:42 -05:00
NanoCode012
01a346d86a
feat(example): add gpt-oss-safeguard docs (#3243)
* feat(example): add gpt-oss-safeguard docs
* fix: add doc on reasoning_effort
2025-11-04 07:39:21 +07:00
NanoCode012
26f05b6008
fix(example): set model_type to load for gemma3 text (#3242)
* fix: set model_type to load for gemma3 text
* chore: simplify
* chore: unify
2025-11-04 07:35:07 +07:00
VED
4dc018992d
Feat/opentelemetry (#3215)
2025-10-22 19:16:55 -07:00
NanoCode012
243620394a
fix: force train split for json,csv,txt for test_datasets and misc doc changes (#3226)
* fix: force train split for json,csv,txt for test_datasets
* feat(doc): add info on mixing datasets for VLM
* feat(doc): max memory
* fix(doc): clarify lr groups
* fix: add info on vision not being dropped
* feat: add qwen3-vl to multimodal docs
* fix: add moe blocks to arch list
* feat(doc): improve mistral docs
* chore: add helpful link [skip-e2e]
* fix: add vram usage for mistral small
* Update link in docs/faq.qmd
Co-authored-by: salman <salman.mohammadi@outlook.com>
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-10-22 15:23:20 -07:00
NanoCode012
8c7f63cf97
fix: unpack cce imported incorrectly (#3212) [skip ci]
2025-10-13 17:19:15 +07:00
salman
143dea4753
FSDPConfig (#3170)
2025-10-10 14:44:25 +01:00
NanoCode012
ab63b92c38
feat: add lfm2 family and latest moe model (#3208)
* feat: add lfm2 family and latest moe model
* fix: use ml-cross-entropy for lfm2 examples
2025-10-09 10:47:41 -04:00
Grant Holmes (Ren)
850c1a5f8d
Add FSDP v2 swap memory support + QLoRA compatibility fixes (#3167)
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-09-26 10:23:59 +01:00
NanoCode012
7fa8ac40cd
Feat(cce): add qwen3_vl, qwen3_vl_moe, granitemoeshared, granitemoehybrid, and upgraded all cce patches (#3178)
* feat: upgrade cce with patches for transformers 4.56
* feat: add missing models to cce readme
2025-09-26 12:11:29 +07:00