NanoCode012
359b7ad85e
fix: gemma3_text model loading vision config ( #3354 )
...
* fix: gemma3-text model loading vision config
* fix: improve defaults to use lora kernels
2026-01-13 09:49:23 -05:00
VED
258ce8d4fa
feat: scaled softmax support ( #3338 )
...
* scaled softmax
* comment
* lint
* remove egear
* validation for flash
* lint
* val improve + neet
* fix correct softmax scale val(learned)
* learned scale val 4 ssm
* lint
* fix model_type rmv
* sdpa_atten
* test fix + lint
* test fix
* sdp_a val rmv
* flex fix
* main flash
* lint
* flex attn
* lint comment
* fix score_mod
* Update src/axolotl/utils/schemas/validation.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
---------
Co-authored-by: Ved <ved.work2024@gmail.com >
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2026-01-13 14:33:11 +07:00
@TT
3e0bbd33ec
feat: add ARM64/AArch64 build support to Dockerfile-base ( #3346 )
...
* Add support for building arm64 images
* Fixing wrong variable TARGETPLATFORM bug
* Adding missing semicolons
* skip docker hub login if PR (no push) or no credentials
* Enabling arm64 builds for Dockerfile-base in Github actions
* TARGETARCH automatically defaults to the platform arch during build
* Enabling arm64 builds for axolotl docker builds
* Enabling arm64 builds for axolotl-cloud docker build Github actions
---------
Co-authored-by: Wing Lian <wing@axolotl.ai >
2026-01-12 12:00:02 -05:00
salman
4ae6f766ad
bump bnb to v0.49.1 ( #3351 )
2026-01-12 09:42:04 -05:00
VED
e7f0d4ba5b
Increased test coverage for lora/qlora ( #3147 )
...
* config_val tests
* remove config val(not needed)
* config validation
* parameter freeze validation
* merge/unmerge tests
* removal unwanted
* rename
* lint
* updated lint
* Update tests/utils/lora/test_config_validation_lora.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* pytest skip + mock fix
* nitpicks
* revert some nitpicks
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2026-01-06 11:44:48 -05:00
VED
7bf6f70e96
fix total/trainable tokens log ( #3344 )
...
* fix total/trainable tokens log
* fix total/trainable tokens log
2026-01-06 09:25:17 -05:00
PraMamba
8aab807e67
feat: Add SwanLab integration for experiment tracking ( #3334 )
...
* feat(swanlab): add SwanLab integration for experiment tracking
SwanLab integration provides comprehensive experiment tracking and monitoring for Axolotl training.
Features:
- Hyperparameter logging
- Training metrics tracking
- RLHF completion logging
- Performance profiling
- Configuration validation and conflict detection
Includes:
- Plugin in src/axolotl/integrations/swanlab/
- Callback in src/axolotl/utils/callbacks/swanlab.py
- Tests in tests/integrations/test_swanlab.py
- Examples in examples/swanlab/
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com >
* fix(swanlab): address PR #3334 review feedback from winglian and CodeRabbit
- Change use_swanlab default to True (winglian)
- Clear buffer after periodic logging to prevent duplicates (CodeRabbit Major)
- Add safe exception handling in config fallback (CodeRabbit)
- Use context managers for file operations (CodeRabbit)
- Replace LOG.error with LOG.exception for better debugging (CodeRabbit)
- Sort __all__ alphabetically (CodeRabbit)
- Add language specifiers to README code blocks (CodeRabbit)
- Fix end-of-file newline in README (pre-commit)
Resolves actionable comments and nitpicks from CodeRabbit review.
Addresses reviewer feedback from @winglian.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com >
* only run swanlab integration tests if package is available
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com >
Co-authored-by: Wing Lian <wing@axolotl.ai >
2026-01-06 09:19:18 -05:00
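For the SwanLab integration above (#3334), a minimal axolotl YAML sketch of how the tracker might be switched on. Only use_swanlab is confirmed by the commit messages; the project and run-name keys are illustrative assumptions, not the plugin's verified schema:

    # hedged sketch -- use_swanlab is named in the PR; the other keys are assumed
    use_swanlab: true
    swanlab_project: my-experiments    # assumption: project-name key
    swanlab_run_name: llama3-lora-run  # assumption: run-label key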
Wing Lian
ee59e4de97
add cu130 + torch 2.9.1 to test matrices ( #3343 )
...
* add cu130 + torch 2.9.1 to test matrices
* uv can't use pip3 directly
2026-01-05 15:24:29 -05:00
Wing Lian
4e61b8aa23
use updated version of prebuilt wheels for flash attention for cu130 ( #3342 )
...
* use updated version of prebuilt wheels for flash attention for cu130
* use elif
* fix the uv base installs of FA also
* make wget less verbose
2026-01-05 13:48:12 -05:00
Wing Lian
b26ba3a5cb
don't build images w cuda 130 since we don't have flash attention wheels ( #3341 )
2026-01-03 18:08:28 -05:00
Wing Lian
afe18ace35
deprecate torch 2.7.1 ( #3339 )
2026-01-01 06:52:45 -05:00
github-actions[bot]
2b199f9915
chore: update pre-commit hooks ( #3340 ) [skip ci]
...
Co-authored-by: SalmanMohammadi <25081738+SalmanMohammadi@users.noreply.github.com >
2026-01-01 06:52:28 -05:00
Wing Lian
e73dab6df9
support pydantic 2.12 ( #3328 )
...
* upgrade pydantic to 2.12
* use latest modal version
* upgrade modal
* update modal in requirements and loosen pydantic
* upgrade modal too
2025-12-30 12:41:07 -05:00
VED
f45a97a9ff
docs for checkpoint saving ( #3335 ) [skip ci]
...
Co-authored-by: Ved <ved.work2024@gmail.com >
2025-12-30 12:40:32 -05:00
Wing Lian
11c0b5b256
batch upgrade dependencies ( #3299 )
...
* upgrade dependencies
* don't use reset sessions
* downgrade transformers, upgrade other deps
* upgrade bnb to 0.49.0
* restore s3 cache
* explicit use local files w hub
* decompress and strip top level dir
* use 2 levels for strip components
* try to preserve permissions for symlinks
* use updated tar
* fix #3293 for distributed
* downgrade bnb
* fast fail after 4
* fix total tokens device
* patch accelerate CP/SP (#3309 )
---------
Co-authored-by: salman <salman.mohammadi@outlook.com >
2025-12-30 09:02:49 -05:00
Wing Lian
66a3de3629
build examples readmes with quarto ( #3046 )
...
* build examples readmes with quarto
* chore: formatting
* feat: dynamic build docs
* feat: add more model guides
* chore: format
* fix: collapse sidebar completely to have space for model guides
* fix: security protection for generated qmd
* fix: adjust collapse level, add new models, update links
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-12-25 19:17:25 +07:00
VED
a6080df73c
compute loss only if training and update token metric naming ( #3293 ) [skip ci]
...
* compute loss only if training
* save total_tokens for checkpoint
* check if string
* refactor total_tokens/ num_tokens
* refactor 2
* replace trainable_step/train_per_sec_per_gpu
* lint + log trainable/tokens
* consolidate it in the callback.
* test for total_tokens after resume
* check if token state exists after ckpt
---------
Co-authored-by: Ved <ved.work2024@gmail.com >
2025-12-25 18:38:17 +07:00
NanoCode012
4f5e8a328a
Feat: add MiMo and Plano ( #3332 ) [skip-ci]
...
* feat: add xiaomi's mimo 7b
* fix: pin revision
* fix: update trinity docs and pin revision
* fix: wrong config name
* feat: add vram usage
* feat: add plano
* feat: update plano vram usage
* chore: comments
2025-12-25 18:09:03 +07:00
NanoCode012
418933f0d1
feat: add internvl3_5 ( #3141 ) [skip-ci]
...
* feat: add internvl3_5
* fix: add timm instructions
* chore: add kimi-linear to cce doc
* feat: update internvl example
* chore: pin revision
* chore: remove from multipack
* fix: add to multimodal array
* fix: internvl use hf version
* feat: update cce
* chore: lint
* fix: list for image_size
* chore: add docs vram usage
* feat: enable cce
* fix: no need trust remote code
* fix: inconsistent timm version
2025-12-25 18:07:59 +07:00
NanoCode012
372f664c63
feat: cleanup old flex mask patch, suppress Matmul bnb warn, and misc ( #3330 ) [skip-ci]
...
* feat: add pos id to flex attention for packing part 1
* feat: update to include sliding window mask patch
* fix: suppress "MatMul8bitLt: inputs will be cast from" warnings
* fix: remove redundant flex attention patch
* chore: update olmo docs
* feat: add validator patch for cross entropy
2025-12-25 17:56:20 +07:00
NanoCode012
97f1b1758d
Feat: add kimi linear support ( #3257 )
...
* feat: add custom kimi linear patch [skip ci]
* feat: add configuration file and fix import [skip ci]
* fix: hijack tokenizer temporarily [skip ci]
* chore: remove accidental commit
* fix: attempt patch kimi remote
* fix: kwargs passed
* fix: device for tensor
* fix: aux loss calculation
* feat: cleaned up patches order
* fix: remove duplicate tokenizer patch
* chore: add debug logs
* chore: add debug logs
* chore: debug
* Revert "chore: add debug logs"
This reverts commit da372a5f67 .
* Revert "chore: add debug logs"
This reverts commit 97d1de1d7c .
* fix: KeyError: 'tokenization_kimi'
* fix: support remote_model_id in cce patch
* feat: add config preload patch
* fix: use standard aux loss calc and updated modeling
* fix: import
* feat: add kimi-linear docs and example
* chore: add note about moe kernels
* feat: update cce to include kimi-linear
* chore: lint
* chore: update main readme
* fix: patch mechanism to address comments
* chore: lint
* fix: tests
* chore: cleanup comment
2025-12-25 17:53:52 +07:00
Abubakar Abid
f2155eaf79
feat: add trackio as experiment tracking integration ( #3253 )
...
* feat: add trackio as experiment tracking integration
- Add TrackioConfig to integrations schema with project_name, run_name, and space_id
- Create trackio_.py module for environment setup
- Add is_trackio_available() utility function
- Integrate trackio with report_to in trainer builder
- Add trackio callback for experiment tracking
- Add trackio config keys to gpt-oss example YAMLs
- Trackio runs locally by default and syncs to an HF Space if space_id is provided (see the config sketch after this entry)
* changes
* changes
* changes
* changes
* changes
* changes
* changes
* Update requirements.txt
* don't allow pydantic 2.12 for now
---------
Co-authored-by: Abubakar Abid <aaabid93@gmail.com >
Co-authored-by: Wing Lian <wing@axolotl.ai >
2025-12-23 08:49:07 -05:00
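For the trackio integration above (#3253), a minimal sketch of what the config keys might look like. The PR names the TrackioConfig fields project_name, run_name, and space_id; the exact YAML key spellings below are assumptions:

    # hedged sketch -- field names from the PR description, key spellings assumed
    use_trackio: true                            # assumption: toggle analogous to other trackers
    trackio_project_name: gpt-oss-sft
    trackio_run_name: run-001
    trackio_space_id: my-user/my-trackio-space   # optional; per the PR, syncs the local run to an HF Space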
kallewoof
92ee4256f7
feature: raise on long sequence drop ( #3321 )
...
* feature: raise on long sequence drop
Silently dropping sequences from the dataset is sometimes undesirable, especially when the dataset has been carefully crafted and pre-fitted to the training context; a dropped sequence then suggests an error occurred somewhere in the pipeline. This feature adds a third excess_length_strategy value, 'raise', which raises a ValueError whenever a sequence is too long and would normally have been dropped or truncated (see the config sketch after this entry).
* tests: add excess_length_strategy tests
* doc: updated return value description for drop_long_seq_in_dataset
* add @enable_hf_offline
* fixed cfg modified after validate_config called
* hf offline fix
* fix tqdm desc when raise is used
* test: added test for non-batched case
* accidental code change revert
* test: use pytest.raises
* test: simplified drop_seq_len tests
* test: moved excess_length_strat test to test_data.py
---------
Co-authored-by: salman <salman.mohammadi@outlook.com >
2025-12-22 13:59:49 -05:00
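For #3321 above, a minimal sketch of the new strategy. The option name and the 'raise' value come from the PR description; sequence_len is the standard axolotl context-length key:

    sequence_len: 4096
    # 'raise' aborts with a ValueError instead of silently dropping or truncating
    # any sample longer than sequence_len
    excess_length_strategy: raise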
Wing Lian
efeb5a4e41
fix check for fp8 capability ( #3324 )
...
* fix check for fp8 capability
* handle non-cuda compute
* reduce concurrency of tests
2025-12-22 13:58:25 -05:00
VED
faaff6c792
allow users to set ndigits for rounding of metrics when logging ( #3325 )
...
* METRIC_PRECISION-> 8
* use ndigits and move env getter to top of log function
---------
Co-authored-by: Ved <ved.work2024@gmail.com >
Co-authored-by: Wing Lian <wing@axolotl.ai >
2025-12-22 08:54:43 -05:00
Alexander Kozhevnikov
43cef27458
Fix typo in densemixer RuntimeError ( #3327 ) [skip ci]
...
The RuntimeError suggested installing densemizer when it should suggest densemixer
2025-12-22 08:53:58 -05:00
Wing Lian
07c41a6c2a
fix preview docs failing due to running out of disk ( #3326 ) [skip ci]
...
* fix preview docs failing due to running out of disk
* fix docs publish too
2025-12-19 11:34:55 -05:00
salman
bbd3486f57
Distributed Muon Optimizer ( #3264 )
...
* init
* working
* updating configs
* removing unneeded files
* lint
* comments
* lint
* fix regex match
* bump contribs version
* comments
* fixing tests and imports
* muon imports in test v2
* test cleanup
* bump contribs version
---------
Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>
2025-12-19 10:43:47 -05:00
VED
3750d7dd64
add liger kernel support for dpo ( #3302 )
...
* add liger kernel for dpo
* revert grpo changes, add support in dpo
* revert grpo changes, add support in dpo
* dpo_use_liger_kernal
* fix liger_dpo
---------
Co-authored-by: Ved <ved.work2024@gmail.com >
2025-12-18 11:11:06 -05:00
xzuyn
2197b0bf89
feat: cheap ppl metric ( #3317 )
...
* Import math and compute perplexity from loss values
* lint
* coderabbit changes
* lint
* fix: add rounding to ppl
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-12-18 09:02:41 -05:00
Seung Hyun Cho
3e51a680c2
fix: Fix evaluation loss in KD trainer ( #3271 )
...
* fix: Fix evaluation loss in KD trainer
* Fix v2 strategy super() call
* fix: Add safety check for total_tokens in log method
* fix: simplified num items and outputs return handling
* fix: add missing model forward pass in compute_loss
* refactor: Use Template Method pattern for chat template strategies
* refactor: use pop(None) and remove v2 override
* chore: lint
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai >
Co-authored-by: Wing Lian <wing@axolotl.ai >
2025-12-17 13:40:36 -05:00
xzuyn
2cf254b4af
Add peft_autocast_adapter_dtype config option ( #3311 ) [skip ci]
...
* Add `peft_autocast_adapter_dtype` field to schema
* Add `autocast_adapter_dtype` to `model_kwargs`
* chore: docs
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-12-17 10:09:39 -05:00
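For #3311 above, a minimal sketch of where the new key might sit in a LoRA config. The field name comes from the PR and, per the commits, is forwarded to PEFT's autocast_adapter_dtype kwarg; the value shown is only illustrative:

    adapter: lora
    # assumption: false keeps adapter weights in the model dtype instead of upcasting
    peft_autocast_adapter_dtype: false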
salman
83d4d97dcc
Add QAT NVFP4 configs for blogpost ( #3280 ) [skip ci]
...
* add configs for blogpost
* fix configs
* fixing baseline configs
2025-12-17 09:35:22 -05:00
NanoCode012
a1d07f42e4
Fix(misc): address PYTORCH_CUDA_ALLOC_CONF deprecate ( #3313 )
...
* fix: leftover ministral docs changes
* fix: pytorch_cuda_alloc_conf deprecation
* fix: set old PYTORCH_CUDA_ALLOC_CONF env too
* handle 2.9 separately
---------
Co-authored-by: Wing Lian <wing@axolotl.ai >
2025-12-17 09:12:18 -05:00
Wing Lian
2a664dc8ad
support for xformers wheels for torch 2.9 ( #3308 )
...
* support for xformers wheels for torch 2.9
* fix hf cache?
* don't use hf cache from s3
* show disk free space in ci
2025-12-11 11:56:40 -05:00
NanoCode012
4ac78aa562
fix: update qwen3 jinja tokenization off by a few tokens ( #3295 )
...
* fix: update qwen3 jinja tokenization off by a few tokens
* fix: add note on tokenization issue
* fix: pop last index for mistral tokenizer
2025-12-09 14:31:03 +07:00
VED
b3f4aa149f
fix bin size ( #3307 )
...
* fix bin size
* lint
---------
Co-authored-by: Ved <ved.work2024@gmail.com >
2025-12-08 09:16:18 -05:00
salman
75b20fb66f
Save processor in quantizer CLI ( #3290 )
2025-12-06 16:27:18 +00:00
NanoCode012
5992e607a2
fix: improve ministral3 docs to be clearer ( #3300 )
...
* fix: improve ministral3 docs to be clearer
* fix: title
* chore: wording
2025-12-04 21:44:44 +07:00
NanoCode012
2b66ee189c
Feat: add ministral3 ( #3297 )
...
* feat: add ministral and mistral3
* chore: lint
* feat: update cce for ministral
* fix: add vram usage
* feat: update for release
* fix: save_pretrained issue in v5
* fix: add instructions to use v5 branch
* fix: add to multipack
* fix: improve instructions
* fix: add model to readme
2025-12-04 08:32:08 -05:00
NanoCode012
86d8cca149
Feat: add trinity by ArceeAI ( #3292 )
2025-12-02 13:12:55 -05:00
NanoCode012
4a0f98e612
feat: upgrade liger to 0.6.4 ( #3289 )
2025-12-02 09:16:23 -05:00
Yohan Na
c6ddcdd06a
feat: add exaone4 chat template and update enums ( #3279 )
...
* feat: add exaone4 chat template and update enums
* fix: handle first message as system or tools in exaone4 chat template
* Update src/axolotl/utils/chat_templates/templates/exaone4.jinja
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* fix: lint
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-12-01 15:52:45 +07:00
github-actions[bot]
7fb6a947d9
chore: update pre-commit hooks ( #3287 )
...
Co-authored-by: SalmanMohammadi <25081738+SalmanMohammadi@users.noreply.github.com >
2025-12-01 15:03:14 +07:00
NanoCode012
b234532d9f
Feat: add peft_ensure_weight_tying ( #3278 )
...
* feat: upgrade peft to 0.18.0
* feat: add peft_ensure_weight_tying
* fix: default
* chore: adjust kwarg per feedback
2025-11-28 18:54:48 +07:00
VED
8990ca3205
fix: removed unused "scikit-learn==1.4.2" ( #3277 )
...
Co-authored-by: Ved <ved.work2024@gmail.com >
2025-11-24 13:48:53 +07:00
NanoCode012
006f226270
Feat: add Olmo3 (BC with Olmo and Olmo2) ( #3275 )
...
* feat: update cce to include olmo family
* chore: update docs following feedback
* feat: add olmo3 config
* fix: clarify 3 methods
* chore: add olmo to readme
2025-11-24 10:21:31 +07:00
Wing Lian
0b635e69c5
build docker images for 2.9.x ( #3273 )
2025-11-20 09:26:24 -05:00
Wing Lian
0d27e14e45
Torch 2.9.1 base images ( #3268 )
...
* update torch 2.9.1 base images
* update base dockerfile image check
2025-11-20 09:04:37 -05:00
NanoCode012
f5f21fb216
chore: update readme with latest updates ( #3267 )
v0.13.0
2025-11-18 14:45:21 +07:00