NanoCode012
ed7105dba7
fix: GRPO config not accepting max_prompt_length ( #3390 ) [skip ci]
2026-02-10 17:52:09 +07:00
NanoCode012
b6d3653f74
feat: add step3p5 for cce ( #3384 ) [skip ci]
...
* feat: add step3p5 for cce
* chore: reorder model
2026-02-10 17:51:43 +07:00
NanoCode012
fcc4cfdb63
feat: add sageattention ( #2823 ) [skip ci]
...
* feat: add sageattention
* feat: call path on pre model load
* fix: patch to use register to correct var
* fix: add strict check import at start
* chore: fix comments
* chore: refactor
* feat: add capability check
* fix: missed underscore
* fix: let sageattention use FA backend in transformers
* feat: update sage attention for attention mask and position ids
* feat: allow sample packing but add warning without packing
* fix: loss hitting 0 with packing and attention mask note
* feat: downcast embeds if sage attention too
* feat: add config validation
* feat: add attention docs
* chore: docs
2026-02-10 17:49:21 +07:00
VED
97a4f28511
fix: saving state dict and eval for Context Parallel ( #3382 ) [skip ci]
...
* clone state_dict if none
* patch calculating eval loss for cp
2026-02-10 17:47:26 +07:00
VED
86a5803212
train_per_sec_per_gpu metric ( #3364 ) [skip ci]
...
* fix token count
* guard for None and zero
2026-02-10 17:44:55 +07:00
tgoab
530a0c0bf0
Changes from dataset_processes to dataset_num_proc ( #3352 ) [skip ci]
...
* changes from dataset_processes to dataset_num_proc
* deprecation message improved
---------
Co-authored-by: Juliana Nieto Cárdenas <jnietoca@purdue.edu>
2026-02-10 17:44:17 +07:00
VED
0343a72cc9
add glm support + patch ( #3329 ) [skip ci]
...
* add glm support + patch
* lint
* lint
* Update examples/glm4/glm-4-6v-flash-qlora.yaml
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update examples/glm4/glm-4-6v-flash-qlora.yaml
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update src/axolotl/processing_strategies.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* patch removed
* lint
* lint2
* docs + rename
* rmv moe
* docs
* removed processor
* sdpa T_T"
* ddp_find_unused_parameters: true
* multi gpu yaml tested both
* Update examples/glm46v/README.md
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update examples/glm46v/README.md
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update examples/glm46v/README.md
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* rmv text only section + v5 comments
* rename
---------
Co-authored-by: Ved <ved.work2024@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2026-02-10 17:43:53 +07:00
Wing Lian
236dad3bb7
set 0.15.0.dev0 version ( #3380 )
2026-01-30 21:28:01 -05:00
Wing Lian
be00978bc2
tag for v0.14.0 release ( #3379 )
v0.14.0
2026-01-30 14:10:27 -05:00
Wing Lian
3738978394
Add support for batched_mm, grouped_mm and scattermoe for MoE models ( #3377 )
...
* kernels plugin for moe for v5
* add support for native batched_mm or grouped_mm
2026-01-29 14:25:47 -05:00
Wing Lian
6132a30cda
handle warnings from v5 upgrade ( #3376 )
2026-01-28 06:45:01 -05:00
NanoCode012
3dd86d35b8
feat: add new cce support for glm series and exaone4 ( #3373 ) [skip ci]
2026-01-28 06:44:44 -05:00
salman
dd9ebaeba1
EAFT ( #3366 ) [skip ci]
...
* wip eaft
* fix eaft loss fn
* adding ref
---------
Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>
2026-01-28 06:44:15 -05:00
Wing Lian
fc4e37920b
transformers v5 upgrade ( #3272 )
...
* Prepare for transformers v5 upgrade
* fix hf cli
* update for hf hub changes
* fix tokenizer apply_chat_template args
* remap include_tokens_per_second
* fix tps
* handle migration for warmup
* use latest hf hub
* Fix scan -> ls
* fix import
* fix for renaming of mistral common tokenizer -> backend
* update for fixed tokenization for llama
* Skip phi35 tests for now
* remove mistral patch fixed upstream in huggingface/transformers#41439
* use namespacing for patch
* don't rely on sdist for e2e tests for now
* run modal ci without waiting too
* Fix dep for ci
* fix imports
* Fix fp8 check
* fsdp2 fixes
* fix version handling
* update fsdp version tests for new v5 behavior
* Fail multigpu tests after 3 failures
* skip known v5 broken tests for now and cleanup
* bump deps
* unmark skipped test
* re-enable test_fsdp_qlora_prequant_packed test
* increase multigpu ci timeout
* skip broken gemma3 test
* reduce timeout back to original 120min now that the hanging test is skipped
* fix for un-necessary collator for pretraining with bsz=1
* fix: safe_serialization deprecated in transformers v5 rc01 (#3318)
* torch_dtype deprecated
* load model in float32 for consistency with tests
* revert some test fixtures back
* use hf cache ls instead of scan
* don't strip fsdp_version
more fsdp_version fixes for v5
fix version in fsdp_config
fix aliasing
fix fsdp_version check
check fsdp_version is 2 in both places
* Transformers v5 rc2 (#3347)
* bump dep
* use latest fbgemm, grab model config as part of fixture, un-skip test
* import AutoConfig
* don't need more problematic autoconfig when specifying config.json manually
* add fixtures for argilla ultrafeedback datasets
* download phi4-reasoning
* fix arg
* update tests for phi fast tokenizer changes
* use explicit model types for gemma3
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* fix: AutoModelForVision2Seq -> AutoModelForImageTextToText
* chore: remove duplicate
* fix: attempt fix gemma3 text mode
* chore: lint
* ga release of v5
* need property setter for name_or_path for mistral tokenizer
* vllm not compatible with transformers v5
* setter for chat_template w mistral too
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: salman <salman.mohammadi@outlook.com>
2026-01-27 17:08:24 -05:00
Wing Lian
a531e9d946
upgrade vllm to v0.14.0 ( #3345 )
2026-01-21 20:00:18 -05:00
Wing Lian
04328aeb97
cu129 targets for ci builds ( #3369 )
...
* cu129 targets for ci builds
* remove copy-paste is_latest
2026-01-21 17:24:44 -05:00
VED
d0d26d5064
feat: Add GDPO Support ( #3353 )
...
* gdpo support - test left
* lint
* fixxes for vllm serv
* test advantages
* docss
* lint
* lint =
* gdpo simple + lint
* lint nit
* example
* lint
* trl 0.27.0
* blocklist
* test assert rmv
* add validation check for GDPO + sum_then_normalize
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2026-01-21 17:22:45 -05:00
Wing Lian
8623dd8a72
strip only starting 'v' char; e.g don't strip from '.dev' ( #3368 ) [skip ci]
2026-01-21 14:19:03 -05:00
Wing Lian
8cd75cff9f
use cuda 12.9.1 and add python 3.12 to base images ( #3367 )
2026-01-21 13:34:14 -05:00
Wing Lian
8ab9d9ea88
Version dev ( #3365 )
2026-01-20 22:58:29 -05:00
Wing Lian
6e42def14b
set version to v0.13.1 ( #3363 )
v0.13.1
2026-01-20 08:58:32 -05:00
Wing Lian
c413480b35
upgrade transformers to 4.57.6 and peft to 0.17.1 and datasets to 4.5.0 ( #3361 )
2026-01-16 11:48:50 -05:00
Wing Lian
8f25124269
upgrade transformers to 4.57.5 ( #3358 )
...
* upgrade transformers to 4.57.5
* explicitly set versions for fbgemm-gpu
* handle index url for cuda version
* explicitly set cu version for fbgemm deps, skip for 130
* cu suffix not needed on version if using whl subpath
2026-01-16 11:17:43 -05:00
Wing Lian
790df757cb
don't install xformers for arm64 ( #3359 )
...
* install xformers in the base docker image
* install numba and numpy first
* set CUDA_HOME for xformers install
* Set cuda home env
* don't install xformers by default on aarch64/arm64
2026-01-16 09:02:37 -05:00
Wing Lian
d282f32481
don't install deepspeed in arm64 images ( #3357 )
2026-01-14 12:03:55 -05:00
Wing Lian
6331e4a130
fix amd64 and set 2.9.1 as latest cloud image ( #3356 )
2026-01-14 11:56:36 -05:00
salman
1410e4474e
update PR template ( #3349 ) [skip ci]
2026-01-14 09:39:21 -05:00
Wing Lian
dc77b5bf42
fix arm64 builds ( #3355 )
...
* fix syntax for secrets in gha yaml
* setup env for uv too
* arm64 for base uv too
* don't build causal-conv1d or mamba for arm64 and use arm64 wheels
* fix dockerfile syntax
* fix shell syntax
2026-01-14 09:38:48 -05:00
NanoCode012
359b7ad85e
fix: gemma3_text model loading vision config ( #3354 )
...
* fix: gemma3-text mode loading vision config
* fix: improve defaults to use lora kernels
2026-01-13 09:49:23 -05:00
VED
258ce8d4fa
feat: scaled softmax support ( #3338 )
...
* scaled softmax
* comment
* lint
* remove egear
* validation for flash
* lint
* val improve + neat
* fix correct softmax scale val(learned)
* learned scale val 4 ssm
* lint
* fix model_type rmv
* sdpa_atten
* test fix + lint
* test fix
* sdp_a val rmv
* flex fix
* main flash
* lint
* flex attn
* lint comment
* fix score_mod
* Update src/axolotl/utils/schemas/validation.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: Ved <ved.work2024@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2026-01-13 14:33:11 +07:00
@TT
3e0bbd33ec
feat: add ARM64/AArch64 build support to Dockerfile-base ( #3346 )
...
* Add support for capability to build arm64 image
* Fixing wrong variable TARGETPLATFORM bug
* Adding missing semicolons
* skip docker hub login if PR (no push) or no credentials
* Enabling arm64 builds for Dockerfile-base in Github actions
* TARGETARCH automatically default to platform arch under build
* Enabling arm64 builds for axolotl docker builds
* Enabling arm64 builds for axolotl-cloud docker build Github actions
---------
Co-authored-by: Wing Lian <wing@axolotl.ai >
2026-01-12 12:00:02 -05:00
salman
4ae6f766ad
bump bnb to v0.49.1 ( #3351 )
2026-01-12 09:42:04 -05:00
VED
e7f0d4ba5b
Increased test coverage for lora/qlora ( #3147 )
...
* config_val tests
* remove config val(not needed)
* config validation
* parameter freeze validation
* merge/unmerge tests
* removal unwanted
* rename
* lint
* updated lint
* Update tests/utils/lora/test_config_validation_lora.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* pytest skip + mock fix
* nitpicks
* revert some nitpicks
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2026-01-06 11:44:48 -05:00
VED
7bf6f70e96
fix total/trainable tokens log ( #3344 )
...
* fix total/trainable tokens log
* fix total/trainable tokens log
2026-01-06 09:25:17 -05:00
PraMamba
8aab807e67
feat: Add SwanLab integration for experiment tracking ( #3334 )
...
* feat(swanlab): add SwanLab integration for experiment tracking
SwanLab integration provides comprehensive experiment tracking and monitoring for Axolotl training.
Features:
- Hyperparameter logging
- Training metrics tracking
- RLHF completion logging
- Performance profiling
- Configuration validation and conflict detection
Includes:
- Plugin in src/axolotl/integrations/swanlab/
- Callback in src/axolotl/utils/callbacks/swanlab.py
- Tests in tests/integrations/test_swanlab.py
- Examples in examples/swanlab/
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(swanlab): address PR #3334 review feedback from winglian and CodeRabbit
- Change use_swanlab default to True (winglian)
- Clear buffer after periodic logging to prevent duplicates (CodeRabbit Major)
- Add safe exception handling in config fallback (CodeRabbit)
- Use context managers for file operations (CodeRabbit)
- Replace LOG.error with LOG.exception for better debugging (CodeRabbit)
- Sort __all__ alphabetically (CodeRabbit)
- Add language specifiers to README code blocks (CodeRabbit)
- Fix end-of-file newline in README (pre-commit)
Resolves actionable comments and nitpicks from CodeRabbit review.
Addresses reviewer feedback from @winglian.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* only run swanlab integration tests if package is available
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2026-01-06 09:19:18 -05:00
Wing Lian
ee59e4de97
add cu130 + torch 2.9.1 to test matrices ( #3343 )
...
* add cu130 + torch 2.9.1 to test matrices
* uv can't use pip3 directly
2026-01-05 15:24:29 -05:00
Wing Lian
4e61b8aa23
use updated version of prebuilt wheels for flash attention for cu130 ( #3342 )
...
* use updated version of prebuilt wheels for flash attention for cu130
* use elif
* fix the uv base installs of FA also
* make wget less verbose
2026-01-05 13:48:12 -05:00
Wing Lian
b26ba3a5cb
don't build images w cuda 130 since we don't have flash attention wheels ( #3341 )
2026-01-03 18:08:28 -05:00
Wing Lian
afe18ace35
deprecate torch 2.7.1 ( #3339 )
2026-01-01 06:52:45 -05:00
github-actions[bot]
2b199f9915
chore: update pre-commit hooks ( #3340 ) [skip ci]
...
Co-authored-by: SalmanMohammadi <25081738+SalmanMohammadi@users.noreply.github.com>
2026-01-01 06:52:28 -05:00
Wing Lian
e73dab6df9
support pydantic 2.12 ( #3328 )
...
* upgrade pydantic to 2.12
* use latest modal version
* upgrade modal
* update modal in requirements and loosen pydantic
* upgrade modal too
2025-12-30 12:41:07 -05:00
VED
f45a97a9ff
docs for checkpoint saving ( #3335 ) [skip ci]
...
Co-authored-by: Ved <ved.work2024@gmail.com>
2025-12-30 12:40:32 -05:00
Wing Lian
11c0b5b256
batch upgrade dependencies ( #3299 )
...
* upgrade dependencies
* don't use reset sessions
* downgrade transformers, upgrade other deps
* upgrade bnb to 0.49.0
* restore s3 cache
* explicit use local files w hub
* decompress and strip top level dir
* use 2 levels for strip components
* try to preserve permissions for symlinks
* use updated tar
* fix #3293 for distributed
* downgrade bnb
* fast fail after 4
* fix total tokens device
* patch accelerate CP/SP (#3309)
---------
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-12-30 09:02:49 -05:00
Wing Lian
66a3de3629
build examples readmes with quarto ( #3046 )
...
* build examples readmes with quarto
* chore: formatting
* feat: dynamic build docs
* feat: add more model guides
* chore: format
* fix: collapse sidebar completely to have space for model guides
* fix: security protection for generated qmd
* fix: adjust collapse level, add new models, update links
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-12-25 19:17:25 +07:00
VED
a6080df73c
compute loss only if training and update token metric naming ( #3293 ) [skip ci]
...
* compute loss only if training
* save total_tokens for checkpoint
* check if string
* refactor total_tokens/ num_tokens
* refactor 2
* replace trainable_step/train_per_sec_per_gpu
* lint + log trainable/tokens
* consolidate it in the callback.
* test for total_tokens after resume
* check if token state exists after ckpt
---------
Co-authored-by: Ved <ved.work2024@gmail.com>
2025-12-25 18:38:17 +07:00
NanoCode012
4f5e8a328a
Feat: add MiMo and Plano ( #3332 ) [skip-ci]
...
* feat: add xiaomi's mimo 7b
* fix: pin revision
* fix: update trinity docs and pin revision
* fix: wrong config name
* feat: add vram usage
* feat: add plano
* feat: update plano vram usage
* chore: comments
2025-12-25 18:09:03 +07:00
NanoCode012
418933f0d1
feat: add internvl3_5 ( #3141 ) [skip-ci]
...
* feat: add internvl3_5
* fix: add timm instructions
* chore: add kimi-linear to cce doc
* feat: update internvl example
* chore: pin revision
* chore: remove from multipack
* fix: add to multimodal array
* fix: internvl use hf version
* feat: update cce
* chore: lint
* fix: list for image_size
* chore: add docs vram usage
* feat: enable cce
* fix: no need trust remote code
* fix: inconsistent timm version
2025-12-25 18:07:59 +07:00
NanoCode012
372f664c63
feat: cleanup old flex mask patch, suppress Matmul bnb warn, and misc ( #3330 ) [skip-ci]
...
* feat: add pos id to flex attention for packing part 1
* feat: update to include sliding window mask patch
* fix: suppress MatMul8bitLt: inputs will be cast from warnings
* fix: remove redundant flex attention patch
* chore: update olmo docs
* feat: add validator patch for cross entropy
2025-12-25 17:56:20 +07:00
NanoCode012
97f1b1758d
Feat: add kimi linear support ( #3257 )
...
* feat: add custom kimi linear patch [skip ci]
* feat: add configuration file and fix import [skip ci]
* fix: hijack tokenizer temporarily [skip ci]
* chore: remove accidental commit
* fix: attempt patch kimi remote
* fix: kwargs passed
* fix: device for tensor
* fix: aux loss calculation
* feat: cleaned up patches order
* fix: remove duplicate tokenizer patch
* chore: add debug logs
* chore: add debug logs
* chore: debug
* Revert "chore: add debug logs"
This reverts commit da372a5f67.
* Revert "chore: add debug logs"
This reverts commit 97d1de1d7c.
* fix: KeyError: 'tokenization_kimi'
* fix: support remote_model_id in cce patch
* feat: add config preload patch
* fix: use standard aux loss calc and updated modeling
* fix: import
* feat: add kimi-linear docs and example
* chore: add note about moe kernels
* feat: update cce to include kimi-linear
* chore: lint
* chore: update main readme
* fix: patch mechanism to address comments
* chore: lint
* fix: tests
* chore: cleanup comment
2025-12-25 17:53:52 +07:00
Abubakar Abid
f2155eaf79
feat: add trackio as experiment tracking integration ( #3253 )
...
* feat: add trackio as experiment tracking integration
- Add TrackioConfig to integrations schema with project_name, run_name, and space_id
- Create trackio_.py module for environment setup
- Add is_trackio_available() utility function
- Integrate trackio with report_to in trainer builder
- Add trackio callback for experiment tracking
- Add trackio config keys to gpt-oss example YAMLs
- Trackio runs locally by default, syncs to HF Space if space_id provided
* changes
* changes
* changes
* changes
* changes
* changes
* changes
* Update requirements.txt
* don't allow pydantic 2.12 for now
---------
Co-authored-by: Abubakar Abid <aaabid93@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-12-23 08:49:07 -05:00