Salman Mohammadi
0a0115493d
remove monkeypatch
2026-01-21 10:06:58 +00:00
Salman Mohammadi
7a4f33802d
try run regular CE loss on eval
2026-01-19 22:00:48 +00:00
Salman Mohammadi
170dca9bb9
WIP DFT
2026-01-15 18:43:31 +00:00
NanoCode012
359b7ad85e
fix: gemma3_text model loading vision config ( #3354 )
...
* fix: gemma3-text model loading vision config
* fix: improve defaults to use lora kernels
2026-01-13 09:49:23 -05:00
VED
258ce8d4fa
feat: scaled softmax support ( #3338 )
...
* scaled softmax
* comment
* lint
* remove egear
* validation for flash
* lint
* val improve + neat
* fix correct softmax scale val(learned)
* learned scale val for ssm
* lint
* fix model_type rmv
* sdpa_atten
* test fix + lint
* test fix
* sdp_a val rmv
* flex fix
* main flash
* lint
* flex attn
* lint comment
* fix score_mod
* Update src/axolotl/utils/schemas/validation.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
---------
Co-authored-by: Ved <ved.work2024@gmail.com >
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2026-01-13 14:33:11 +07:00
VED
7bf6f70e96
fix total/trainable tokens log ( #3344 )
...
* fix total/trainable tokens log
* fix total/trainable tokens log
2026-01-06 09:25:17 -05:00
PraMamba
8aab807e67
feat: Add SwanLab integration for experiment tracking ( #3334 )
...
* feat(swanlab): add SwanLab integration for experiment tracking
SwanLab integration provides comprehensive experiment tracking and monitoring for Axolotl training.
Features:
- Hyperparameter logging
- Training metrics tracking
- RLHF completion logging
- Performance profiling
- Configuration validation and conflict detection
Includes:
- Plugin in src/axolotl/integrations/swanlab/
- Callback in src/axolotl/utils/callbacks/swanlab.py
- Tests in tests/integrations/test_swanlab.py
- Examples in examples/swanlab/
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com >
* fix(swanlab): address PR #3334 review feedback from winglian and CodeRabbit
- Change use_swanlab default to True (winglian)
- Clear buffer after periodic logging to prevent duplicates (CodeRabbit Major)
- Add safe exception handling in config fallback (CodeRabbit)
- Use context managers for file operations (CodeRabbit)
- Replace LOG.error with LOG.exception for better debugging (CodeRabbit)
- Sort __all__ alphabetically (CodeRabbit)
- Add language specifiers to README code blocks (CodeRabbit)
- Fix end-of-file newline in README (pre-commit)
Resolves actionable comments and nitpicks from CodeRabbit review.
Addresses reviewer feedback from @winglian.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com >
* only run swanlab integration tests if package is available
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com >
Co-authored-by: Wing Lian <wing@axolotl.ai >
2026-01-06 09:19:18 -05:00
Wing Lian
afe18ace35
deprecate torch 2.7.1 ( #3339 )
2026-01-01 06:52:45 -05:00
Wing Lian
11c0b5b256
batch upgrade dependencies ( #3299 )
...
* upgrade dependencies
* don't use reset sessions
* downgrade transformers, upgrade other deps
* upgrade bnb to 0.49.0
* restore s3 cache
* explicit use local files w hub
* decompress and strip top level dir
* use 2 levels for strip components
* try to preserve permissions for symlinks
* use updated tar
* fix #3293 for distributed
* downgrade bnb
* fast fail after 4
* fix total tokens device
* patch accelerate CP/SP (#3309 )
---------
Co-authored-by: salman <salman.mohammadi@outlook.com >
2025-12-30 09:02:49 -05:00
VED
a6080df73c
compute loss only if training and update token metric naming ( #3293 ) [skip ci]
...
* compute loss only if training
* save total_tokens for checkpoint
* check if string
* refactor total_tokens/ num_tokens
* refactor 2
* replace trainable_step/trian_per_sec_per_gpu
* lint + log trainable/tokens
* consolidate it in the callback.
* test for total_tokens after resume
* check if tokenstate exists after ckpt
---------
Co-authored-by: Ved <ved.work2024@gmail.com >
2025-12-25 18:38:17 +07:00
NanoCode012
418933f0d1
feat: add internvl3_5 ( #3141 ) [skip-ci]
...
* feat: add internvl3_5
* fix: add timm instructions
* chore: add kimi-linear to cce doc
* feat: update internvl example
* chore: pin revision
* chore: remove from multipack
* fix: add to multimodal array
* fix: internvl use hf version
* feat: update cce
* chore: lint
* fix: list for image_size
* chore: add docs vram usage
* feat: enable cce
* fix: no need trust remote code
* fix: inconsistent timm version
2025-12-25 18:07:59 +07:00
NanoCode012
372f664c63
feat: cleanup old flex mask patch, suppress Matmul bnb warn, and misc ( #3330 ) [skip-ci]
...
* feat: add pos id to flex attention for packing part 1
* feat: update to include sliding window mask patch
* fix: suppress MatMul8bitLt: inputs will be cast from warnings
* fix: remove redundant flex attention patch
* chore: update olmo docs
* feat: add validator patch for cross entropy
2025-12-25 17:56:20 +07:00
NanoCode012
97f1b1758d
Feat: add kimi linear support ( #3257 )
...
* feat: add custom kimi linear patch [skip ci]
* feat: add configuration file and fix import [skip ci]
* fix: hijack tokenizer temporarily [skip ci]
* chore: remove accidental commit
* fix: attempt patch kimi remote
* fix: kwargs passed
* fix: device for tensor
* fix: aux loss calculation
* feat: cleaned up patches order
* fix: remove duplicate tokenizer patch
* chore: add debug logs
* chore: add debug logs
* chore: debug
* Revert "chore: add debug logs"
This reverts commit da372a5f67 .
* Revert "chore: add debug logs"
This reverts commit 97d1de1d7c .
* fix: KeyError: 'tokenization_kimi'
* fix: support remote_model_id in cce patch
* feat: add config preload patch
* fix: use standard aux loss calc and updated modeling
* fix: import
* feat: add kimi-linear docs and example
* chore: add note about moe kernels
* feat: update cce to include kimi-linear
* chore: lint
* chore: update main readme
* fix: patch mechanism to address comments
* chore: lint
* fix: tests
* chore: cleanup comment
2025-12-25 17:53:52 +07:00
Abubakar Abid
f2155eaf79
feat: add trackio as experiment tracking integration ( #3253 )
...
* feat: add trackio as experiment tracking integration
- Add TrackioConfig to integrations schema with project_name, run_name, and space_id
- Create trackio_.py module for environment setup
- Add is_trackio_available() utility function
- Integrate trackio with report_to in trainer builder
- Add trackio callback for experiment tracking
- Add trackio config keys to gpt-oss example YAMLs
- Trackio runs locally by default, syncs to HF Space if space_id provided
* changes
* changes
* changes
* changes
* changes
* changes
* changes
* Update requirements.txt
* don't allow pydantic 2.12 for now
---------
Co-authored-by: Abubakar Abid <aaabid93@gmail.com >
Co-authored-by: Wing Lian <wing@axolotl.ai >
2025-12-23 08:49:07 -05:00
kallewoof
92ee4256f7
feature: raise on long sequence drop ( #3321 )
...
* feature: raise on long sequence drop
It is sometimes undesirable for sequences to be silently dropped from the dataset, especially when the dataset has been carefully crafted and pre-fitted to the training context; a dropped sequence would then suggest that an error occurred somewhere in the process. This feature adds a third value for excess_length_strategy, 'raise', which raises a ValueError if a sequence is encountered that is too long and would normally have been dropped or truncated.
* tests: add excess_length_strategy tests
* doc: updated return value description for drop_long_seq_in_dataset
* add @enable_hf_offline
* fixed cfg modified after validate_config called
* hf offline fix
* fix tqdm desc when raise is used
* test: added test for non-batched case
* accidental code change revert
* test: use pytest.raises
* test: simplified drop_seq_len tests
* test: moved excess_length_strat test to test_data.py
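The behavior described in the first bullet can be sketched as follows (a minimal sketch; the helper name and return values are illustrative — only the three `excess_length_strategy` values come from the commit):

```python
def handle_long_sequence(seq_len, max_len, excess_length_strategy="drop"):
    """Hypothetical sketch of the three excess-length strategies."""
    if seq_len <= max_len:
        return "keep"
    if excess_length_strategy == "raise":
        # New third strategy: fail loudly instead of silently losing data.
        raise ValueError(
            f"sequence of length {seq_len} exceeds max {max_len}; "
            "refusing to silently drop/truncate"
        )
    if excess_length_strategy == "truncate":
        return "truncate"
    return "drop"
```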
---------
Co-authored-by: salman <salman.mohammadi@outlook.com >
2025-12-22 13:59:49 -05:00
Wing Lian
efeb5a4e41
fix check for fp8 capability ( #3324 )
...
* fix check for fp8 capability
* handle non-cuda compute
* reduce concurrency of tests
2025-12-22 13:58:25 -05:00
VED
faaff6c792
allow users to set ndigits for rounding of metrics when logging ( #3325 )
...
* METRIC_PRECISION-> 8
* use ndigits and move env getter to top of log function
---------
Co-authored-by: Ved <ved.work2024@gmail.com >
Co-authored-by: Wing Lian <wing@axolotl.ai >
2025-12-22 08:54:43 -05:00
Alexander Kozhevnikov
43cef27458
Fix typo in densemixer RuntimeError ( #3327 ) [skip ci]
...
The error message suggests installing densemizer when it should be densemixer
2025-12-22 08:53:58 -05:00
salman
bbd3486f57
Distributed Muon Optimizer ( #3264 )
...
* init
* working
* updating configs
* removing unneeded files
* lint
* comments
* lint
* fix regex match
* bump contribs version
* comments
* fixing tests and imports
* muon imports in test v2
* test cleanup
* bump contribs version
---------
Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com >
2025-12-19 10:43:47 -05:00
VED
3750d7dd64
add liger kernel support for dpo ( #3302 )
...
* add liger kernel for dpo
* revert grpo changes,add support in dpo
* revert grpo changes,add support in dpo
* dpo_use_liger_kernal
* fix liger_dpo
---------
Co-authored-by: Ved <ved.work2024@gmail.com >
2025-12-18 11:11:06 -05:00
xzuyn
2197b0bf89
feat: cheap ppl metric ( #3317 )
...
* Import math and compute perplexity from loss values
* lint
* coderabbit changes
* lint
* fix: add rounding to ppl
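The metric is "cheap" because perplexity is just the exponential of the already-logged cross-entropy loss, requiring no extra forward pass; a minimal sketch with the rounding mentioned in the last bullet (the helper name and default ndigits are illustrative):

```python
import math

def perplexity_from_loss(loss, ndigits=4):
    # Perplexity is exp of the mean cross-entropy loss (in nats),
    # so it can be derived from the loss value already being logged.
    return round(math.exp(loss), ndigits)
```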
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-12-18 09:02:41 -05:00
Seung Hyun Cho
3e51a680c2
fix: Fix evaluation loss in KD trainer ( #3271 )
...
* fix: Fix evaluation loss in KD trainer
* Fix v2 strategy super() call
* fix: Add safety check for total_tokens in log method
* fix: simplified num items and outputs return handling
* fix: add missing model forward pass in compute_loss
* refactor: Use Template Method pattern for chat template strategies
* refactor: use pop(None) and remove v2 override
* chore: lint
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai >
Co-authored-by: Wing Lian <wing@axolotl.ai >
2025-12-17 13:40:36 -05:00
xzuyn
2cf254b4af
Add peft_autocast_adapter_dtype config option ( #3311 ) [skip ci]
...
* Add `peft_autocast_adapter_dtype` field to schema
* Add `autocast_adapter_dtype` to `model_kwargs`
* chore: docs
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-12-17 10:09:39 -05:00
NanoCode012
a1d07f42e4
Fix(misc): address PYTORCH_CUDA_ALLOC_CONF deprecation ( #3313 )
...
* fix: leftover ministral docs changes
* fix: pytorch_cuda_alloc_conf deprecation
* fix: set old PYTORCH_CUDA_ALLOC_CONF env too
* handle 2.9 separately
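A minimal sketch of the compatibility shim the third bullet describes, assuming the replacement variable is `PYTORCH_ALLOC_CONF` and using an illustrative allocator setting:

```python
import os

def set_alloc_conf(value="expandable_segments:True"):
    # Newer PyTorch reads PYTORCH_ALLOC_CONF, while older versions only
    # read PYTORCH_CUDA_ALLOC_CONF, so set both for compatibility.
    os.environ["PYTORCH_ALLOC_CONF"] = value
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
```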
---------
Co-authored-by: Wing Lian <wing@axolotl.ai >
2025-12-17 09:12:18 -05:00
Wing Lian
2a664dc8ad
support for xformers wheels for torch 2.9 ( #3308 )
...
* support for xformers wheels for torch 2.9
* fix hf cache?
* don't use hf cache from s3
* show disk free space in ci
2025-12-11 11:56:40 -05:00
NanoCode012
4ac78aa562
fix: update qwen3 jinja tokenization off a few tokens ( #3295 )
...
* fix: update qwen3 jinja tokenization off a few tokens
* fix: add note on tokenization issue
* fix: pop last index for mistral tokenizer
2025-12-09 14:31:03 +07:00
VED
b3f4aa149f
fix bin size ( #3307 )
...
* fix bin size
* lint
---------
Co-authored-by: Ved <ved.work2024@gmail.com >
2025-12-08 09:16:18 -05:00
salman
75b20fb66f
Save processor in quantizer CLI ( #3290 )
2025-12-06 16:27:18 +00:00
NanoCode012
2b66ee189c
Feat: add ministral3 ( #3297 )
...
* feat: add ministral and mistral3
* chore: lint
* feat: update cce for ministral
* fix: add vram usage
* feat: update for release
* fix: save_pretrained issue in v5
* fix: add instructions to use v5 branch
* fix: add to multipack
* fix: improve instructions
* fix: add model to readme
2025-12-04 08:32:08 -05:00
NanoCode012
86d8cca149
Feat: add trinity by ArceeAI ( #3292 )
2025-12-02 13:12:55 -05:00
Yohan Na
c6ddcdd06a
feat: add exaone4 chat template and update enums ( #3279 )
...
* feat: add exaone4 chat template and update enums
* fix: handle first message as system or tools in exaone4 chat template
* Update src/axolotl/utils/chat_templates/templates/exaone4.jinja
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* fix: lint
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-12-01 15:52:45 +07:00
NanoCode012
b234532d9f
Feat: add peft_ensure_weight_tying ( #3278 )
...
* feat: upgrade peft to 0.18.0
* feat: add peft_ensure_weight_tying
* fix: default
* chore: adjust kwarg per feedback
2025-11-28 18:54:48 +07:00
NanoCode012
006f226270
Feat: add Olmo3 (BC with Olmo and Olmo2) ( #3275 )
...
* feat: update cce to include olmo family
* chore: update docs following feedback
* feat: add olmo3 config
* fix: clarify 3 methods
* chore: add olmo to readme
2025-11-24 10:21:31 +07:00
NanoCode012
4e55871112
feat: Add opt-out Telemetry ( #3237 )
...
* initial telemetry manager impl
* adding todo
* updates
* updates
* progress on telemetry: config load, process, model load, train start / end, error tracking
* update error file path sanitization function; adding more error tracking
* updated sanitization logic, tests
* adding runtime metrics (cpu + gpu memory, steps/s, etc.)
* tests for runtime metrics telemetry and assoc. callback
* small update / fix
* simplifying path redaction
* sleep on all ranks in distributed setting
* adding back in base_model redaction w/ whitelist
* fix
* doc update
* improved redaction, send system info during model config load telemetry, etc.
* adding runtime metrics / system info additional accelerator support, etc.
* adding runtime metrics / system info additional accelerator support, etc.
* remove duplicate info
* fixes
* fix issue with tests in ci
* distributed fix
* opt-in version of telemetry
* enable / disable logic update
* docs fix
* doc update
* minor fixes
* simplifying
* slight changes
* fix
* lint
* update posthog dep
* coderabbit comments
* fix: opt-in model
* fix: increase time since last
* fix: increase whitelist orgs
* fix: posthog init and shutdown
* fix: imports
* fix: also check grad norm
* fix: duplicate plugin_manager calls
* fix: bad merge
* chore: update docs
* fix: cache process per comment
* fix: error handling
* fix: tests
* Revert "fix: error handling"
This reverts commit 22d1ea5755 .
* fix: test telemetry error_handled bool
* fix: revert test
* chore: final doc fixes
---------
Co-authored-by: Dan Saunders <danjsaund@gmail.com >
Co-authored-by: Dan Saunders <dan@axolotl.ai >
2025-11-18 11:35:25 +07:00
VED
dcf24fd24e
feat: save checkpoint after training started ( #3233 )
...
* add:config parameters for checkpoint
* callback main
* test file_type fix
* lint
* unit
* simplify dict/obj handling
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Delete tests/e2e/integrations/__init__.py
* remove hard code path in test
* device check
* lint
* Update src/axolotl/utils/callbacks/dynamic_checkpoint.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* Update src/axolotl/utils/callbacks/dynamic_checkpoint.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* lint-2
* remove: signal-based checkpoints
* lint
* remove signal tests
* add:is_main_process
* lint
* add: is_distributed() for tests
* remove nested is_main_process
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: Wing Lian <wing.lian@gmail.com >
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: Wing Lian <wing.lian@gmail.com >
* add user_defined_filename
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2025-11-13 10:21:05 -05:00
NanoCode012
9901ee5602
fix: voxtralprocessor broken ( #3255 ) [skip ci]
...
* fix: voxtralprocessor broken
* chore: add todo
* chore: wording
2025-11-13 10:18:42 -05:00
xzuyn
dd78f2e0cc
Fix: warmup_steps: 0 & warmup_ratio: 0 not disabling warmup ( #3254 )
...
* fix unintentional falsy checks
* chore: lint
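The bug class here is a truthiness check that conflates an explicit `0` with "unset", so warmup could never be disabled; a minimal sketch of the corrected pattern (names and default value are illustrative):

```python
def resolve_warmup(warmup_steps=None, default=100):
    # Buggy version was `if warmup_steps:` — an explicit 0 is falsy and
    # fell through to the default. Distinguish "unset" (None) from 0.
    if warmup_steps is not None:
        return warmup_steps
    return default
```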
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-11-11 10:32:06 +07:00
Eduard Zl
b54f9c942b
_get_tools in ChatTemplateStrategy: function "parameters" can be dict or string ( #3238 )
...
* When training on function calls, the "tools" elements of a dataset can contain the same parameter name with different types, and the dataset fails to load such a training set. This fix allows the "parameters" element of a function call to be a string (produced by running json.dumps when preparing the training data set). _get_tools iterates over tool definitions: if a "parameters" element is a dict it is kept as-is; if it is a string it is converted to a dict via json.loads.
* feat: add doc on tool parameters json loading
* feat: add tests for parameters json string
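The normalization described above can be sketched as follows (a minimal sketch; the helper name is illustrative and the real `_get_tools` signature may differ):

```python
import json

def normalize_tool_parameters(tools):
    """Accept each tool's "parameters" as either a dict or a JSON string,
    returning tools whose "parameters" are always dicts."""
    normalized = []
    for tool in tools:
        params = tool.get("parameters")
        if isinstance(params, str):
            # String form comes from json.dumps at dataset-prep time;
            # convert it back to a dict here.
            tool = {**tool, "parameters": json.loads(params)}
        normalized.append(tool)
    return normalized
```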
---------
Co-authored-by: ezlotnik <eduard_zlotnik@intuit.com >
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-11-11 09:04:28 +07:00
NanoCode012
11eb36585a
feat: add arg to enable dft in liger ( #3125 )
...
* feat: add arg to enable dft in liger
* feat: add tests use_token_scaling
* fix: test
* fix: move check to args
2025-11-10 21:37:47 +07:00
NanoCode012
d0c846fc5e
feat: add granitemoeshared and granitemoehybrid ( #3158 )
2025-11-10 21:35:45 +07:00
Wing Lian
b5fcc2f14b
log cumulative total trained tokens ( #3252 )
...
* log cumulative total trained tokens
* use is_distributed helper
2025-11-07 16:04:00 -05:00
VED
ed2e8cacd6
feat: openenv rollout_func ( #3239 ) [skip ci]
...
* feat:openenv rollout_func
* chore lint
* docs
* add:docs processing_class
* tests
* lint
2025-11-07 08:51:40 -05:00
Lê Nam Khánh
80270a92fa
Fix typos in some files ( #3250 ) [skip ci]
2025-11-07 08:21:20 -05:00
Wing Lian
98333e639a
upgrade trl to 0.24.0 and liger to 0.6.3 ( #3230 )
...
* upgrade trl to 0.24.0
* fix reward collator init
* use newer DataCollatorForPreference instead
* DataCollatorForPreference doesn't use padding kwarg
* fix input id labels
* fix fbgemm-gpu version for pytorch versions
* tweak pinned deps
* transformers doesn't support hub 1.0 yet
* upgrade liger dep to 0.6.3
* set TORCH_CUDA_ARCH_LIST correctly
2025-10-29 18:02:16 -04:00
Dan Saunders
9d4d39e939
Diffusion trainer fix: shift logits to align with input tokens ( #3191 )
...
* shift logits for diffusion generate
* delete unused
* diffusion trainer: token shift
2025-10-27 14:42:01 +07:00
VED
4dc018992d
Feat/opentelemetry ( #3215 )
2025-10-22 19:16:55 -07:00
NanoCode012
243620394a
fix: force train split for json,csv,txt for test_datasets and misc doc changes ( #3226 )
...
* fix: force train split for json,csv,txt for test_datasets
* feat(doc): add info on mixing datasets for VLM
* feat(doc): max memory
* fix(doc): clarify lr groups
* fix: add info on vision not being dropped
* feat: add qwen3-vl to multimodal docs
* fix: add moe blocks to arch list
* feat(doc): improve mistral docs
* chore: add helpful link [skip-e2e]
* fix: add vram usage for mistral small
* Update link in docs/faq.qmd
Co-authored-by: salman <salman.mohammadi@outlook.com >
---------
Co-authored-by: Wing Lian <wing@axolotl.ai >
Co-authored-by: salman <salman.mohammadi@outlook.com >
2025-10-22 15:23:20 -07:00
Qingyang Wu
3750fdcf79
Fix trainer dataloader slow loading issue ( #3219 )
...
* Fix trainer dataloader handling in src/axolotl/core/trainers/base.py
* update comment to reflect torch version
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2025-10-22 21:22:14 +07:00
Matthew Hambrecht
613bcf90e5
fix: enable_sleep_mode -> vllm_enable_sleep_mode ( #3225 )
...
Co-authored-by: Matthew Hambrecht <matthew.hambrecht@patapsco.ai >
2025-10-22 06:55:26 -07:00
NanoCode012
8bb871b5cf
fix: deepspeed with context parallel ( #3220 )
2025-10-20 14:06:58 +07:00