Wing Lian
efeb5a4e41
fix check for fp8 capability (#3324)
* fix check for fp8 capability
* handle non-cuda compute
* reduce concurrency of tests
2025-12-22 13:58:25 -05:00
VED
faaff6c792
allow users to set ndigits for rounding of metrics when logging (#3325)
* METRIC_PRECISION-> 8
* use ndigits and move env getter to top of log function
---------
Co-authored-by: Ved <ved.work2024@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-12-22 08:54:43 -05:00
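The ndigits change can be illustrated with a minimal sketch. It assumes the precision comes from a `METRIC_PRECISION` environment variable (the name in the first bullet) defaulting to 8, read once at the top of the logging function as the second bullet describes; `round_metrics` is a hypothetical helper, not the actual axolotl function.

```python
import os
from typing import Any, Dict


def round_metrics(logs: Dict[str, Any], default_ndigits: int = 8) -> Dict[str, Any]:
    """Hypothetical helper: round float metrics before they are logged."""
    # Read the precision once, up front, rather than per metric.
    # METRIC_PRECISION is assumed from the commit bullet; 8 is the assumed default.
    ndigits = int(os.environ.get("METRIC_PRECISION", default_ndigits))
    return {
        # Only floats are rounded; ints such as the step counter pass through.
        key: round(value, ndigits) if isinstance(value, float) else value
        for key, value in logs.items()
    }


if __name__ == "__main__":
    os.environ["METRIC_PRECISION"] = "3"
    print(round_metrics({"loss": 0.123456789, "step": 10}))  # {'loss': 0.123, 'step': 10}
```

Reading the variable once per call keeps the rounding consistent across every metric in a single log entry.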
Alexander Kozhevnikov
43cef27458
Fix typo in densemixer RuntimeError (#3327) [skip ci]
The error message suggested installing densemizer when it should have said densemixer.
2025-12-22 08:53:58 -05:00
Wing Lian
07c41a6c2a
fix preview docs failing due to running out of disk (#3326) [skip ci]
* fix preview docs failing due to running out of disk
* fix docs publish too
2025-12-19 11:34:55 -05:00
salman
bbd3486f57
Distributed Muon Optimizer (#3264)
* init
* working
* updating configs
* removing unneeded files
* lint
* comments
* lint
* fix regex match
* bump contribs version
* comments
* fixing tests and imports
* muon imports in test v2
* test cleanup
* bump contribs version
---------
Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>
2025-12-19 10:43:47 -05:00
VED
3750d7dd64
add liger kernel support for dpo (#3302)
* add liger kernel for dpo
* revert grpo changes,add support in dpo
* revert grpo changes,add support in dpo
* dpo_use_liger_kernal
* fix liger_dpo
---------
Co-authored-by: Ved <ved.work2024@gmail.com>
2025-12-18 11:11:06 -05:00
xzuyn
2197b0bf89
feat: cheap ppl metric (#3317)
* Import math and compute perplexity from loss values
* lint
* coderabbit changes
* lint
* fix: add rounding to ppl
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-12-18 09:02:41 -05:00
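Perplexity is the exponential of the mean per-token cross-entropy loss (in nats), so it can be derived from the loss value already being logged without an extra forward pass, which is presumably what makes it "cheap". A minimal sketch; the function name and the default of 4 digits are assumptions for illustration:

```python
import math


def perplexity_from_loss(loss: float, ndigits: int = 4) -> float:
    """Perplexity as exp(mean token-level cross-entropy loss, in nats)."""
    try:
        ppl = math.exp(loss)
    except OverflowError:
        # exp() overflows for losses above ~709 (e.g. a diverged run); report inf.
        ppl = math.inf
    # Rounding keeps logged values readable; 4 digits is an arbitrary choice here.
    return round(ppl, ndigits)


if __name__ == "__main__":
    print(perplexity_from_loss(math.log(10.0)))  # 10.0
```

Catching the overflow matters because a single diverged step would otherwise crash the logging path instead of reporting an infinite perplexity.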
Seung Hyun Cho
3e51a680c2
fix: Fix evaluation loss in KD trainer (#3271)
* fix: Fix evaluation loss in KD trainer
* Fix v2 strategy super() call
* fix: Add safety check for total_tokens in log method
* fix: simplified num items and outputs return handling
* fix: add missing model forward pass in compute_loss
* refactor: Use Template Method pattern for chat template strategies
* refactor: use pop(None) and remove v2 override
* chore: lint
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-12-17 13:40:36 -05:00
xzuyn
2cf254b4af
Add peft_autocast_adapter_dtype config option (#3311) [skip ci]
* Add `peft_autocast_adapter_dtype` field to schema
* Add `autocast_adapter_dtype` to `model_kwargs`
* chore: docs
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-12-17 10:09:39 -05:00
salman
83d4d97dcc
Add QAT NVFP4 configs for blogpost (#3280) [skip ci]
* add configs for blogpost
* fix configs
* fixing baseline configs
2025-12-17 09:35:22 -05:00
NanoCode012
a1d07f42e4
Fix(misc): address PYTORCH_CUDA_ALLOC_CONF deprecate (#3313)
* fix: leftover ministral docs changes
* fix: pytorch_cuda_alloc_conf deprecation
* fix: set old PYTORCH_CUDA_ALLOC_CONF env too
* handle 2.9 separately
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-12-17 09:12:18 -05:00
Wing Lian
2a664dc8ad
support for xformers wheels for torch 2.9 (#3308)
* support for xformers wheels for torch 2.9
* fix hf cache?
* don't use hf cache from s3
* show disk free space in ci
2025-12-11 11:56:40 -05:00
NanoCode012
4ac78aa562
fix: update qwen3 jinja tokenization off a few tokens (#3295)
* fix: update qwen3 jinja tokenization off a few tokens
* fix: add note on tokenization issue
* fix: pop last index for mistral tokenizer
2025-12-09 14:31:03 +07:00
VED
b3f4aa149f
fix bin size (#3307)
* fix bin size
* lint
---------
Co-authored-by: Ved <ved.work2024@gmail.com>
2025-12-08 09:16:18 -05:00
salman
75b20fb66f
Save processor in quantizer CLI (#3290)
2025-12-06 16:27:18 +00:00
NanoCode012
5992e607a2
fix: improve ministral3 docs to be clearer (#3300)
* fix: improve ministral3 docs to be clearer
* fix: title
* chore: wording
2025-12-04 21:44:44 +07:00
NanoCode012
2b66ee189c
Feat: add ministral3 (#3297)
* feat: add ministral and mistral3
* chore: lint
* feat: update cce for ministral
* fix: add vram usage
* feat: update for release
* fix: save_pretrained issue in v5
* fix: add instructions to use v5 branch
* fix: add to multipack
* fix: improve instructions
* fix: add model to readme
2025-12-04 08:32:08 -05:00
NanoCode012
86d8cca149
Feat: add trinity by ArceeAI (#3292)
2025-12-02 13:12:55 -05:00
NanoCode012
4a0f98e612
feat: upgrade liger to 0.6.4 (#3289)
2025-12-02 09:16:23 -05:00
Yohan Na
c6ddcdd06a
feat: add exaone4 chat template and update enums (#3279)
* feat: add exaone4 chat template and update enums
* fix: handle first message as system or tools in exaone4 chat template
* Update src/axolotl/utils/chat_templates/templates/exaone4.jinja
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* fix: lint
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-12-01 15:52:45 +07:00
github-actions[bot]
7fb6a947d9
chore: update pre-commit hooks (#3287)
Co-authored-by: SalmanMohammadi <25081738+SalmanMohammadi@users.noreply.github.com>
2025-12-01 15:03:14 +07:00
NanoCode012
b234532d9f
Feat: add peft_ensure_weight_tying (#3278)
* feat: upgrade peft to 0.18.0
* feat: add peft_ensure_weight_tying
* fix: default
* chore: adjust kwarg per feedback
2025-11-28 18:54:48 +07:00
VED
8990ca3205
fix: removed unused "scikit-learn==1.4.2" (#3277)
Co-authored-by: Ved <ved.work2024@gmail.com>
2025-11-24 13:48:53 +07:00
NanoCode012
006f226270
Feat: add Olmo3 (BC with Olmo and Olmo2) (#3275)
* feat: update cce to include olmo family
* chore: update docs following feedback
* feat: add olmo3 config
* fix: clarify 3 methods
* chore: add olmo to readme
2025-11-24 10:21:31 +07:00
Wing Lian
0b635e69c5
build docker images for 2.9.x (#3273)
2025-11-20 09:26:24 -05:00
Wing Lian
0d27e14e45
Torch 2.9.1 base images (#3268)
* update torch 2.9.1 base images
* update base dockerfile image check
2025-11-20 09:04:37 -05:00
NanoCode012
f5f21fb216
chore: update readme with latest updates (#3267)
v0.13.0
2025-11-18 14:45:21 +07:00
NanoCode012
4e55871112
feat: Add opt-out Telemetry (#3237)
* initial telemetry manager impl
* adding todo
* updates
* updates
* progress on telemetry: config load, process, model load, train start / end, error tracking
* update error file path sanitization function; adding more error tracking
* updated sanitization logic, tests
* adding runtime metrics (cpu + gpu memory, steps/s, etc.)
* tests for runtime metrics telemetry and assoc. callback
* small update / fix
* simplifying path redaction
* sleep on all ranks in distributed setting
* adding back in base_model redaction w/ whitelist
* fix
* doc update
* improved redaction, send system info during model config load telemetry, etc.
* adding runtime metrics / system info additional accelerator support, etc.
* adding runtime metrics / system info additional accelerator support, etc.
* remove duplicate info
* fixes
* fix issue with tests in ci
* distributed fix
* opt-in version of telemetry
* enable / disable logic update
* docs fix
* doc update
* minor fixes
* simplifying
* slight changes
* fix
* lint
* update posthog dep
* coderabbit comments
* fix: opt-in model
* fix: increase time since last
* fix: increase whitelist orgs
* fix: posthog init and shutdown
* fix: imports
* fix: also check grad norm
* fix: duplicate plugin_manager calls
* fix: bad merge
* chore: update docs
* fix: cache process per comment
* fix: error handling
* fix: tests
* Revert "fix: error handling"
This reverts commit 22d1ea5755.
* fix: test telemetry error_handled bool
* fix: revert test
* chore: final doc fixes
---------
Co-authored-by: Dan Saunders <danjsaund@gmail.com>
Co-authored-by: Dan Saunders <dan@axolotl.ai>
2025-11-18 11:35:25 +07:00
Wing Lian
a6bafb55cb
upgrade datasets to 4.4.1 (#3266)
* upgrade datasets
* cleanup pip cache earlier
* cleanup unused things from worker
* also cleanup sdist
2025-11-14 09:52:14 -08:00
Wing Lian
0fbde69e9c
only push axolotl images, personal repo is deprecated (#3262)
* only push axolotl images, personal repo is deprecated
* cleanup
2025-11-14 07:50:03 -08:00
Wing Lian
301e22849f
upgrade to latest deepspeed and make sure latest tagged axolotl images are using torch 2.8.0 (#3261)
2025-11-13 13:03:01 -05:00
VED
dcf24fd24e
feat: save checkpoint after training started (#3233)
* add:config parameters for checkpoint
* callback main
* test file_type fix
* lint
* unit
* simplify dict/obj handling
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Delete tests/e2e/integrations/__init__.py
* remove hard code path in test
* device check
* lint
* Update src/axolotl/utils/callbacks/dynamic_checkpoint.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update src/axolotl/utils/callbacks/dynamic_checkpoint.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* lint-2
* remove: signal-based checkpoints
* lint
* remove signal tests
* add:is_main_process
* lint
* add is_distributed() for tests
* remove nested is_main_process
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* add user_defined_filename
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2025-11-13 10:21:05 -05:00
NanoCode012
49b8107989
feat: add granite4 examples (#3256) [skip ci]
2025-11-13 10:19:16 -05:00
NanoCode012
9901ee5602
fix: voxtralprocessor broken (#3255) [skip ci]
* fix: voxtralprocessor broken
* chore: add todo
* chore: wording
2025-11-13 10:18:42 -05:00
xzuyn
dd78f2e0cc
Fix: warmup_steps: 0 & warmup_ratio: 0 not disabling warmup (#3254)
* fix unintentional falsy checks
* chore: lint
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-11-11 10:32:06 +07:00
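The bug class fixed here is a truthiness check on numeric config values: `if warmup_steps:` treats an explicit 0 the same as an unset value, so a default warmup is applied anyway. A minimal sketch of the corrected pattern, using a hypothetical resolver (`resolve_warmup_steps`) and an assumed fallback ratio of 0.03:

```python
from typing import Optional


def resolve_warmup_steps(
    warmup_steps: Optional[int],
    warmup_ratio: Optional[float],
    total_steps: int,
    default_ratio: float = 0.03,
) -> int:
    """Hypothetical resolver: explicit values, including 0, win over defaults."""
    # Buggy form: `if warmup_steps:` is False for 0, so 0 fell through to a default.
    if warmup_steps is not None:
        return warmup_steps
    if warmup_ratio is not None:
        return int(warmup_ratio * total_steps)
    # Nothing configured: fall back to an (assumed) default ratio.
    return int(default_ratio * total_steps)
```

Checking `is not None` keeps 0 as a meaningful "no warmup" setting while still letting an absent value fall back to a default.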
Eduard Zl
b54f9c942b
_get_tools in ChatTemplateStrategy: function "parameters" can be dict or string (#3238)
* When training on function calls, the "tools" elements of a dataset can contain the same parameter name with different types, and datasets fails to load such a training set. This fix allows the "parameters" element of a function call to be a string (produced by running "json.dumps" when preparing the training dataset). The _get_tools function iterates over the tool definitions: if the "parameters" element is a dict, it is kept as-is; if it is a string, it is converted to a dict by calling "json.loads" on the string value.
* feat: add doc on tool parameters json loading
* feat: add tests for parameters json string
---------
Co-authored-by: ezlotnik <eduard_zlotnik@intuit.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-11-11 09:04:28 +07:00
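The `_get_tools` behavior described above can be sketched as a small normalization pass. It assumes OpenAI-style tool entries of the form `{"type": "function", "function": {"name": ..., "parameters": ...}}`; `normalize_tool_parameters` is a hypothetical name, not the real method:

```python
import json
from typing import Any, Dict, List


def normalize_tool_parameters(tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return tool definitions whose function "parameters" are always dicts."""
    normalized = []
    for tool in tools:
        function = tool.get("function")
        if isinstance(function, dict) and isinstance(function.get("parameters"), str):
            # String parameters were serialized with json.dumps; parse them back.
            # Copies are made so the caller's dataset rows are not mutated.
            function = {**function, "parameters": json.loads(function["parameters"])}
            tool = {**tool, "function": function}
        # Dict parameters (or tools without a "function" entry) pass through unchanged.
        normalized.append(tool)
    return normalized
```

Storing `parameters` as a JSON string gives every row the same column type, which sidesteps the schema conflict that made datasets fail to load; the dict is only reconstructed when the tools are templated.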
NanoCode012
11eb36585a
feat: add arg to enable dft in liger (#3125)
* feat: add arg to enable dft in liger
* feat: add tests use_token_scaling
* fix: test
* fix: move check to args
2025-11-10 21:37:47 +07:00
NanoCode012
d0c846fc5e
feat: add granitemoeshared and granitemoehybrid (#3158)
2025-11-10 21:35:45 +07:00
Wing Lian
b5fcc2f14b
log cumulative total trained tokens (#3252)
* log cumulative total trained tokens
* use is_distributed helper
2025-11-07 16:04:00 -05:00
Wing Lian
b62eed8809
add openenv-core to requirements (#3251)
2025-11-07 12:17:27 -05:00
VED
ed2e8cacd6
feat:openenv rollout_func (#3239) [skip ci]
* feat:openenv rollout_func
* chore lint
* docs
* add:docs processing_class
* tests
* lint
2025-11-07 08:51:40 -05:00
Lê Nam Khánh
80270a92fa
Fix typos in some files (#3250) [skip ci]
2025-11-07 08:21:20 -05:00
Wing Lian
bfdc9a8249
upgrade trl and other hf deps (#3249)
* upgrade trl and other hf deps
* skip simpo for now
2025-11-06 16:06:03 -05:00
salman
c37decb073
update pre-commit cadence (#3245)
2025-11-04 13:43:40 +00:00
NanoCode012
01a346d86a
feat(example): add gpt-oss-safeguard docs (#3243)
* feat(example): add gpt-oss-safeguard docs
* fix: add doc on reasoning_effort
2025-11-04 07:39:21 +07:00
NanoCode012
26f05b6008
fix(example): set model_type to load for gemma3 text (#3242)
* fix: set model_type to load for gemma3 text
* chore: simplify
* chore: unify
2025-11-04 07:35:07 +07:00
github-actions[bot]
ed58fa8a75
chore: update pre-commit hooks (#3244)
2025-11-03 15:55:40 +00:00
Wing Lian
633afffacb
add torch 2.9.0 to ci (#3223)
2025-10-30 18:50:26 -04:00
Wing Lian
4b1b4fa6d8
upgrade numpy (#3236)
* upgrade numpy to 2.3.4
* bump contribs for numpy
* fix vllm versions
* bump numba
* make sure psutil is installed
* add psutil to cicd dockerfile jinja
* lower dep versions of numba + numpy for vllm
* bump datasets version
* resolve pydantic conflict too
2025-10-30 10:03:24 -04:00
github-actions[bot]
0f7c886b7b
chore: update pre-commit hooks (#3222) [skip ci]
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>
2025-10-29 18:09:46 -04:00