Yohan Na
c6ddcdd06a
feat: add exaone4 chat template and update enums ( #3279 )
...
* feat: add exaone4 chat template and update enums
* fix: handle first message as system or tools in exaone4 chat template
* Update src/axolotl/utils/chat_templates/templates/exaone4.jinja
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* fix: lint
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-12-01 15:52:45 +07:00
github-actions[bot]
7fb6a947d9
chore: update pre-commit hooks ( #3287 )
...
Co-authored-by: SalmanMohammadi <25081738+SalmanMohammadi@users.noreply.github.com >
2025-12-01 15:03:14 +07:00
NanoCode012
b234532d9f
Feat: add peft_ensure_weight_tying ( #3278 )
...
* feat: upgrade peft to 0.18.0
* feat: add peft_ensure_weight_tying
* fix: default
* chore: adjust kwarg per feedback
2025-11-28 18:54:48 +07:00
VED
8990ca3205
fix: removed unused "scikit-learn==1.4.2" ( #3277 )
...
Co-authored-by: Ved <ved.work2024@gmail.com >
2025-11-24 13:48:53 +07:00
NanoCode012
006f226270
Feat: add Olmo3 (BC with Olmo and Olmo2) ( #3275 )
...
* feat: update cce to include olmo family
* chore: update docs following feedback
* feat: add olmo3 config
* fix: clarify 3 methods
* chore: add olmo to readme
2025-11-24 10:21:31 +07:00
Wing Lian
0b635e69c5
build docker images for 2.9.x ( #3273 )
2025-11-20 09:26:24 -05:00
Wing Lian
0d27e14e45
Torch 2.9.1 base images ( #3268 )
...
* update torch 2.9.1 base images
* update base dockerfile image check
2025-11-20 09:04:37 -05:00
NanoCode012
f5f21fb216
chore: update readme with latest updates ( #3267 )
v0.13.0
2025-11-18 14:45:21 +07:00
NanoCode012
4e55871112
feat: Add opt-out Telemetry ( #3237 )
...
* initial telemetry manager impl
* adding todo
* updates
* updates
* progress on telemetry: config load, process, model load, train start / end, error tracking
* update error file path sanitization function; adding more error tracking
* updated sanitization logic, tests
* adding runtime metrics (cpu + gpu memory, steps/s, etc.)
* tests for runtime metrics telemetry and assoc. callback
* small update / fix
* simplifying path redaction
* sleep on all ranks in distributed setting
* adding back in base_model redaction w/ whitelist
* fix
* doc update
* improved redaction, send system info during model config load telemetry, etc.
* adding runtime metrics / system info additional accelerator support, etc.
* adding runtime metrics / system info additional accelerator support, etc.
* remove duplicate info
* fixes
* fix issue with tests in ci
* distributed fix
* opt-in version of telemetry
* enable / disable logic update
* docs fix
* doc update
* minor fixes
* simplifying
* slight changes
* fix
* lint
* update posthog dep
* coderabbit comments
* fix: opt-in model
* fix: increase time since last
* fix: increase whitelist orgs
* fix: posthog init and shutdown
* fix: imports
* fix: also check grad norm
* fix: duplicate plugin_manager calls
* fix: bad merge
* chore: update docs
* fix: cache process per comment
* fix: error handling
* fix: tests
* Revert "fix: error handling"
This reverts commit 22d1ea5755 .
* fix: test telemetry error_handled bool
* fix: revert test
* chore: final doc fixes
---------
Co-authored-by: Dan Saunders <danjsaund@gmail.com >
Co-authored-by: Dan Saunders <dan@axolotl.ai >
2025-11-18 11:35:25 +07:00
Wing Lian
a6bafb55cb
upgrade datasets to 4.4.1 ( #3266 )
...
* upgrade datasets
* cleanup pip cache earlier
* cleanup unused things from worker
* also cleanup sdist
2025-11-14 09:52:14 -08:00
Wing Lian
0fbde69e9c
only push axolotl images, personal repo is deprecated ( #3262 )
...
* only push axolotl images, personal repo is deprecated
* cleanup
2025-11-14 07:50:03 -08:00
Wing Lian
301e22849f
upgrade to latest deepspeed and make sure latest tagged axolotl images are using torch 2.8.0 ( #3261 )
2025-11-13 13:03:01 -05:00
VED
dcf24fd24e
feat: save checkpoint after training started ( #3233 )
...
* add:config parameters for checkpoint
* callback main
* test file_type fix
* lint
* unit
* simplify dict/obj handling
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Delete tests/e2e/integrations/__init__.py
* remove hard code path in test
* device check
* lint
* Update src/axolotl/utils/callbacks/dynamic_checkpoint.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* Update src/axolotl/utils/callbacks/dynamic_checkpoint.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* lint-2
* remove: signal based checkpoints
* lint
* remove signal tests
* add:is_main_process
* lint
* add is_distributed() for tests
* remove nested is_main_process
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: Wing Lian <wing.lian@gmail.com >
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: Wing Lian <wing.lian@gmail.com >
* add user_defined_filename
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2025-11-13 10:21:05 -05:00
NanoCode012
49b8107989
feat: add granite4 examples ( #3256 ) [skip ci]
2025-11-13 10:19:16 -05:00
NanoCode012
9901ee5602
fix: voxtralprocessor broken ( #3255 ) [skip ci]
...
* fix: voxtralprocessor broken
* chore: add todo
* chore: wording
2025-11-13 10:18:42 -05:00
xzuyn
dd78f2e0cc
Fix: warmup_steps: 0 & warmup_ratio: 0 not disabling warmup ( #3254 )
...
* fix unintentional falsy checks
* chore: lint
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-11-11 10:32:06 +07:00
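The "unintentional falsy checks" fixed in #3254 are a common Python pitfall: `if warmup_steps:` treats an explicit `0` the same as unset, so `warmup_steps: 0` could not disable warmup. A minimal illustrative sketch (function name and default are hypothetical, not Axolotl's actual API):

```python
def resolve_warmup_steps(warmup_steps=None, default=100):
    """Sketch of the falsy-check bug fixed in #3254.

    Buggy form: `if not warmup_steps: return default`, which conflates an
    explicit 0 with "not configured". Comparing against None keeps 0 as a
    valid, warmup-disabling value.
    """
    if warmup_steps is None:  # only fall back when the option is truly unset
        return default
    return warmup_steps

print(resolve_warmup_steps(0))   # 0: warmup disabled, default not applied
print(resolve_warmup_steps())    # 100: option unset, default applies
```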
Eduard Zl
b54f9c942b
_get_tools in ChatTemplateStrategy: function "parameters" can be dict or string ( #3238 )
...
* When training on function calls, the "tools" elements of a dataset can contain the same parameter name with different types, and the dataset then fails to load. This fix allows the "parameters" element of a function call to be a string (produced by running "json.dumps" during training-data preparation). The _get_tools function iterates over the tool definitions: if a "parameters" element is a dict it is kept as-is; if it is a string it is converted to a dict by invoking "json.loads" on it.
* feat: add doc on tool parameters json loading
* feat: add tests for parameters json string
---------
Co-authored-by: ezlotnik <eduard_zlotnik@intuit.com >
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-11-11 09:04:28 +07:00
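The dict-or-string handling described in #3238 can be sketched as a small normalization pass. This is a hedged illustration of the behavior the commit message describes, not the actual `_get_tools` implementation (the function name below is hypothetical):

```python
import json

def normalize_tool_parameters(tools):
    """Return tools with every "parameters" field guaranteed to be a dict.

    Datasets may store a tool's "parameters" either as a dict or as a JSON
    string (e.g. the result of json.dumps at data-preparation time, used to
    avoid schema conflicts when the same parameter name has different types
    across rows). Strings are parsed back into dicts here.
    """
    normalized = []
    for tool in tools:
        tool = dict(tool)  # shallow copy so the input list is not mutated
        params = tool.get("parameters")
        if isinstance(params, str):
            tool["parameters"] = json.loads(params)
        normalized.append(tool)
    return normalized

tools = [
    {"name": "get_weather", "parameters": {"type": "object"}},
    {"name": "get_time", "parameters": '{"type": "object"}'},
]
fixed = normalize_tool_parameters(tools)
print(all(isinstance(t["parameters"], dict) for t in fixed))  # True
```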
NanoCode012
11eb36585a
feat: add arg to enable dft in liger ( #3125 )
...
* feat: add arg to enable dft in liger
* feat: add tests use_token_scaling
* fix: test
* fix: move check to args
2025-11-10 21:37:47 +07:00
NanoCode012
d0c846fc5e
feat: add granitemoeshared and granitemoehybrid ( #3158 )
2025-11-10 21:35:45 +07:00
Wing Lian
b5fcc2f14b
log cumulative total trained tokens ( #3252 )
...
* log cumulative total trained tokens
* use is_distributed helper
2025-11-07 16:04:00 -05:00
Wing Lian
b62eed8809
add openenv-core to requirements ( #3251 )
2025-11-07 12:17:27 -05:00
VED
ed2e8cacd6
feat:openenv rollout_func ( #3239 ) [skip ci]
...
* feat:openenv rollout_func
* chore lint
* docs
* add:docs processing_class
* tests
* lint
2025-11-07 08:51:40 -05:00
Lê Nam Khánh
80270a92fa
Fix typos in some files ( #3250 ) [skip ci]
2025-11-07 08:21:20 -05:00
Wing Lian
bfdc9a8249
upgrade trl and other hf deps ( #3249 )
...
* upgrade trl and other hf deps
* skip simpo for now
2025-11-06 16:06:03 -05:00
salman
c37decb073
update pre-commit cadence ( #3245 )
2025-11-04 13:43:40 +00:00
NanoCode012
01a346d86a
feat(example): add gpt-oss-safeguard docs ( #3243 )
...
* feat(example): add gpt-oss-safeguard docs
* fix: add doc on reasoning_effort
2025-11-04 07:39:21 +07:00
NanoCode012
26f05b6008
fix(example): set model_type to load for gemma3 text ( #3242 )
...
* fix: set model_type to load for gemma3 text
* chore: simplify
* chore: unify
2025-11-04 07:35:07 +07:00
github-actions[bot]
ed58fa8a75
chore: update pre-commit hooks ( #3244 )
2025-11-03 15:55:40 +00:00
Wing Lian
633afffacb
add torch 2.9.0 to ci ( #3223 )
2025-10-30 18:50:26 -04:00
Wing Lian
4b1b4fa6d8
upgrade numpy ( #3236 )
...
* upgrade numpy to 2.3.4
* bump contribs for numpy
* fix vllm versions
* bump numba
* make sure psutil is installed
* add psutil to cicd dockerfile jinja
* lower dep versions of numba + numpy for vllm
* bump datasets version
* resolve pydantic conflict too
2025-10-30 10:03:24 -04:00
github-actions[bot]
0f7c886b7b
chore: update pre-commit hooks ( #3222 ) [skip ci]
...
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com >
2025-10-29 18:09:46 -04:00
Wing Lian
a4b921135b
build cuda 13.0.0 base image with 2.9.0 ( #3229 )
...
* build cuda 13.0.0 base image with 2.9.0
* upgrade causal-conv1d
* 1.5.4 not in pypi yet
* pin to 1.3.0
* use github release instead of pypi
* split the logic for incompatible packages
* fix bash in dockerfile
2025-10-29 18:07:29 -04:00
Wing Lian
98333e639a
upgrade trl to 0.24.0 and liger to 0.6.3 ( #3230 )
...
* upgrade trl to 0.24.0
* fix reward collator init
* use newer DataCollatorForPreference instead
* DataCollatorForPreference doesn't use padding kwarg
* fix input id labels
* fix fbgemm-gpu version for pytorch versions
* tweak pinned deps
* transformers doesn't support hub 1.0 yet
* upgrade liger dep to 0.6.3
* set TORCH_CUDA_ARCH_LIST correctly
2025-10-29 18:02:16 -04:00
Dan Saunders
9d4d39e939
Diffusion trainer fix: shift logits to align with input tokens ( #3191 )
...
* shift logits for diffusion generate
* delete unused
* diffusion trainer: token shift
2025-10-27 14:42:01 +07:00
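The shift in #3191 is the standard next-token alignment: the logits at position t should be scored against the token at position t+1. A generic sketch of that kind of shift (plain lists for clarity, not the repo's actual diffusion-trainer code):

```python
def shift_for_next_token(logits, input_ids):
    """Align logits with the tokens they predict.

    The last position has no following token to predict, and the first
    token is never a prediction target, so both ends are dropped.
    """
    shift_logits = logits[:-1]     # drop scores at the final position
    shift_labels = input_ids[1:]   # drop the first token as a target
    return shift_logits, shift_labels

logits = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]  # one score vector per position
input_ids = [7, 3, 9]
sl, st = shift_for_next_token(logits, input_ids)
print(sl, st)  # [[0.1, 0.9], [0.8, 0.2]] [3, 9]
```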
Wing Lian
bb33fda44d
install flash attention in 2.9.0 base images ( #3224 )
2025-10-22 21:24:52 -07:00
VED
4dc018992d
Feat/opentelemetry ( #3215 )
2025-10-22 19:16:55 -07:00
NanoCode012
243620394a
fix: force train split for json,csv,txt for test_datasets and misc doc changes ( #3226 )
...
* fix: force train split for json,csv,txt for test_datasets
* feat(doc): add info on mixing datasets for VLM
* feat(doc): max memory
* fix(doc): clarify lr groups
* fix: add info on vision not being dropped
* feat: add qwen3-vl to multimodal docs
* fix: add moe blocks to arch list
* feat(doc): improve mistral docs
* chore: add helpful link [skip-e2e]
* fix: add vram usage for mistral small
* Update link in docs/faq.qmd
Co-authored-by: salman <salman.mohammadi@outlook.com >
---------
Co-authored-by: Wing Lian <wing@axolotl.ai >
Co-authored-by: salman <salman.mohammadi@outlook.com >
2025-10-22 15:23:20 -07:00
Qingyang Wu
3750fdcf79
Fix trainer dataloader slow loading issue ( #3219 )
...
* Fix trainer dataloader handling in src/axolotl/core/trainers/base.py
* update comment to reflect torch version
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2025-10-22 21:22:14 +07:00
Matthew Hambrecht
613bcf90e5
fix: enable_sleep_mode -> vllm_enable_sleep_mode ( #3225 )
...
Co-authored-by: Matthew Hambrecht <matthew.hambrecht@patapsco.ai >
2025-10-22 06:55:26 -07:00
Wing Lian
383f220cfd
build torch 2.9.0 base images ( #3221 )
2025-10-20 08:53:49 -04:00
NanoCode012
8bb871b5cf
fix: deepspeed with context parallel ( #3220 )
2025-10-20 14:06:58 +07:00
Leonard
87565ecc05
Add chat_template.argilla_chat support for DPO datasets ( #3202 )
...
* Add chat_template.argilla_chat support for DPO datasets
Creates a new chat_template.argilla_chat prompt strategy for handling
DPO datasets where chosen/rejected fields contain full conversations
(messages + final response), following the pattern of chatml.argilla_chat
and llama3.argilla_chat.
- Add argilla_chat() function to chat_template.py
- Add chat_template.argilla_chat to RLHF documentation
- Add test coverage for argilla_chat with multiple tokenizers
Dataset format:
{
  "chosen": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ],
  "rejected": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ]
}
* Fix chat_template.argilla_chat return value contract and add docstring
- Return (transform_fn, dataset_kwargs) tuple instead of bare transform_fn
- Add remove_columns specification for field_chosen and field_rejected
- Add comprehensive docstring with Args/Returns sections
- Update tests to unpack tuple return value
Addresses PR feedback to maintain consistency with chat_template.default()
and properly specify columns to remove after dataset transformation.
* Update tests/prompt_strategies/test_dpo_chat_templates.py
Co-authored-by: Wing Lian <wing.lian@gmail.com >
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2025-10-17 17:00:26 +07:00
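The return-value contract described in #3202 (a `(transform_fn, dataset_kwargs)` tuple, with `remove_columns` listing the raw fields to drop after transformation) can be sketched as follows. The transform body is a plausible reading of the argilla_chat format above, not the repo's exact code:

```python
def argilla_chat(field_chosen="chosen", field_rejected="rejected"):
    """Sketch of the (transform_fn, dataset_kwargs) contract from #3202.

    chosen/rejected each hold a full conversation; the final assistant turn
    is the preference response and the preceding turns form the prompt.
    """
    def transform_fn(sample):
        prompt = sample[field_chosen][:-1]  # shared conversation prefix
        return {
            "prompt": prompt,
            "chosen": sample[field_chosen][-1]["content"],
            "rejected": sample[field_rejected][-1]["content"],
        }
    # tells the dataset loader which raw columns to drop post-transform
    dataset_kwargs = {"remove_columns": [field_chosen, field_rejected]}
    return transform_fn, dataset_kwargs

fn, kwargs = argilla_chat()
sample = {
    "chosen": [
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
    ],
    "rejected": [
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Go away."},
    ],
}
row = fn(sample)
print(row["chosen"], "|", row["rejected"])  # Hello! | Go away.
```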
NanoCode012
93ba57396f
fix: qwen3_vl attention config ( #3216 )
2025-10-17 10:35:03 +07:00
NanoCode012
aa1240acd8
fix: transformers deprecate load_in_Xbit in model_kwargs ( #3205 )
...
* fix: transformers deprecate load_in_Xbit in model_kwargs
* fix: test to read from quantization_config kwarg
* fix: test
* fix: access
* fix: test weirdly entering incorrect config
2025-10-16 16:07:27 +07:00
Wing Lian
4cdfdfebb5
upgrade transformers==4.57.1 and peft==0.23.1 ( #3214 )
2025-10-14 15:54:05 -04:00
github-actions[bot]
6e2f5ccf9f
chore: update pre-commit hooks ( #3211 ) [skip ci]
...
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com >
2025-10-14 10:21:49 -04:00
NanoCode012
8c7f63cf97
fix: unpack cce imported incorrectly ( #3212 ) [skip ci]
2025-10-13 17:19:15 +07:00
VED
cd856b45b1
feat:add support dataset_num_processes ( #3129 ) [skip ci]
...
* feat:add support dataset_num_processes
* chore
* required changes
* requested changes
* required changes
* required changes
* required changes
* elif get_default_process_count()
* add:del data
* Update cicd/Dockerfile.jinja
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* Update cicd/single_gpu.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
---------
Co-authored-by: salman <salman.mohammadi@outlook.com >
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2025-10-13 17:18:12 +07:00
salman
143dea4753
FSDPConfig ( #3170 )
2025-10-10 14:44:25 +01:00
Hitesh Sagtani
bc2ffb8204
fix: Enable KD plugin support for PEFT/LoRA adapters ( #3207 )
...
- Fix _loss_function attribute not found on base model with PEFT
- Fix mismatched attribute name (loss_function vs _loss_function)
- Set _loss_function on unwrapped base model for PEFT
- Enable previously skipped test_llama_lora_kd test
- Add test config fixes for LoRA kernel compatibility
Fixes https://github.com/axolotl-ai-cloud/axolotl/issues/3206
2025-10-10 08:57:00 -04:00