axolotl

Author	SHA1	Message	Date
Wing Lian	ee59e4de97	add cu130 + torch 2.9.1 to test matrices (#3343 ) * add cu130 + torch 2.9.1 to test matrices * uv can't use pip3 directly	2026-01-05 15:24:29 -05:00
Wing Lian	4e61b8aa23	use updated version of prebuilt wheels for flash attention for cu130 (#3342 ) * use updated version of prebuilt wheels for flash attention for cu130 * use elif * fix the uv base installs of FA also * make wget less verbose	2026-01-05 13:48:12 -05:00
Wing Lian	b26ba3a5cb	don't build images w cuda 130 since we don't have flash attention wheels (#3341 )	2026-01-03 18:08:28 -05:00
Wing Lian	afe18ace35	deprecate torch 2.7.1 (#3339 )	2026-01-01 06:52:45 -05:00
github-actions[bot]	2b199f9915	chore: update pre-commit hooks (#3340 ) [skip ci] Co-authored-by: SalmanMohammadi <25081738+SalmanMohammadi@users.noreply.github.com>	2026-01-01 06:52:28 -05:00
Wing Lian	e73dab6df9	support pydantic 2.12 (#3328 ) * upgrade pydantic to 2.12 * use latest modal version * upgrade modal * update modal in requirements and loosen pydantic * upgrade modal too	2025-12-30 12:41:07 -05:00
VED	f45a97a9ff	docs for checkpoint saving (#3335 ) [skip ci] Co-authored-by: Ved <ved.work2024@gmail.com>	2025-12-30 12:40:32 -05:00
Wing Lian	11c0b5b256	batch upgrade dependencies (#3299 ) * upgrade dependencies * don't use reset sessions * downgrade transformers, upgrade other deps * upgrade bnb to 0.49.0 * restore s3 cache * explicit use local files w hub * decompress and strip top level dir * use 2 levels for strip components * try to preserve permissions for symlinks * use updated tar * fix #3293 for distributed * downgrade bnb * fast fail after 4 * fix total tokens device * patch accelerate CP/SP (#3309) --------- Co-authored-by: salman <salman.mohammadi@outlook.com>	2025-12-30 09:02:49 -05:00
Wing Lian	66a3de3629	build examples readmes with quarto (#3046 ) * build examples readmes with quarto * chore: formatting * feat: dynamic build docs * feat: add more model guides * chore: format * fix: collapse sidebar completely to have space for model guides * fix: security protection for generated qmd * fix: adjust collapse level, add new models, update links --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-12-25 19:17:25 +07:00
VED	a6080df73c	compute loss only if training and update token metric naming (#3293 ) [skip ci] * compute loss only if training * save total_tokens for checkpoint * check if string * refactor total_tokens/ num_tokens * refactor 2 * replace trainable_step/trian_per_sec_per_gpu * lint + log trainable/tokens * consolidate it in the callback. * test for total_tokens after resume * check if tokenstate exists after ckpt --------- Co-authored-by: Ved <ved.work2024@gmail.com>	2025-12-25 18:38:17 +07:00
NanoCode012	4f5e8a328a	Feat: add MiMo and Plano (#3332 ) [skip-ci] * feat: add xiaomi's mimo 7b * fix: pin revision * fix: update trinity docs and pin revision * fix: wrong config name * feat: add vram usage * feat: add plano * feat: update plano vram usage * chore: comments	2025-12-25 18:09:03 +07:00
NanoCode012	418933f0d1	feat: add internvl3_5 (#3141 ) [skip-ci] * feat: add internvl3_5 * fix: add timm instructions * chore: add kimi-linear to cce doc * feat: update internvl example * chore: pin revision * chore: remove from multipack * fix: add to multimodal array * fix: internvl use hf version * feat: update cce * chore: lint * fix: list for image_size * chore: add docs vram usage * feat: enable cce * fix: no need trust remote code * fix: inconsistent timm version	2025-12-25 18:07:59 +07:00
NanoCode012	372f664c63	feat: cleanup old flex mask patch, suppress Matmul bnb warn, and misc (#3330 ) [skip-ci] * feat: add pos id to flex attention for packing part 1 * feat: update to include sliding window mask patch * fix: suppress MatMul8bitLt: inputs will be cast from warnings * fix: remove redundant flex attention patch * chore: update olmo docs * feat: add validator patch for cross entropy	2025-12-25 17:56:20 +07:00
NanoCode012	97f1b1758d	Feat: add kimi linear support (#3257 ) * feat: add custom kimi linear patch [skip ci] * feat: add configuration file and fix import [skip ci] * fix: hijack tokenizer temporarily [skip ci] * chore: remove accidental commit * fix: attempt patch kimi remote * fix: kwargs passed * fix: device for tensor * fix: aux loss calculation * feat: cleaned up patches order * fix: remove duplicate tokenizer patch * chore: add debug logs * chore: add debug logs * chore: debug * Revert "chore: add debug logs" This reverts commit `da372a5f67`. * Revert "chore: add debug logs" This reverts commit `97d1de1d7c`. * fix: KeyError: 'tokenization_kimi' * fix: support remote_model_id in cce patch * feat: add config preload patch * fix: use standard aux loss calc and updated modeling * fix: import * feat: add kimi-linear docs and example * chore: add note about moe kernels * feat: update cce to include kimi-linear * chore: lint * chore: update main readme * fix: patch mechanism to address comments * chore: lint * fix: tests * chore: cleanup comment	2025-12-25 17:53:52 +07:00
Abubakar Abid	f2155eaf79	feat: add trackio as experiment tracking integration (#3253 ) * feat: add trackio as experiment tracking integration - Add TrackioConfig to integrations schema with project_name, run_name, and space_id - Create trackio_.py module for environment setup - Add is_trackio_available() utility function - Integrate trackio with report_to in trainer builder - Add trackio callback for experiment tracking - Add trackio config keys to gpt-oss example YAMLs - Trackio runs locally by default, syncs to HF Space if space_id provided * changes * changes * changes * changes * changes * changes * changes * Update requirements.txt * don't allow pydantic 2.12 for now --------- Co-authored-by: Abubakar Abid <aaabid93@gmail.com> Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-12-23 08:49:07 -05:00
kallewoof	92ee4256f7	feature: raise on long sequence drop (#3321 ) * feature: raise on long sequence drop It is sometimes not desired that sequences are silently dropped from the dataset, especially when the dataset has been carefully crafted and pre-fitted for the training context. This would then suggest that an error occurred somewhere in the process. This feature adds a third value for excess_length_strategy called 'raise', which will raise a ValueError if a sequence is encountered that is too long and would have normally been dropped/truncated. * tests: add excess_length_strategy tests * doc: updated return value description for drop_long_seq_in_dataset * add @enable_hf_offline * fixed cfg modified after validate_config called * hf offline fix * fix tqdm desc when raise is used * test: added test for non-batched case * accidental code change revert * test: use pytest.raises * test: simplified drop_seq_len tests * test: moved excess_length_strat test to test_data.py --------- Co-authored-by: salman <salman.mohammadi@outlook.com>	2025-12-22 13:59:49 -05:00
Wing Lian	efeb5a4e41	fix check for fp8 capability (#3324 ) * fix check for fp8 capability * handle non-cuda compute * reduce concurrency of tests	2025-12-22 13:58:25 -05:00
VED	faaff6c792	allow users to set ndigits for rounding of metrics when logging (#3325 ) * METRIC_PRECISION-> 8 * use ndigits and move env getter to top of log function --------- Co-authored-by: Ved <ved.work2024@gmail.com> Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-12-22 08:54:43 -05:00
Alexander Kozhevnikov	43cef27458	Fix typo in densemixer RuntimeError (#3327 ) [skip ci] It offers installing densemizer while it should be densemixer	2025-12-22 08:53:58 -05:00
Wing Lian	07c41a6c2a	fix preview docs failing due to running out of disk (#3326 ) [skip ci] * fix preview docs failing due to running out of disk * fix docs publish too	2025-12-19 11:34:55 -05:00
salman	bbd3486f57	Distributed Muon Optimizer (#3264 ) * init * working * updating configs * removing unneeded files * lint * comments * lint * fix regex match * bump contribs version * comments * fixing tests and imports * muon imports in test v2 * test cleanup * bump contribs version --------- Co-authored-by: Salman Mohammadi <“salman.mohammadi@outlook.com”>	2025-12-19 10:43:47 -05:00
VED	3750d7dd64	add liger kernel support for dpo (#3302 ) * add liger kernel for dpo * revert grpo changes, add support in dpo * revert grpo changes, add support in dpo * dpo_use_liger_kernal * fix liger_dpo --------- Co-authored-by: Ved <ved.work2024@gmail.com>	2025-12-18 11:11:06 -05:00
xzuyn	2197b0bf89	feat: cheap ppl metric (#3317 ) * Import math and compute perplexity from loss values * lint * coderabbit changes * lint * fix: add rounding to ppl --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-12-18 09:02:41 -05:00
Seung Hyun Cho	3e51a680c2	fix: Fix evaluation loss in KD trainer (#3271 ) * fix: Fix evaluation loss in KD trainer * Fix v2 strategy super() call * fix: Add safety check for total_tokens in log method * fix: simplified num items and outputs return handling * fix: add missing model forward pass in compute_loss * refactor: Use Template Method pattern for chat template strategies * refactor: use pop(None) and remove v2 override * chore: lint --------- Co-authored-by: NanoCode012 <nano@axolotl.ai> Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-12-17 13:40:36 -05:00
xzuyn	2cf254b4af	Add `peft_autocast_adapter_dtype` config option (#3311 ) [skip ci] * Add `peft_autocast_adapter_dtype` field to schema * Add `autocast_adapter_dtype` to `model_kwargs` * chore: docs --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-12-17 10:09:39 -05:00
salman	83d4d97dcc	Add QAT NVFP4 configs for blogpost (#3280 ) [skip ci] * add configs for blogpost * fix configs * fixing baseline configs	2025-12-17 09:35:22 -05:00
NanoCode012	a1d07f42e4	Fix(misc): address PYTORCH_CUDA_ALLOC_CONF deprecate (#3313 ) * fix: leftover ministral docs changes * fix: pytorch_cuda_alloc_conf deprecation * fix: set old PYTORCH_CUDA_ALLOC_CONF env too * handle 2.9 separately --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-12-17 09:12:18 -05:00
Wing Lian	2a664dc8ad	support for xformers wheels for torch 2.9 (#3308 ) * support for xformers wheels for torch 2.9 * fix hf cache? * don't use hf cache from s3 * show disk free space in ci	2025-12-11 11:56:40 -05:00
NanoCode012	4ac78aa562	fix: update qwen3 jinja tokenization off a few tokens (#3295 ) * fix: update qwen3 jinja tokenization off a few tokens * fix: add note on tokenization issue * fix: pop last index for mistral tokenizer	2025-12-09 14:31:03 +07:00
VED	b3f4aa149f	fix bin size (#3307 ) * fix bin size * lint --------- Co-authored-by: Ved <ved.work2024@gmail.com>	2025-12-08 09:16:18 -05:00
salman	75b20fb66f	Save processor in quantizer CLI (#3290 )	2025-12-06 16:27:18 +00:00
NanoCode012	5992e607a2	fix: improve ministral3 docs to be clearer (#3300 ) * fix: improve ministral3 docs to be clearer * fix: title * chore: wording	2025-12-04 21:44:44 +07:00
NanoCode012	2b66ee189c	Feat: add ministral3 (#3297 ) * feat: add ministral and mistral3 * chore: lint * feat: update cce for ministral * fix: add vram usage * feat: update for release * fix: save_pretrained issue in v5 * fix: add instructions to use v5 branch * fix: add to multipack * fix: improve instructions * fix: add model to readme	2025-12-04 08:32:08 -05:00
NanoCode012	86d8cca149	Feat: add trinity by ArceeAI (#3292 )	2025-12-02 13:12:55 -05:00
NanoCode012	4a0f98e612	feat: upgrade liger to 0.6.4 (#3289 )	2025-12-02 09:16:23 -05:00
Yohan Na	c6ddcdd06a	feat: add exaone4 chat template and update enums (#3279 ) * feat: add exaone4 chat template and update enums * fix: handle first message as system or tools in exaone4 chat template * Update src/axolotl/utils/chat_templates/templates/exaone4.jinja Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * fix: lint --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-12-01 15:52:45 +07:00
github-actions[bot]	7fb6a947d9	chore: update pre-commit hooks (#3287 ) Co-authored-by: SalmanMohammadi <25081738+SalmanMohammadi@users.noreply.github.com>	2025-12-01 15:03:14 +07:00
NanoCode012	b234532d9f	Feat: add peft_ensure_weight_tying (#3278 ) * feat: upgrade peft to 0.18.0 * feat: add peft_ensure_weight_tying * fix: default * chore: adjust kwarg per feedback	2025-11-28 18:54:48 +07:00
VED	8990ca3205	fix: removed unused "scikit-learn==1.4.2" (#3277 ) Co-authored-by: Ved <ved.work2024@gmail.com>	2025-11-24 13:48:53 +07:00
NanoCode012	006f226270	Feat: add Olmo3 (BC with Olmo and Olmo2) (#3275 ) * feat: update cce to include olmo family * chore: update docs following feedback * feat: add olmo3 config * fix: clarify 3 methods * chore: add olmo to readme	2025-11-24 10:21:31 +07:00
Wing Lian	0b635e69c5	build docker images for 2.9.x (#3273 )	2025-11-20 09:26:24 -05:00
Wing Lian	0d27e14e45	Torch 2.9.1 base images (#3268 ) * update torch 2.9.1 base images * update base dockerfile image check	2025-11-20 09:04:37 -05:00
NanoCode012	f5f21fb216	chore: update readme with latest updates (#3267 ) v0.13.0	2025-11-18 14:45:21 +07:00
NanoCode012	4e55871112	feat: Add opt-out Telemetry (#3237 ) * initial telemetry manager impl * adding todo * updates * updates * progress on telemetry: config load, process, model load, train start / end, error tracking * update error file path sanitization function; adding more error tracking * updated sanitization logic, tests * adding runtime metrics (cpu + gpu memory, steps/s, etc.) * tests for runtime metrics telemetry and assoc. callback * small update / fix * simplifying path redaction * sleep on all ranks in distributed setting * adding back in base_model redaction w/ whitelist * fix * doc update * improved redaction, send system info during model config load telemetry, etc. * adding runtime metrics / system info additional accelerator support, etc. * adding runtime metrics / system info additional accelerator support, etc. * remove duplicate info * fixes * fix issue with tests in ci * distributed fix * opt-in version of telemetry * enable / disable logic update * docs fix * doc update * minor fixes * simplifying * slight changes * fix * lint * update posthog dep * coderabbit comments * fix: opt-in model * fix: increase time since last * fix: increase whitelist orgs * fix: posthog init and shutdown * fix: imports * fix: also check grad norm * fix: duplicate plugin_manager calls * fix: bad merge * chore: update docs * fix: cache process per comment * fix: error handling * fix: tests * Revert "fix: error handling" This reverts commit `22d1ea5755`. * fix: test telemetry error_handled bool * fix: revert test * chore: final doc fixes --------- Co-authored-by: Dan Saunders <danjsaund@gmail.com> Co-authored-by: Dan Saunders <dan@axolotl.ai>	2025-11-18 11:35:25 +07:00
Wing Lian	a6bafb55cb	upgrade datasets to 4.4.1 (#3266 ) * upgrade datasets * cleanup pip cache earlier * cleanup unused things from worker * also cleanup sdist	2025-11-14 09:52:14 -08:00
Wing Lian	0fbde69e9c	only push axolotl images, personal repo is deprecated (#3262 ) * only push axolotl images, personal repo is deprecated * cleanup	2025-11-14 07:50:03 -08:00
Wing Lian	301e22849f	upgrade to latest deepspeed and make sure latest tagged axolotl images are using torch 2.8.0 (#3261 )	2025-11-13 13:03:01 -05:00
VED	dcf24fd24e	feat: save checkpoint after training started (#3233 ) * add:config parameters for checkpoint * callback main * test file_type fix * lint * unit * simplify dict/obj handling * Update src/axolotl/utils/schemas/dynamic_checkpoint.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * Delete tests/e2e/integrations/__init__.py * remove hard code path in test * device check * lint * Update src/axolotl/utils/callbacks/dynamic_checkpoint.py Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * Update src/axolotl/utils/callbacks/dynamic_checkpoint.py Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * Update src/axolotl/utils/schemas/dynamic_checkpoint.py Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * lint-2 * remove: signal based checkpoints * lint * remove signal tests * add:is_main_process * lint * add:is_distributed() for tests * remove nested is_main_process * Update src/axolotl/utils/schemas/dynamic_checkpoint.py Co-authored-by: Wing Lian <wing.lian@gmail.com> * Update src/axolotl/utils/schemas/dynamic_checkpoint.py Co-authored-by: Wing Lian <wing.lian@gmail.com> * add user_defined_filename --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2025-11-13 10:21:05 -05:00
NanoCode012	49b8107989	feat: add granite4 examples (#3256 ) [skip ci]	2025-11-13 10:19:16 -05:00
NanoCode012	9901ee5602	fix: voxtralprocessor broken (#3255 ) [skip ci] * fix: voxtralprocessor broken * chore: add todo * chore: wording	2025-11-13 10:18:42 -05:00


2520 Commits