NanoCode012
ed7105dba7
fix: GRPO config not accepting max_prompt_length ( #3390 ) [skip ci]
2026-02-10 17:52:09 +07:00
NanoCode012
b6d3653f74
feat: add step3p5 for cce ( #3384 ) [skip ci]
...
* feat: add step3p5 for cce
* chore: reorder model
2026-02-10 17:51:43 +07:00
NanoCode012
fcc4cfdb63
feat: add sageattention ( #2823 ) [skip ci]
...
* feat: add sageattention
* feat: call path on pre model load
* fix: patch to use register to correct var
* fix: add strict check import at start
* chore: fix comments
* chore: refactor
* feat: add capability check
* fix: missed underscore
* fix: let sageattention use FA backend in transformers
* feat: update sage attention for attention mask and position ids
* feat: allow sample packing but add warning without packing
* fix: loss hitting 0 with packing and attention mask note
* feat: downcast embeds if sage attention too
* feat: add config validation
* feat: add attention docs
* chore: docs
2026-02-10 17:49:21 +07:00
VED
97a4f28511
fix: saving state dict and eval for Context Parallel ( #3382 ) [skip ci]
...
* clone state_dict if none
* patch calculating eval loss for cp
2026-02-10 17:47:26 +07:00
VED
86a5803212
train_per_sec_per_gpu metric ( #3364 ) [skip ci]
...
* fix token count
* guard for None and zero
2026-02-10 17:44:55 +07:00
tgoab
530a0c0bf0
Changes from dataset_processes to dataset_num_proc ( #3352 ) [skip ci]
...
* changes from dataset_processes to dataset_num_proc
* deprecation message improved
---------
Co-authored-by: Juliana Nieto Cárdenas <jnietoca@purdue.edu>
2026-02-10 17:44:17 +07:00
VED
0343a72cc9
add glm support + patch ( #3329 ) [skip ci]
...
* add glm support + patch
* lint
* lint
* Update examples/glm4/glm-4-6v-flash-qlora.yaml
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update examples/glm4/glm-4-6v-flash-qlora.yaml
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update src/axolotl/processing_strategies.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* patch removed
* lint
* lint2
* docs + rename
* rmv moe
* docs
* removed processor
* sdpa T_T"
* ddp_find_unused_parameters: true
* multi gpu yaml tested both
* Update examples/glm46v/README.md
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update examples/glm46v/README.md
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update examples/glm46v/README.md
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* rmv text only section + v5 comments
* rename
---------
Co-authored-by: Ved <ved.work2024@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2026-02-10 17:43:53 +07:00
Wing Lian
236dad3bb7
set 0.15.0.dev0 version ( #3380 )
2026-01-30 21:28:01 -05:00
Wing Lian
be00978bc2
tag for v0.14.0 release ( #3379 )
v0.14.0
2026-01-30 14:10:27 -05:00
Wing Lian
3738978394
Add support for batched_mm, grouped_mm and scattermoe for MoE models ( #3377 )
...
* kernels plugin for moe for v5
* add support for native batched_mm or grouped_mm
2026-01-29 14:25:47 -05:00
Wing Lian
6132a30cda
handle warnings from v5 upgrade ( #3376 )
2026-01-28 06:45:01 -05:00
NanoCode012
3dd86d35b8
feat: add new cce support for glm series and exaone4 ( #3373 ) [skip ci]
2026-01-28 06:44:44 -05:00
salman
dd9ebaeba1
EAFT ( #3366 ) [skip ci]
...
* wip eaft
* fix eaft loss fn
* adding ref
---------
Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>
2026-01-28 06:44:15 -05:00
Wing Lian
fc4e37920b
transformers v5 upgrade ( #3272 )
...
* Prepare for transformers v5 upgrade
* fix hf cli
* update for hf hub changes
* fix tokenizer apply_chat_template args
* remap include_tokens_per_second
* fix tps
* handle migration for warmup
* use latest hf hub
* Fix scan -> ls
* fix import
* fix for renaming of mistral common tokenizer -> backend
* update for fixed tokenization for llama
* Skip phi35 tests for now
* remove mistral patch fixed upstream in huggingface/transformers#41439
* use namespacing for patch
* don't rely on sdist for e2e tests for now
* run modal ci without waiting too
* Fix dep for ci
* fix imports
* Fix fp8 check
* fsdp2 fixes
* fix version handling
* update fsdp version tests for new v5 behavior
* Fail multigpu tests after 3 failures
* skip known v5 broken tests for now and cleanup
* bump deps
* unmark skipped test
* re-enable test_fsdp_qlora_prequant_packed test
* increase multigpu ci timeout
* skip broken gemma3 test
* reduce timeout back to original 120min now that the hanging test is skipped
* fix for un-necessary collator for pretraining with bsz=1
* fix: safe_serialization deprecated in transformers v5 rc01 (#3318)
* torch_dtype deprecated
* load model in float32 for consistency with tests
* revert some test fixtures back
* use hf cache ls instead of scan
* don't strip fsdp_version
more fsdp_version fixes for v5
fix version in fsdp_config
fix aliasing
fix fsdp_version check
check fsdp_version is 2 in both places
* Transformers v5 rc2 (#3347)
* bump dep
* use latest fbgemm, grab model config as part of fixture, un-skip test
* import AutoConfig
* don't need more problematic autoconfig when specifying config.json manually
* add fixtures for argilla ultrafeedback datasets
* download phi4-reasoning
* fix arg
* update tests for phi fast tokenizer changes
* use explicit model types for gemma3
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* fix: AutoModelForVision2Seq -> AutoModelForImageTextToText
* chore: remove duplicate
* fix: attempt fix gemma3 text mode
* chore: lint
* ga release of v5
* need property setter for name_or_path for mistral tokenizer
* vllm not compatible with transformers v5
* setter for chat_template w mistral too
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: salman <salman.mohammadi@outlook.com>
2026-01-27 17:08:24 -05:00
Wing Lian
a531e9d946
upgrade vllm to v0.14.0 ( #3345 )
2026-01-21 20:00:18 -05:00
Wing Lian
04328aeb97
cu129 targets for ci builds ( #3369 )
...
* cu129 targets for ci builds
* remove copy-paste is_latest
2026-01-21 17:24:44 -05:00
VED
d0d26d5064
feat: Add GDPO Support ( #3353 )
...
* gdpo support - test left
* lint
* fixxes for vllm serv
* test advantages
* docss
* lint
* lint =
* gdpo simple + lint
* lint nit
* example
* lint
* trl 0.27.0
* blocklist
* test assert rmv
* add validation check for GDPO + sum_then_normalize
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2026-01-21 17:22:45 -05:00
Wing Lian
8623dd8a72
strip only starting 'v' char; e.g don't strip from '.dev' ( #3368 ) [skip ci]
2026-01-21 14:19:03 -05:00
Wing Lian
8cd75cff9f
use cuda 12.9.1 and add python 3.12 to base images ( #3367 )
2026-01-21 13:34:14 -05:00
Wing Lian
8ab9d9ea88
Version dev ( #3365 )
2026-01-20 22:58:29 -05:00
Wing Lian
6e42def14b
set version to v0.13.1 ( #3363 )
v0.13.1
2026-01-20 08:58:32 -05:00
Wing Lian
c413480b35
upgrade transformers to 4.57.6 and peft to 0.17.1 and datasets to 4.5.0 ( #3361 )
2026-01-16 11:48:50 -05:00
Wing Lian
8f25124269
upgrade transformers to 4.57.5 ( #3358 )
...
* upgrade transformers to 4.57.5
* explicitly set versions for fbgemm-gpu
* handle index url for cuda version
* explicitly set cu version for fbgemm deps, skip for 130
* cu suffix not needed on version if using whl subpath
2026-01-16 11:17:43 -05:00
Wing Lian
790df757cb
don't install xformers for arm64 ( #3359 )
...
* install xformers in the base docker image
* install numba and numpy first
* set CUDA_HOME for xformers install
* Set cuda home env
* don't install xformers by default on aarch64/arm64
2026-01-16 09:02:37 -05:00
Wing Lian
d282f32481
don't install deepspeed in arm64 images ( #3357 )
2026-01-14 12:03:55 -05:00
Wing Lian
6331e4a130
fix amd64 and set 2.9.1 as latest cloud image ( #3356 )
2026-01-14 11:56:36 -05:00
salman
1410e4474e
update PR template ( #3349 ) [skip ci]
2026-01-14 09:39:21 -05:00
Wing Lian
dc77b5bf42
fix arm64 builds ( #3355 )
...
* fix syntax for secrets in gha yaml
* setup env for uv too
* arm64 for base uv too
* don't build causal-conv1d or mamba for arm64 and use arm64 wheels
* fix dockerfile syntax
* fix shell syntax
2026-01-14 09:38:48 -05:00
NanoCode012
359b7ad85e
fix: gemma3_text model loading vision config ( #3354 )
...
* fix: gemma3-text mode loading vision config
* fix: improve defaults to use lora kernels
2026-01-13 09:49:23 -05:00
VED
258ce8d4fa
feat: scaled softmax support ( #3338 )
...
* scaled softmax
* comment
* lint
* remove egear
* validation for flash
* lint
* val improve + neat
* fix correct softmax scale val(learned)
* learned scale val 4 ssm
* lint
* fix model_type rmv
* sdpa_atten
* test fix + lint
* test fix
* sdp_a val rmv
* flex fix
* main flash
* lint
* flex attn
* lint comment
* fix score_mod
* Update src/axolotl/utils/schemas/validation.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: Ved <ved.work2024@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2026-01-13 14:33:11 +07:00
@TT
3e0bbd33ec
feat: add ARM64/AArch64 build support to Dockerfile-base ( #3346 )
...
* Add support for capability to build arm64 image
* Fixing wrong variable TARGETPLATFORM bug
* Adding missing semicolons
* skip docker hub login if PR (no push) or no credentials
* Enabling arm64 builds for Dockerfile-base in Github actions
* TARGETARCH automatically default to platform arch under build
* Enabling arm64 builds for axolotl docker builds
* Enabling arm64 builds for axolotl-cloud docker build Github actions
---------
Co-authored-by: Wing Lian <wing@axolotl.ai >
2026-01-12 12:00:02 -05:00
salman
4ae6f766ad
bump bnb to v0.49.1 ( #3351 )
2026-01-12 09:42:04 -05:00
VED
e7f0d4ba5b
Increased test coverage for lora/qlora ( #3147 )
...
* config_val tests
* remove config val(not needed)
* config validation
* parameter freeze validation
* merge/unmerge tests
* removal unwanted
* rename
* lint
* updated lint
* Update tests/utils/lora/test_config_validation_lora.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* pytest skip + mock fix
* nitpicks
* revert some nitpicks
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2026-01-06 11:44:48 -05:00
VED
7bf6f70e96
fix total/trainable tokens log ( #3344 )
...
* fix total/trainable tokens log
* fix total/trainable tokens log
2026-01-06 09:25:17 -05:00
PraMamba
8aab807e67
feat: Add SwanLab integration for experiment tracking ( #3334 )
...
* feat(swanlab): add SwanLab integration for experiment tracking
SwanLab integration provides comprehensive experiment tracking and monitoring for Axolotl training.
Features:
- Hyperparameter logging
- Training metrics tracking
- RLHF completion logging
- Performance profiling
- Configuration validation and conflict detection
Includes:
- Plugin in src/axolotl/integrations/swanlab/
- Callback in src/axolotl/utils/callbacks/swanlab.py
- Tests in tests/integrations/test_swanlab.py
- Examples in examples/swanlab/
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(swanlab): address PR #3334 review feedback from winglian and CodeRabbit
- Change use_swanlab default to True (winglian)
- Clear buffer after periodic logging to prevent duplicates (CodeRabbit Major)
- Add safe exception handling in config fallback (CodeRabbit)
- Use context managers for file operations (CodeRabbit)
- Replace LOG.error with LOG.exception for better debugging (CodeRabbit)
- Sort __all__ alphabetically (CodeRabbit)
- Add language specifiers to README code blocks (CodeRabbit)
- Fix end-of-file newline in README (pre-commit)
Resolves actionable comments and nitpicks from CodeRabbit review.
Addresses reviewer feedback from @winglian.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* only run swanlab integration tests if package is available
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2026-01-06 09:19:18 -05:00
Wing Lian
ee59e4de97
add cu130 + torch 2.9.1 to test matrices ( #3343 )
...
* add cu130 + torch 2.9.1 to test matrices
* uv can't use pip3 directly
2026-01-05 15:24:29 -05:00
Wing Lian
4e61b8aa23
use updated version of prebuilt wheels for flash attention for cu130 ( #3342 )
...
* use updated version of prebuilt wheels for flash attention for cu130
* use elif
* fix the uv base installs of FA also
* make wget less verbose
2026-01-05 13:48:12 -05:00
Wing Lian
b26ba3a5cb
don't build images w cuda 130 since we don't have flash attention wheels ( #3341 )
2026-01-03 18:08:28 -05:00
Wing Lian
afe18ace35
deprecate torch 2.7.1 ( #3339 )
2026-01-01 06:52:45 -05:00
github-actions[bot]
2b199f9915
chore: update pre-commit hooks ( #3340 ) [skip ci]
...
Co-authored-by: SalmanMohammadi <25081738+SalmanMohammadi@users.noreply.github.com>
2026-01-01 06:52:28 -05:00
Wing Lian
e73dab6df9
support pydantic 2.12 ( #3328 )
...
* upgrade pydantic to 2.12
* use latest modal version
* upgrade modal
* update modal in requirements and loosen pydantic
* upgrade modal too
2025-12-30 12:41:07 -05:00
VED
f45a97a9ff
docs for checkpoint saving ( #3335 ) [skip ci]
...
Co-authored-by: Ved <ved.work2024@gmail.com>
2025-12-30 12:40:32 -05:00
Wing Lian
11c0b5b256
batch upgrade dependencies ( #3299 )
...
* upgrade dependencies
* don't use reset sessions
* downgrade transformers, upgrade other deps
* upgrade bnb to 0.49.0
* restore s3 cache
* explicit use local files w hub
* decompress and strip top level dir
* use 2 levels for strip components
* try to preserve permissions for symlinks
* use updated tar
* fix #3293 for distributed
* downgrade bnb
* fast fail after 4
* fix total tokens device
* patch accelerate CP/SP (#3309)
---------
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-12-30 09:02:49 -05:00
Wing Lian
66a3de3629
build examples readmes with quarto ( #3046 )
...
* build examples readmes with quarto
* chore: formatting
* feat: dynamic build docs
* feat: add more model guides
* chore: format
* fix: collapse sidebar completely to have space for model guides
* fix: security protection for generated qmd
* fix: adjust collapse level, add new models, update links
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-12-25 19:17:25 +07:00
VED
a6080df73c
compute loss only if training and update token metric naming ( #3293 ) [skip ci]
...
* compute loss only if training
* save total_tokens for checkpoint
* check if string
* refactor total_tokens/ num_tokens
* refactor 2
* replace trainable_step/train_per_sec_per_gpu
* lint + log trainable/tokens
* consolidate it in the callback.
* test for total_tokens after resume
* check if token state exists after ckpt
---------
Co-authored-by: Ved <ved.work2024@gmail.com>
2025-12-25 18:38:17 +07:00
NanoCode012
4f5e8a328a
Feat: add MiMo and Plano ( #3332 ) [skip-ci]
...
* feat: add xiaomi's mimo 7b
* fix: pin revision
* fix: update trinity docs and pin revision
* fix: wrong config name
* feat: add vram usage
* feat: add plano
* feat: update plano vram usage
* chore: comments
2025-12-25 18:09:03 +07:00
NanoCode012
418933f0d1
feat: add internvl3_5 ( #3141 ) [skip-ci]
...
* feat: add internvl3_5
* fix: add timm instructions
* chore: add kimi-linear to cce doc
* feat: update internvl example
* chore: pin revision
* chore: remove from multipack
* fix: add to multimodal array
* fix: internvl use hf version
* feat: update cce
* chore: lint
* fix: list for image_size
* chore: add docs vram usage
* feat: enable cce
* fix: no need trust remote code
* fix: inconsistent timm version
2025-12-25 18:07:59 +07:00
NanoCode012
372f664c63
feat: cleanup old flex mask patch, suppress Matmul bnb warn, and misc ( #3330 ) [skip-ci]
...
* feat: add pos id to flex attention for packing part 1
* feat: update to include sliding window mask patch
* fix: suppress MatMul8bitLt: inputs will be cast from warnings
* fix: remove redundant flex attention patch
* chore: update olmo docs
* feat: add validator patch for cross entropy
2025-12-25 17:56:20 +07:00
NanoCode012
97f1b1758d
Feat: add kimi linear support ( #3257 )
...
* feat: add custom kimi linear patch [skip ci]
* feat: add configuration file and fix import [skip ci]
* fix: hijack tokenizer temporarily [skip ci]
* chore: remove accidental commit
* fix: attempt patch kimi remote
* fix: kwargs passed
* fix: device for tensor
* fix: aux loss calculation
* feat: cleaned up patches order
* fix: remove duplicate tokenizer patch
* chore: add debug logs
* chore: add debug logs
* chore: debug
* Revert "chore: add debug logs"
This reverts commit da372a5f67.
* Revert "chore: add debug logs"
This reverts commit 97d1de1d7c.
* fix: KeyError: 'tokenization_kimi'
* fix: support remote_model_id in cce patch
* feat: add config preload patch
* fix: use standard aux loss calc and updated modeling
* fix: import
* feat: add kimi-linear docs and example
* chore: add note about moe kernels
* feat: update cce to include kimi-linear
* chore: lint
* chore: update main readme
* fix: patch mechanism to address comments
* chore: lint
* fix: tests
* chore: cleanup comment
2025-12-25 17:53:52 +07:00
Abubakar Abid
f2155eaf79
feat: add trackio as experiment tracking integration ( #3253 )
...
* feat: add trackio as experiment tracking integration
- Add TrackioConfig to integrations schema with project_name, run_name, and space_id
- Create trackio_.py module for environment setup
- Add is_trackio_available() utility function
- Integrate trackio with report_to in trainer builder
- Add trackio callback for experiment tracking
- Add trackio config keys to gpt-oss example YAMLs
- Trackio runs locally by default, syncs to HF Space if space_id provided
* changes
* changes
* changes
* changes
* changes
* changes
* changes
* Update requirements.txt
* don't allow pydantic 2.12 for now
---------
Co-authored-by: Abubakar Abid <aaabid93@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-12-23 08:49:07 -05:00