- Kernel: fused_rms_norm_rope crashed when cos.shape[-1] < x.shape[-1].
Triton forward/backward take an n_rot runtime arg that restricts
rotate_half to [0, n_rot) and treat trailing cols as RMSNorm-only
pass-through (cos=1, sin=0 defaults). The wrapper also expands cos/sin
that broadcast over batch.
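A pure-Python sketch of the contract the fixed kernel honors (names and the per-row layout are illustrative; the real code is a Triton kernel running after RMSNorm):

```python
def rope_partial(x, cos, sin, n_rot):
    """Apply rotate-half RoPE to the first n_rot columns of one row;
    trailing columns pass through as if cos=1, sin=0 (RMSNorm-only)."""
    half = n_rot // 2
    out = list(x)
    for i in range(half):
        a, b = x[i], x[half + i]
        out[i] = a * cos[i] - b * sin[i]
        out[half + i] = b * cos[i] + a * sin[i]
    # columns >= n_rot are untouched: the kernel's cos=1/sin=0 default
    return out
```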
- Forward: _make_fused_forward used a stale shared_kv_states kwarg the
current decoder layer no longer passes. Now mirrors stock attention,
reading/writing past_key_values.shared_layers.
The LoRA vllm-serve wrapper only exposed /v1/chat/completions, but
retrace's SWE agent server uses the token-id-aware /v1/completions
endpoint so it can feed raw prompt_token_ids + track per-token
logprobs across multi-turn rollouts. Add the route, mirroring the
shape of /v1/chat/completions but routing to the vLLM worker's
generate() method so prompt_token_ids are passed through as-is.
Also add a worker_pipe_lock around conn.send/conn.recv. The
multiprocessing.Connection to the vLLM worker is a single shared
full-duplex pipe; concurrent HTTP requests interleave pickle frames
on the wire and corrupt the stream (observed as
UnpicklingError: pickle data was truncated, surfacing as 500s).
The agent server fires ~8 rollout requests at once, so this was a
hard blocker for any concurrent workload. Serialize pipe access for
each request's round-trip.
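The serialization amounts to one lock held across the send and its matching recv (a minimal sketch; the helper name is illustrative):

```python
import threading

# One lock per worker pipe: a request's send and its matching recv must
# form an atomic round-trip, or concurrent requests interleave pickle
# frames on the shared full-duplex Connection.
worker_pipe_lock = threading.Lock()

def worker_roundtrip(conn, request):
    """Send a request to the vLLM worker and read its reply atomically."""
    with worker_pipe_lock:
        conn.send(request)
        return conn.recv()
```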
Two bugs in ``AsyncGRPOTrainer._maybe_sync_vllm_weights`` plus a
companion bug in the sync-hook patch site that together neutralized
LoRA weight sync entirely whenever ``async_prefetch=False`` was
combined with NeMo Gym's data-producer path:
1. ``_maybe_sync_vllm_weights`` had ``if not async_prefetch: return``
at the top. The original design assumed sync mode would fall back
to TRL's stock per-step ``sync_weights`` call inside
``_generate_single_turn`` — true for vanilla GRPO but FALSE in
NeMo Gym multi-turn, where ``NemoGymDataProducer`` calls the agent
server directly and ``_generate_single_turn`` is never invoked.
Result: no sync ever happened in NeMo Gym sync mode.
2. ``step % vllm_sync_interval`` would raise a TypeError on the first call if
``vllm_sync_interval`` was unset (the default for any config that
doesn't explicitly set it).
3. The ``_generate_single_turn`` patch installed
``vllm_generation.sync_weights = lambda: None`` unconditionally
for vllm_lora_sync runs. That's correct in async-prefetch mode
(BG thread can't safely sync) but wrong in sync mode: TRL's
per-step auto-sync inside ``_generate_single_turn`` was the
fallback that the early return in (1) was assuming, and the
no-op patch was killing it.
Fix:
- Drop the ``not async_prefetch`` early return; ``_maybe_sync_vllm_weights``
is now the canonical sync trigger and runs in both modes from
``_prepare_inputs_with_data_producer`` / ``_prepare_inputs_legacy_async``.
- Default ``vllm_sync_interval`` to 1 when unset.
- In the ``_generate_single_turn`` patch, route sync_weights to
``_sync_lora_adapter`` in sync mode (and keep the lambda no-op
in async mode for the BG-thread safety reason).
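The resulting trigger condition is pure logic (a sketch with illustrative names, not the trainer's actual method):

```python
def should_sync(step, vllm_sync_interval):
    """Decide whether to push LoRA weights to vLLM at this step.

    Runs in both sync and async-prefetch modes (no
    `if not async_prefetch: return` early-out), and an unset interval
    defaults to 1 so every step syncs.
    """
    interval = vllm_sync_interval if vllm_sync_interval is not None else 1
    return step % interval == 0
```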
``get_server_configs`` was hardcoded to a 5s timeout with no retry.
That's empirically too tight to survive a kill-and-relaunch cycle:
when the agent server is finishing in-flight rollouts from a prior
run, it can take 10-30s to respond to /global_config_dict_yaml, and
the trainer would crash at startup with a ReadTimeoutError.
Bump the per-attempt timeout to 30s and retry up to 3 times with a
2s/4s backoff. The retry intentionally raises a RuntimeError after
the third failure rather than returning empty config — silent
failure here would let training proceed with no agent servers
discovered, which is also a no-op trainer.
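The retry shape, as a sketch (`fetch` is a hypothetical callable standing in for the HTTP GET of /global_config_dict_yaml):

```python
import time

def get_with_retry(fetch, attempts=3, timeout=30, backoffs=(2, 4)):
    """30s per-attempt timeout, 2s/4s backoff between attempts.

    Raises after the last failure instead of returning an empty config:
    a silent failure would leave the trainer running with zero agent
    servers discovered.
    """
    last_exc = None
    for i in range(attempts):
        try:
            return fetch(timeout=timeout)
        except Exception as exc:  # e.g. ReadTimeoutError
            last_exc = exc
            if i < attempts - 1:
                time.sleep(backoffs[i])
    raise RuntimeError(
        "agent server never answered /global_config_dict_yaml"
    ) from last_exc
```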
The plugin used to unconditionally monkey-patch
VLLMClient.init_communicator to a no-op AND silently no-op
sync_weights when vllm_lora_sync was off. Combined, this turned the
trainer into a functional no-op whenever (a) the user ran NeMo Gym
+ LoRA without remembering to set vllm_lora_sync=true or (b) the
user ran NeMo Gym + full fine-tune (which had no working sync path
under the old code).
Replace both patches with:
1. A probe of the configured vLLM server's /openapi.json at
pre_model_load. Three transports are recognized:
- NCCL (/init_communicator/ + /update_named_param/) — TRL serve
and axolotl vllm-serve both expose this
- LoRA filesystem (/v1/load_lora_adapter or /set_lora_adapter/)
- HTTP base64 full-weight (/http_update_weights/) — axolotl
vllm-serve only
2. A pure-logic ``select_weight_sync_transport`` that picks the
right one for (server caps × adapter type).
3. ``init_communicator`` is only patched out when the server has no
NCCL routes; against TRL/axolotl serve modules it stays live so
full-finetune NCCL sync works.
4. ``post_trainer_create`` uses the selection table to install LoRA
filesystem sync OR leave the standard NCCL flow alone OR raise
NotImplementedError (HTTP — pending) OR raise a precise diagnosis
when no transport is viable. No more silent no-op trainers.
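The selection table can be sketched as pure logic (route-family names and the exact precedence here are illustrative, not the plugin's literal code):

```python
def select_weight_sync_transport(server_caps, adapter_type):
    """Pick a weight-sync transport from (server capabilities x adapter).

    server_caps is the set of route families probed from /openapi.json.
    """
    if adapter_type == "lora" and "lora_fs" in server_caps:
        return "lora_fs"   # /v1/load_lora_adapter or /set_lora_adapter/
    if "nccl" in server_caps:
        return "nccl"      # /init_communicator/ + /update_named_param/
    if "http_b64" in server_caps:
        return "http_b64"  # full-weight base64 -- pending
    return None            # caller raises a precise diagnosis
```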
Surfaces a class of GRPO config errors at axolotl-train startup instead
of letting them bubble out of GRPOTrainer.__init__ after the model loads.
Three checks under RLValidationMixin.check_grpo_batch_size_divisibility:
- effective generation_batch_size (or mb*GA fallback) must be divisible
by trl.num_generations, with a hint pointing at the smallest GA bump
that fixes the violation
- num_generations >= 2 (group-relative advantage needs variance; with
num_gen=1 the policy never updates)
- When world_size > 1, effective gbs >= num_generations * world_size
11 unit tests cover the table: divisible/non-divisible, explicit and
implicit gbs, multi-rank constraint, GRPO-disabled passthrough, and
unset num_generations.
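The three checks reduce to a small function (a sketch with illustrative names and return-an-error-string behavior; the real mixin raises at startup):

```python
def check_grpo_divisibility(gbs, micro_batch, grad_accum,
                            num_generations, world_size=1):
    """Return None when the config is valid, else an error message
    including the smallest gradient-accumulation bump that fixes a
    divisibility violation."""
    if num_generations is None:
        return None  # unset: nothing to validate
    if num_generations < 2:
        return ("num_generations must be >= 2 "
                "(group-relative advantage needs variance)")
    effective = gbs if gbs is not None else micro_batch * grad_accum
    if effective % num_generations != 0:
        ga = grad_accum + 1
        while (micro_batch * ga) % num_generations != 0:
            ga += 1
        return (f"effective batch {effective} not divisible by "
                f"num_generations={num_generations}; "
                f"try gradient_accumulation_steps={ga}")
    if world_size > 1 and effective < num_generations * world_size:
        return (f"effective batch {effective} < num_generations*"
                f"world_size={num_generations * world_size}")
    return None
```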
* bump transformers to 5.5.4 and trl to the latest 1.1.0
* more upgrades
* update peft too
* adapt lora_merge to peft 0.19 layer config API
PEFT 0.19 requires a LoraConfig object on Linear/ParamWrapper/Conv
layer constructors and moved use_rslora, use_dora, fan_in_fan_out,
lora_dropout, and lora_bias into that config. Build the config
per branch in _build_peft_layer_and_get_delta so the merge utility
works with the upgraded peft.
* allow lora_dropout on mixed attention+MoE configs under peft 0.19
PEFT 0.19's convert_peft_config_for_transformers auto-remaps old MoE
target_modules (w1/w2/w3 on Mixtral, etc.) into target_parameters for
transformers v5's fused 3D expert Parameters. Those targets get wrapped
with ParamWrapper, which rejects lora_dropout != 0 because the 3D
einsum can't factor dropout out of lora_B(lora_A(dropout(x))).
Monkeypatch ParamWrapper.__init__ to internally use a copy of the
LoraConfig with lora_dropout=0, so its dropout slot becomes nn.Identity
while the shared config still delivers real dropout to sibling Linear
LoRA layers (attention q/k/v/o). A probe runs the same conversion on a
deep copy to detect the situation and emit a warning before patching.
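The monkeypatch shape, sketched against a stand-in class (`param_wrapper_cls` represents PEFT's internal ParamWrapper; the real constructor signature may differ):

```python
import copy
import functools

def patch_paramwrapper_init(param_wrapper_cls):
    """Wrap __init__ so the wrapped layer sees lora_dropout=0.

    The layer gets a private copy of the LoraConfig with dropout zeroed
    (its dropout slot becomes Identity), while the original shared
    config keeps real dropout for sibling Linear LoRA layers.
    """
    orig_init = param_wrapper_cls.__init__

    @functools.wraps(orig_init)
    def patched_init(self, *args, lora_config=None, **kwargs):
        if lora_config is not None and getattr(lora_config, "lora_dropout", 0):
            lora_config = copy.deepcopy(lora_config)
            lora_config.lora_dropout = 0.0
        orig_init(self, *args, lora_config=lora_config, **kwargs)

    param_wrapper_cls.__init__ = patched_init
```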
* fix: rename model to adapter_model for fsdp sharded final model
* fix: follow upstream transformer shard size
* fix: handle multiple model files
* fix redundant condition, tighten to safetensors, keep shard size small
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* feat: support excess_length_strategy for RL trainers
Previously, RL data loading always dropped sequences exceeding
sequence_len. This adds support for the existing `excess_length_strategy`
config option (`drop`, `truncate`, `raise`) in RL training pipelines,
matching the behavior already available for SFT.
- `drop` (default): unchanged behavior, filters out long samples
- `truncate`: tokenizes text components, truncates responses to fit
within sequence_len while preserving the full prompt, then decodes
back to text. Handles DPO/IPO/ORPO/SIMPO and KTO datasets.
- `raise`: raises ValueError if any sample exceeds sequence_len
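The dispatch over the three strategies can be sketched as (illustrative names; `truncate_fn` stands in for the tokenize-truncate-decode path that trims the response while preserving the full prompt):

```python
def apply_excess_length_strategy(samples, seq_len, strategy="drop",
                                 truncate_fn=None):
    """samples are (token_length, sample) pairs."""
    if strategy == "drop":
        return [s for n, s in samples if n <= seq_len]
    if strategy == "truncate":
        return [s if n <= seq_len else truncate_fn(s, seq_len)
                for n, s in samples]
    if strategy == "raise":
        for n, _ in samples:
            if n > seq_len:
                raise ValueError(
                    f"sample length {n} exceeds sequence_len={seq_len}")
        return [s for _, s in samples]
    raise ValueError(f"unknown excess_length_strategy: {strategy}")
```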
Closes #3547
* improve RL truncation strategy robustness and performance
---------
Co-authored-by: yurekami <yurekami@users.noreply.github.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
Allow loading FP8-quantized models (e.g. Mistral-Small-4-119B) with
FineGrainedFP8Config and optional dequantize kwarg for full fine-tuning.
Made-with: Cursor
* Skip redundant evaluation when resuming from checkpoint
* add condition check for adding callback
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* better handling of dora merge on Conv layers in Qwen 3.5
* address issues from code review
* stricter efficient merges for dora since we now have meta model to reference
* qwen3_5.jinja: handle list content on system messages
The system message branch used string concatenation on
messages[0].content, which breaks when the first system message uses
the OpenAI-style list-of-parts format that multimodal datasets require.
User and assistant branches already handle both string and list content,
but the system branch did not.
Check whether content is a string and fall back to iterating over parts
when it is a list, matching the pattern used for user messages.
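The template's string-vs-list handling, mirrored in Python (a sketch; the actual fix lives in the jinja template):

```python
def content_to_text(content):
    """Flatten a chat message's content to text.

    Accepts either a plain string or the OpenAI-style list-of-parts
    format, keeping only the text parts.
    """
    if isinstance(content, str):
        return content
    return "".join(
        part.get("text", "") for part in content
        if part.get("type") == "text"
    )
```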
Fixes #3590
* Address pr for other content types
---------
Co-authored-by: Joaquin Hui Gomez <joaquinhuigomez@users.noreply.github.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
* fix: remove unneeded debug log
* fix: cleanup
* feat: add dense gemma config and cleanup
* feat: add cce support
* update notes and set torch compile
* fix patch for new number of return vals
* fixes for gemma4
* fix packing bug
* use updated cce for mm
* fix: pass in kv cache func when avail for transformers 5.5
* feat: update examples with flex variant and readme
* gemma4 lora attention kernels
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
* upgrade to torchao 0.17.0
* upgrade mistral-common too
* chore: lint
* patch fix for torchao low bit optimizers
* fix up
* propagate dtype
* fix test for ao change
* address PR comments
* feat: add sonicmoe fused lora support
* fix: forgot to add file
* feat: add test
* feat: add lora support for other routes
* fix: add int8 lora support
* fix: add qwen35_moe interleave support
* fix: qwen3_5_moe loss
* chore: lint
* address some pr comments
* fix test imports
* add support matrix for moe kernels [skip ci]
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* docs: comprehensive documentation improvements for humans and agents
New human docs:
- grpo.qmd: GRPO deep dive (async, rewards, IS correction, scaling)
- ebft.qmd: EBFT guide (structured/strided modes, feature extraction)
- choosing_method.qmd: decision tree for SFT vs LoRA vs DPO vs GRPO
- vllm_serving.qmd: vLLM setup for GRPO (server/colocate, LoRA sync)
- training_stability.qmd: monitoring, NaN debugging, OOM, healthy metrics
New agent docs:
- AGENTS_SFT.md: agent reference for supervised fine-tuning
- AGENTS_DPO.md: agent reference for preference learning (DPO/KTO/ORPO)
Updated existing docs:
- rlhf.qmd: cross-references to new GRPO/EBFT/choosing-method guides
- getting-started.qmd: reorganized Next Steps with links to new guides
- debugging.qmd: link to training stability guide
- _quarto.yml: added new pages to sidebar navigation
Removed:
- bak.agents.md: stale backup that confused agents
* docs: trim duplicated generic config from AGENTS_DPO.md
Remove boilerplate training params (optimizer, gradient_checkpointing,
flash_attention, etc.) from each method template. These are not
preference-learning-specific and are already covered in AGENTS_SFT.md.
Config templates now show only method-specific fields with a reference
to AGENTS_SFT.md for the rest.
* docs: deduplicate across new doc pages
- grpo.qmd: collapse vLLM setup section to brief config + link to
vllm_serving.qmd; collapse IS correction to essentials + link;
replace full monitoring tables with summary + link to
training_stability.qmd
- vllm_serving.qmd: remove duplicated async/IS config reference tables
(already in grpo.qmd config reference); replace full example config
with link to grpo.qmd quick start
- ebft.qmd: trim generic training params in quick start config
* fix: train scripts
* feat: split files into cleaner parts
* fix: cleanup pretraining docs
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* fix: DPO tool role KeyError, dataset hash output_dir, config validators [skip-e2e]
- Add 'tool' to default role_map_inv in dpo/chat_template.py default() and
argilla_chat() so datasets with tool-call messages no longer raise
KeyError: 'tool' (closes #3217)
- Fix generate_dataset_hash_from_config to use canonical tokenizer config +
overrides content instead of tokenizer.name_or_path when added_tokens_overrides
is set, preventing cache busting when only output_dir changes (closes #3303)
- Add three Pydantic config validators to AxolotlConfigWCapabilities:
* save_strategy: 'best' requires metric_for_best_model
* streaming=True is incompatible with val_set_size > 0
* lora_target_modules list entries must be valid Python regex patterns
- Tests for all three changes
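The regex validator reduces to compiling each entry (a plain-function sketch; the real check is a Pydantic validator that raises on failure):

```python
import re

def validate_lora_target_modules(patterns):
    """Return the list of entries that do not compile as Python regex;
    an empty list means the config is valid."""
    bad = []
    for pat in patterns or []:
        try:
            re.compile(pat)
        except re.error:
            bad.append(pat)
    return bad
```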
* review: condense comment in shared.py, swap Mistral model for SmolLM2-135M in test_hash
* chore: lint
* move the validators out of the w/ capabilities schema
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* qat patch
* tests fixes
* fixup per PR code review
* use state dict hooks to handle dequant for saving safetensors from transformers
* use transformers torch ao quantizer hooks to save mx quantized model
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Add precompute_ref_log_probs to config schema
* chore: add description for config
* Add test for precompute_ref_log_probs and move to training args
* using precompute logprobs as the default slows down CI, as it has to precompute them
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
* allow bf16 flag but warn
Reason: when doing e.g. a LoRA merge with CUDA_VISIBLE_DEVICES=, the hard bf16 check crashes unnecessarily, even though the merge itself would have finished successfully. A warning seems sufficient: if bf16 is genuinely unavailable and training begins anyway, the code will most likely crash later on its own.
* don't use deprecated LOG.warn
* update tests to reflect validation change