axolotl

Author	SHA1	Message	Date
Wing Lian	e4032fc90f	Refactor separate attention flags with attn_implementation and capability/concerns feature flags (#3602 ) * upgrade to torchao 0.17.0 * chore: lint * refactor attention handling * replace legacy attention boolean flags with capability properties Replace checks with capability-based properties derived from attn_implementation This separates three concerns that were conflated under flash_attention: 1. Backend selection -> attn_implementation enum 2. Packing capability -> attn_supports_packing property 3. Flash-attn library dependency -> attn_uses_flash_lib property * compute attn capability flags in normalizer instead of properties * make attn_implementation the single source of truth * move attention-dependent validators to mode=after * migrate remaining consumers to canonical attn_implementation * expand attention tests + rewrite docs * migrate example configs to canonical attn_implementation * update doc snippets + reject gemma4-hybrid with non-FA2 backend * remove dead gemma4 branch in _set_attention_config * fix duplicate attn_implementation in gpt-oss yamls and flaky caplog tests * drop "Phase 2" naming from attn-implementation tests * regroup attn_implementation tests by feature concern * clean up verbose comments and remove MD Signed-off-by: Wing Lian <wing@axolotl.ai> Co-authored-by: Axolotl Swarm <no-reply@axolotl.ai> * fix(collator): pass return_dict=True at apply_chat_template top level for transformers 5.x In transformers 5.x, ProcessorMixin.apply_chat_template gained its own `return_dict` parameter (defaulting to False). When return_dict=False and tokenize=True the method returns out["input_ids"] directly — a 2-D tensor — rather than the full BatchFeature dict. The old code placed `return_dict=True` inside processor_kwargs. In transformers 5.x those kwargs are forwarded to the underlying processor call self(...) where _merge_kwargs silently ignores any key not present in MllamaProcessorKwargs (emitting a warning). The outer return_dict therefore stayed False, apply_chat_template returned the raw input_ids tensor, and the subsequent `batch["input_ids"]` attempted to index a 2-D tensor with the 9-character string "input_ids", producing: IndexError: too many indices for tensor of dimension 2 The fix is to pass return_dict=True as a top-level keyword argument to apply_chat_template (where it is actually consumed) and remove it from processor_kwargs (where it was silently dropped). No version guard is needed: transformers is pinned to ==5.5.4 in pyproject.toml. Adds a unit-level regression test (tests/test_mm_chat_collator.py) that mocks the processor to return a raw tensor when apply_chat_template is called without top-level return_dict=True, verifying the four invariants: process_rows returns a dict, input_ids is 2-D, labels is 2-D, and apply_chat_template receives return_dict=True as a top-level kwarg. Fixes: tests/e2e/test_llama_vision.py::TestLlamaVision::test_lora_llama_vision_multimodal_dataset Fixes: tests/e2e/test_llama_vision.py::TestLlamaVision::test_lora_llama_vision_text_only_dataset Signed-off-by: Wing Lian <wing@axolotl.ai> Co-authored-by: Axolotl Swarm <no-reply@axolotl.ai> * fix(collator): process_rows returns dict (BatchFeature) shape Two related changes for the multimodal chat collator under transformers 5.x: 1. Wrap apply_chat_template result in dict(...) so process_rows returns a plain dict rather than a BatchFeature instance. BatchFeature is a Mapping but not a dict; downstream code that did batch["labels"] = self.processing_strategy.process_labels(batch["input_ids"]) would index on a tensor when the result wasn't dict-shaped, raising IndexError: too many indices for tensor of dimension 2 2. Soften the regression test's contract from `dict` to `Mapping` so it exercises the actual semantic guarantee (key/value access) rather than the implementation detail (dict vs BatchFeature). Test guards against the original transformers 5.x breakage where apply_chat_template's return_dict default went from True to False. Includes regression test under tests/test_mm_chat_collator.py. Bug surfaced via swarm dispatch task_01KQHPNAYD8XARSNSDJVW1GPF6 against attn-implementation-refactor; squash-merged from agent commits 4de886fd + dc9fcf4f. Signed-off-by: Wing Lian <wing@axolotl.ai> --------- Signed-off-by: Wing Lian <wing@axolotl.ai> Co-authored-by: Axolotl Swarm <no-reply@axolotl.ai>	2026-05-05 10:15:18 -04:00
NanoCode012	9de5b76336	feat: move to uv first (#3545 ) * feat: move to uv first * fix: update doc to uv first * fix: merge dev/tests into uv pyproject * fix: update docker docs to match current config * fix: migrate examples to readme * fix: add llmcompressor to conflict * feat: rec uv sync with lockfile for dev/ci * fix: update docker docs to clarify how to use uv images * chore: docs * fix: use system python, no venv * fix: set backend cpu * fix: only set for installing pytorch step * fix: remove unsloth kernel and installs * fix: remove U in tests * fix: set backend in deps too * chore: test * chore: comments * fix: attempt to lock torch * fix: workaround torch cuda and not upgraded * fix: forgot to push * fix: missed source * fix: nightly upstream loralinear config * fix: nightly phi3 long rope not work * fix: forgot commit * fix: test phi3 template change * fix: no more requirements * fix: carry over changes from new requirements to pyproject * chore: remove lockfile per discussion * fix: set match-runtime * fix: remove unneeded hf hub buildtime * fix: duplicate cache delete on nightly * fix: torchvision being overridden * fix: migrate to uv images * fix: leftover from merge * fix: simplify base readme * fix: update assertion message to be clearer * chore: docs * fix: change fallback for cicd script * fix: match against main exactly * fix: peft 0.19.1 change * fix: e2e test * fix: ci * fix: e2e test	2026-04-21 10:16:03 -04:00
Wing Lian	29fa4dedbb	Gemma4 fixes and profiler (#3591 )	2026-04-10 16:46:17 -04:00
NanoCode012	7daf7d96f1	fix: regex for unfrozen language tower (#3586 ) [skip ci] * fix: regex for unfrozen language tower * fix: other leftover regex	2026-04-08 08:18:11 -07:00
VED	9e64c76326	qwen3.5 configs (#3554 ) [skip ci] * qwen3.5 configs * update shared experts readme	2026-04-01 09:19:31 -04:00
Wing Lian	e412370877	roundup_power2_divisions not needed with newer pytorch versions (#3540 ) * roundup_power2_divisions not needed with newer pytorch versions * remove typo * update qwen3.5 moe 35b-a3b yaml for 5090 * more bug fixes * fix tests to match updated trainer * don't use fa2 for hooks test * reset plugins on the instance * retry download * fix references to renamed axolotl_cfg property on trainer * Fix ref to trainer cfg	2026-03-24 15:40:05 -04:00
Owen Arliawan	c57acef2c7	Qwen3.5-MoE example config with lora_target_modules regex (#3515 ) [skip ci] * lora target modules with regex * updates * fsdp for non moe * update wording * chore: cleanup and lint * chore: cleanup docs from merge --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2026-03-20 16:52:46 +07:00
VED	113d275bd9	qwen docs + new config (#3499 ) [skip ci] * qwen docs + new config * docss lint * simplify comments * read me * lint comments * Update docs/multimodal.qmd * Update docs/multimodal.qmd * Update examples/qwen3.5/9b-fft-vision.yaml * chore: fix link and incorrect points --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> Co-authored-by: NanoCode012 <nano@axolotl.ai>	2026-03-20 16:13:34 +07:00
VED	c119382337	add: qwen 3.5 (#3442 ) * add: qwen 3.5 * test for qwen , patch * lint * qwen3 fix on main * Apply suggestions from code review Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * moe config * config moe * configs and chore * Update examples/qwen3.5/122b-a10b-moe-qlora.yaml Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * Update examples/qwen3.5/35b-a3b-moe-qlora.yaml Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * chore for qwen + vlm patch * chore lint * qwen lint * 3_5_moe * Update examples/qwen3.5/README.md --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2026-03-06 09:31:00 -05:00

9 Commits