axolotl

Author	SHA1	Message	Date
Dan Saunders	95c259b3fb	depr warning	2025-09-26 10:26:21 -04:00
Dan Saunders	d1fd505813	update	2025-09-26 10:26:21 -04:00
Dan Saunders	1334281d50	docker fix	2025-09-26 10:26:21 -04:00
Dan Saunders	98f230d864	cleanup	2025-09-26 10:26:21 -04:00
Dan Saunders	02f308351c	fix	2025-09-26 10:25:58 -04:00
Dan Saunders	3b91e8174d	fix	2025-09-26 10:25:58 -04:00
Dan Saunders	40d906fb33	lint	2025-09-26 10:25:58 -04:00
Dan Saunders	89d5323c13	fix	2025-09-26 10:25:58 -04:00
Dan Saunders	df870f6a8f	fix	2025-09-26 10:24:59 -04:00
Dan Saunders	f500aaa490	fix	2025-09-26 10:24:59 -04:00
Dan Saunders	9ec33f52e3	wip	2025-09-26 10:24:59 -04:00
Dan Saunders	b453562c01	fixes	2025-09-26 10:24:59 -04:00
Dan Saunders	367f7eb3a6	fix	2025-09-26 10:24:59 -04:00
Dan Saunders	e888e38ce7	fix	2025-09-26 10:24:59 -04:00
Dan Saunders	400120af2d	wip	2025-09-26 10:24:59 -04:00
Dan Saunders	459e5f9b16	lint	2025-09-26 10:24:59 -04:00
Dan Saunders	43f6f84269	wip	2025-09-26 10:24:59 -04:00
Dan Saunders	36c4ab11f9	wip	2025-09-26 10:24:59 -04:00
Dan Saunders	2f4e4ef604	wip	2025-09-26 10:24:59 -04:00
Dan Saunders	aee03fc636	wip	2025-09-26 10:24:59 -04:00
Dan Saunders	255b818fbc	rebase	2025-09-26 10:24:59 -04:00
Dan Saunders	332ee74f32	rebase	2025-09-26 10:24:07 -04:00
Dan Saunders	3b0d2ac5c0	rebase	2025-09-26 10:21:49 -04:00
Dan Saunders	9462a1bf79	wip	2025-09-26 10:21:49 -04:00
Dan Saunders	8e9386c799	go uv first	2025-09-26 09:57:09 -04:00
Dan Saunders	740d5a1d31	doc fix (#3187)	2025-09-26 09:55:15 -04:00
Grant Holmes (Ren)	850c1a5f8d	Add FSDP v2 swap memory support + QLoRA compatibility fixes (#3167) Co-authored-by: salman <salman.mohammadi@outlook.com>	2025-09-26 10:23:59 +01:00
NanoCode012	7fa8ac40cd	Feat(cce): add qwen3_vl, qwen3_vl_moe, granitemoeshared, granitemoehybrid, and upgraded all cce patches (#3178) * feat: upgrade cce with patches for transformers 4.56 * feat: add missing models to cce readme	2025-09-26 12:11:29 +07:00
Dan Saunders	f9748c4dc5	Cp fix (#3182) * patch transformers to allow CP + FA2 * nits * only patch in CP > 1 case	2025-09-25 12:03:50 -04:00
miketung	33975ce4bc	feat(qwen3-next): Adds targeting of shared expert and attention modules (#3183) * Adds targetting of shared expert and attention modules in each layer * Update VRAM usage --------- Co-authored-by: Mike Tung <mike@diffbot.com>	2025-09-25 17:06:16 +07:00
陈华杰	e8b962d47f	feat: support training with JSON string tool arguments (#3136) * feat: support training with JSON string tool arguments; fix PyArrow data type inconsistent error * feat: raise error for tool call arguments decode * Add test_chat_templates_tool_call_string_arguments.py Add test for string arguments * fix: change to correct qwen3 tokenizer * fix: update docs to clarify arguments json * chore: lint * fix: duplicate * chore: revert * feat: add error to faq * fix: remove duplicate fixture --------- Co-authored-by: caoqinping <caoqinping@lixiang.com> Co-authored-by: gamersover-blog <1611885128@qq.com> Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-09-25 12:06:21 +07:00
NanoCode012	856ff12171	feat(doc): add optimizations table of content to our improvements (#3175) [skip ci] * chore: format * feat: add usage for alst * chore: wording * feat: add optimizations doc * Apply suggestion from @SalmanMohammadi Co-authored-by: salman <salman.mohammadi@outlook.com> * Update docs/dataset-formats/index.qmd Co-authored-by: salman <salman.mohammadi@outlook.com> * feat: add alst, act offloading, nd parallelism, use relative links, and fix format * chore: comments --------- Co-authored-by: salman <salman.mohammadi@outlook.com>	2025-09-24 16:13:49 -04:00
Dan Saunders	6bc959342b	remove unused dep (#3180)	2025-09-24 13:18:44 -04:00
NanoCode012	b3b92687c4	chore: rename gemma3 270m config (#3174)	2025-09-24 13:48:38 +07:00
NanoCode012	55d1be2ae6	fix: unify default for conversations_field [skip-e2e] (#3070) * fix: unify default for conversations_field * fix: suggestion to remove defaults	2025-09-23 21:22:15 +07:00
NanoCode012	08d831c3d5	Feat: add qwen3-next (w packing+cce) (#3150) * feat: upgrade cce for qwen3-next * feat: add sample qwen3 config * feat: add packing patch for chunk_gated_delta_rule * feat: add qwen3 link * fix: tuple name * feat: add tested qwen3 config * fix: improve log * feat: add patch for fla without packing * fix: remove fla patch for standard mode * feat: enable packing * feat: add qwen3-next tests * chore: move tests	2025-09-23 11:31:15 +07:00
AlexHT Hung	7be8740c5c	fix(rl): pass max_prompt_len to training args as max_prompt_length (#3113) * pass max_prompt_len to training args as max_prompt_length * Update rl.py * refactor * format * fix: default for max_prompt_length * fix: defaults for trainer --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-09-19 17:34:28 +07:00
NanoCode012	c51d6b06c3	feat: add apertus model and cce (#3144) [skip ci] * feat: add apertus, glm4v, glm4v_moe cce * fix: arcee docs * feat: add apertus * feat: added vram usage * fix: add apertus note * feat: update doc on apertus xielu * fix: add monkeypatch for xielu activation issue * fix: simplify env * feat: pin commit * feat: add packing * chore: move patch calling * Update examples/apertus/README.md Co-authored-by: salman <salman.mohammadi@outlook.com> * Update examples/apertus/README.md Co-authored-by: salman <salman.mohammadi@outlook.com> * Update examples/apertus/README.md Co-authored-by: salman <salman.mohammadi@outlook.com> --------- Co-authored-by: salman <salman.mohammadi@outlook.com>	2025-09-19 17:34:04 +07:00
NanoCode012	09959fac70	Feat: add Magistral Small 2509 and native mistral3 tokenizer support (#3165) * feat: update mistral common * feat: add mistral3processor * fix: loading * fix: cast pixel_values to fp32 * fix: image tensor conversion * feat: add FA2 support for pixtral based models * fix: update mistral small 3.1 to use native tokenizer * fix: install tips * fix: improve info on sample dataset files * chore: move mistral configs into subfolders * fix: remove unneeded patch * fix: indent * feat: add integration tests * chore: move * feat: add magistral 2509 docs and example * fix: convert tensor to bool * feat: expand tests * chore: move tests	2025-09-18 15:42:20 +07:00
Dan Saunders	4065bc14c6	Debug log, logging improvements (#3159) * simplify logging * remove comment * progress on debug.log * add debug-level logger for file log * simplify * case insensitivity; 3rd party logging improvements * simplify * fix * tests * lint * nits * nit * Update tests/test_utils_tee.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * cleanup / comments * fix * oops --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-09-17 13:27:03 -04:00
salman	e5c427f6de	qat doc updates (#3162) [skip-ci]	2025-09-17 10:38:15 +01:00
Wing Lian	86d6ee7c05	upgrade trl and accelerate (#3161) * upgrade trl==0.23.0 * upgrade accelerate patch fix * add hints when using gradient_checkpointing with DPO * set gradient-checpointing properly	2025-09-16 14:53:01 -04:00
Wing Lian	d4cff1b7bb	improve setting of NCCL_P2P_DISABLE on runpod (#3132) [skip ci] * improve setting of NCCL_P2P_DISABLE on runpod * use recs from review	2025-09-16 14:52:45 -04:00
Wing Lian	1ef6c196f7	setup env vars for ray train for FSDP (#3130) [skip ci]	2025-09-16 14:52:29 -04:00
salman	58d67bf98d	Migrate QAT API; fix `axolotl quantize` for QAT-ed models; add NVFP4 (#3107)	2025-09-12 10:55:50 +01:00
salman	0401a15888	SEO go brrr (#3153) [skip-ci]	2025-09-12 10:55:11 +01:00
NanoCode012	fcfc13d710	feat(doc): update thinking and chat_template notes (#3114) [skip ci] * feat: update thinking and chat_template notes * fix: grammar	2025-09-12 14:45:18 +07:00
salman	9406c0c488	log before eval step (#3148) [skip-ci]	2025-09-11 11:19:30 +01:00
Dan Saunders	1b53c49e1a	text diffusion training plugin (#3067) * diffusion training plugin * cleanup * nits * fixes + improvements * add back in reinit_weights (clobbered?); masking / pretrain fixes * nits * cleanup; tests draft * sample generation, tests fixes * fixes * nits * add inference support; add auto-mask token support * nits * nits * progress * simplify logging * lint * prefix args with diffusion_ * coderabbito * tests fix * nit * nits * cleanup + nits * nits * fix SFT sample gen * fixes * fix * comments * comments * lint * reward model lora fix * cleanup; fix pretraining_dataset case * gradio inference * update cfgs * update cfgs * train, generation parity, cleanup * fix * simplify * test * test fix	2025-09-10 20:27:00 -04:00
NanoCode012	b71482cec5	Feat: add hunyuan v1 (#3016) * feat: add hunyuan cce support * feat: update cce docs * feat: add multipack support for granite and hunyuan * feat: add hunyuan docs and example config * feat: update readme instructions to include CCE installation * fix: chat template log appearing despite tokenizer already having template * feat: add vram usage * fix: remove duplicate cce install * fix: use latest commit of PR in case rebased/pushed * Revert "fix: use latest commit of PR in case rebased/pushed" This reverts commit `8b60aa00de`. * feat: update doc as upstream merged	2025-09-10 09:03:30 +07:00

2446 Commits