Wing Lian
383f220cfd
build torch 2.9.0 base images ( #3221 )
2025-10-20 08:53:49 -04:00
NanoCode012
8bb871b5cf
fix: deepspeed with context parallel ( #3220 )
2025-10-20 14:06:58 +07:00
Leonard
87565ecc05
Add chat_template.argilla_chat support for DPO datasets ( #3202 )
...
* Add chat_template.argilla_chat support for DPO datasets
Creates a new chat_template.argilla_chat prompt strategy for handling
DPO datasets where chosen/rejected fields contain full conversations
(messages + final response), following the pattern of chatml.argilla_chat
and llama3.argilla_chat.
- Add argilla_chat() function to chat_template.py
- Add chat_template.argilla_chat to RLHF documentation
- Add test coverage for argilla_chat with multiple tokenizers
Dataset format:
{
  "chosen": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ],
  "rejected": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ]
}
* Fix chat_template.argilla_chat return value contract and add docstring
- Return (transform_fn, dataset_kwargs) tuple instead of bare transform_fn
- Add remove_columns specification for field_chosen and field_rejected
- Add comprehensive docstring with Args/Returns sections
- Update tests to unpack tuple return value
Addresses PR feedback to maintain consistency with chat_template.default()
and properly specify columns to remove after dataset transformation.
* Update tests/prompt_strategies/test_dpo_chat_templates.py
Co-authored-by: Wing Lian <wing.lian@gmail.com >
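A minimal, hypothetical sketch of how the dataset format above and the (transform_fn, dataset_kwargs) contract described in this commit could fit together; the function body, signature, and default field names are assumptions, not the actual chat_template.argilla_chat implementation:

    # Hypothetical sketch only; names and body are inferred from the commit
    # message, not copied from the axolotl source.
    def argilla_chat(cfg, **kwargs):
        field_chosen = kwargs.get("field_chosen", "chosen")
        field_rejected = kwargs.get("field_rejected", "rejected")

        def transform_fn(sample, tokenizer=None):
            # chosen/rejected each hold a full conversation; the shared turns
            # form the prompt and the final assistant message is the response.
            chosen = sample[field_chosen]
            rejected = sample[field_rejected]
            return {
                "prompt": chosen[:-1],
                "chosen": chosen[-1]["content"],
                "rejected": rejected[-1]["content"],
            }

        # Returned alongside the transform so the loader can drop the raw
        # conversation columns afterwards (the remove_columns part).
        return transform_fn, {"remove_columns": [field_chosen, field_rejected]}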
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2025-10-17 17:00:26 +07:00
NanoCode012
93ba57396f
fix: qwen3_vl attention config ( #3216 )
2025-10-17 10:35:03 +07:00
NanoCode012
aa1240acd8
fix: transformers deprecate load_in_Xbit in model_kwargs ( #3205 )
...
* fix: transformers deprecate load_in_Xbit in model_kwargs
* fix: test to read from quantization_config kwarg
* fix: test
* fix: access
* fix: test weirdly entering incorrect config
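For context, the deprecation this commit works around looks roughly like the following; the model id is only an example and not taken from the commit:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Newer transformers releases reject load_in_4bit/load_in_8bit passed as
    # bare model kwargs; quantization settings go through quantization_config.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.2-1B",  # example model id
        quantization_config=quant_config,
    )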
2025-10-16 16:07:27 +07:00
Wing Lian
4cdfdfebb5
upgrade transformers==4.57.1 and peft==0.23.1 ( #3214 )
2025-10-14 15:54:05 -04:00
github-actions[bot]
6e2f5ccf9f
chore: update pre-commit hooks ( #3211 ) [skip ci]
...
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com >
2025-10-14 10:21:49 -04:00
NanoCode012
8c7f63cf97
fix: unpack cce imported incorrectly ( #3212 ) [skip ci]
2025-10-13 17:19:15 +07:00
VED
cd856b45b1
feat: add support for dataset_num_processes ( #3129 ) [skip ci]
...
* feat: add support for dataset_num_processes
* chore
* required changes
* requested changes
* required changes
* required changes
* required changes
* elif get_default_process_count()
* add:del data
* Update cicd/Dockerfile.jinja
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* Update cicd/single_gpu.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
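A rough sketch of the idea behind dataset_num_processes: honor an explicit setting, otherwise fall back to a machine-derived default. get_default_process_count is the helper named in the commit, but its body and the config attribute access here are assumptions:

    import os

    def get_default_process_count() -> int:
        # Assumed fallback: roughly half the available CPUs, at least one.
        return max(1, (os.cpu_count() or 2) // 2)

    def resolve_dataset_num_proc(cfg) -> int:
        # Use the user-provided dataset_num_processes when set.
        num_proc = getattr(cfg, "dataset_num_processes", None)
        if num_proc:
            return num_proc
        return get_default_process_count()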
---------
Co-authored-by: salman <salman.mohammadi@outlook.com >
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2025-10-13 17:18:12 +07:00
salman
143dea4753
FSDPConfig (#3170 )
2025-10-10 14:44:25 +01:00
Hitesh Sagtani
bc2ffb8204
fix: Enable KD plugin support for PEFT/LoRA adapters ( #3207 )
...
- Fix _loss_function attribute not found on base model with PEFT
- Fix mismatched attribute name (loss_function vs _loss_function)
- Set _loss_function on unwrapped base model for PEFT
- Enable previously skipped test_llama_lora_kd test
- Add test config fixes for LoRA kernel compatibility
Fixes https://github.com/axolotl-ai-cloud/axolotl/issues/3206
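The gist of the fix, sketched under the assumption that the loss function only needs to be attached to the unwrapped base model (attribute name taken from the commit message, helper name is illustrative):

    def attach_kd_loss(model, loss_fn):
        # With PEFT/LoRA the trainer sees a PeftModel wrapper, so the
        # attribute must be set on the underlying base model instead.
        base_model = model.get_base_model() if hasattr(model, "get_base_model") else model
        base_model._loss_function = loss_fn  # attribute name from the commit message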
2025-10-10 08:57:00 -04:00
NanoCode012
153edcfe79
fix(doc): add act checkpointing migration to fsdp2 docs ( #3193 ) [skip ci]
2025-10-10 10:57:50 +07:00
Wing Lian
08b8fa62cc
only calculate packed ds length once if using a large world size ( #3210 )
2025-10-09 14:18:46 -04:00
Wing Lian
3a5c97e6e5
use can_device_access_peer for P2P checks ( #3209 ) [skip ci]
...
* use can_device_access_peer for P2P checks
* also log warn when automatically setting NCCL_P2P_DISABLE=1
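A simplified sketch of what a can_device_access_peer based check might look like; the exact placement and warning mechanism inside axolotl are not shown here:

    import logging
    import os

    import torch

    LOG = logging.getLogger(__name__)

    def maybe_disable_nccl_p2p() -> None:
        # If any GPU pair cannot reach each other via P2P, disable NCCL P2P
        # and warn, mirroring the behavior described in the commit above.
        num_devices = torch.cuda.device_count()
        for i in range(num_devices):
            for j in range(num_devices):
                if i != j and not torch.cuda.can_device_access_peer(i, j):
                    os.environ["NCCL_P2P_DISABLE"] = "1"
                    LOG.warning(
                        "No P2P access between cuda:%d and cuda:%d; "
                        "setting NCCL_P2P_DISABLE=1",
                        i,
                        j,
                    )
                    return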
2025-10-09 14:17:31 -04:00
VED
37f78c8592
add chat_template_jinja to wandb ( #3192 ) [skip ci]
...
* add chat_template_jinja to wandb
* temp_ct_file.flush()
* Update src/axolotl/utils/callbacks/__init__.py
Co-authored-by: Wing Lian <wing.lian@gmail.com >
* Update src/axolotl/utils/callbacks/__init__.py
Co-authored-by: Wing Lian <wing.lian@gmail.com >
* Apply suggestion from @winglian
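Roughly what the temp-file-plus-flush approach mentioned above amounts to; a sketch, not the actual callback code:

    import tempfile

    import wandb

    def log_chat_template_to_wandb(chat_template_jinja: str) -> None:
        # Write the resolved Jinja template to a temp file and attach it to
        # the active W&B run; flush() ensures the contents are on disk
        # before wandb picks the file up.
        with tempfile.NamedTemporaryFile(
            mode="w", suffix=".jinja", delete=False
        ) as temp_ct_file:
            temp_ct_file.write(chat_template_jinja)
            temp_ct_file.flush()
            wandb.save(temp_ct_file.name, policy="now")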
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2025-10-09 12:05:54 -04:00
NanoCode012
ab63b92c38
feat: add lfm2 family and latest moe model ( #3208 )
...
* feat: add lfm2 family and latest moe model
* fix: use ml-cross-entropy for lfm2 examples
2025-10-09 10:47:41 -04:00
Manh Nguyen
6f8ce024d1
Remove check_torch_compile_deepspeed ( #3195 ) [skip ci]
...
Signed-off-by: nguyen599 <pnvmanh2123@gmail.com >
2025-10-08 11:27:01 -04:00
Wing Lian
d0e9c3c1c5
When using Ray, use prepare for dataloader fixes ( #3198 )
...
* make sure to use ray prepare for dataloader fixes
* ray tests use 2.7.0+
* don't call init_distributed w ray and deepspeed
* handle dict deepspeed config
* better handling of dict deepspeed config
* use json.dumps
* guard to_dict
* wrap import for optional ray
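A hedged sketch of the "handle dict deepspeed config" idea: accept either a JSON path or an in-memory config and normalize to a file path, guarding the optional to_dict as the commit notes. The names are illustrative, not the actual axolotl helpers:

    import json
    import tempfile

    def normalize_deepspeed_config(deepspeed):
        # Already a path to a JSON file: nothing to do.
        if isinstance(deepspeed, str):
            return deepspeed
        # Guard to_dict: some config objects expose it, plain dicts do not.
        cfg = deepspeed.to_dict() if hasattr(deepspeed, "to_dict") else deepspeed
        with tempfile.NamedTemporaryFile(
            mode="w", suffix=".json", delete=False
        ) as tmp:
            tmp.write(json.dumps(cfg, indent=2))
            return tmp.name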
2025-10-08 10:43:41 -04:00
github-actions[bot]
4c3488cc9f
chore: update pre-commit hooks ( #3160 ) [skip ci]
...
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com >
2025-10-08 08:58:02 -04:00
Wing Lian
130637a3fa
upgrade transformers to 4.57.0 ( #3201 )
...
* upgrade transformers to 4.57.0
* remove deprecated autoawq and use latest peft
* remove autoawq from setuptools script
* fix imports
* make sure torchvision is installed
* remove support for BetterTransformer
* skip fsdp_qlora_prequant test
* more robust error reporting
2025-10-08 08:43:46 -04:00
VED
377c510e95
sleep model support ( #3135 )
...
Co-authored-by: salman <salman.mohammadi@outlook.com >
2025-10-08 12:39:21 +01:00
Wing Lian
409cfb8a87
deprecate torch 2.6.0 support ( #3197 ) [skip ci]
2025-10-07 11:23:41 -04:00
Wing Lian
ce74c20109
don't cache pip install ( #3194 )
...
* don't cache pip install
* no cache dir for disk space for sdist too
2025-10-01 11:11:39 -04:00
VED
a6bfbe3400
torch_dtype -> dtype ( #3177 )
...
* torch_dtype -> dtype
* torch_dtype -> dtype
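The rename in a nutshell; the model id is only an example and not taken from the commit:

    import torch
    from transformers import AutoModelForCausalLM

    # Recent transformers versions prefer the `dtype` keyword over the older
    # `torch_dtype` name in from_pretrained; the old name is deprecated.
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.2-1B",  # example model id
        dtype=torch.bfloat16,       # previously: torch_dtype=torch.bfloat16
    )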
2025-10-01 15:02:51 +07:00
Dan Saunders
f4376748f3
debug log: multiprocess race condition fix ( #3188 )
2025-09-26 15:07:39 -04:00
Dan Saunders
740d5a1d31
doc fix ( #3187 )
2025-09-26 09:55:15 -04:00
Grant Holmes (Ren)
850c1a5f8d
Add FSDP v2 swap memory support + QLoRA compatibility fixes ( #3167 )
...
Co-authored-by: salman <salman.mohammadi@outlook.com >
2025-09-26 10:23:59 +01:00
NanoCode012
7fa8ac40cd
Feat(cce): add qwen3_vl, qwen3_vl_moe, granitemoeshared, granitemoehybrid, and upgraded all cce patches ( #3178 )
...
* feat: upgrade cce with patches for transformers 4.56
* feat: add missing models to cce readme
2025-09-26 12:11:29 +07:00
Dan Saunders
f9748c4dc5
Cp fix ( #3182 )
...
* patch transformers to allow CP + FA2
* nits
* only patch in CP > 1 case
2025-09-25 12:03:50 -04:00
miketung
33975ce4bc
feat(qwen3-next): Adds targeting of shared expert and attention modules ( #3183 )
...
* Adds targeting of shared expert and attention modules in each layer
* Update VRAM usage
---------
Co-authored-by: Mike Tung <mike@diffbot.com >
2025-09-25 17:06:16 +07:00
陈华杰
e8b962d47f
feat: support training with JSON string tool arguments ( #3136 )
...
* feat: support training with JSON string tool arguments; fix inconsistent PyArrow data type error
* feat: raise error for tool call arguments decode
* Add test_chat_templates_tool_call_string_arguments.py
Add test for string arguments
* fix: change to correct qwen3 tokenizer
* fix: update docs to clarify arguments json
* chore: lint
* fix: duplicate
* chore: revert
* feat: add error to faq
* fix: remove duplicate fixture
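The core idea, sketched independently of the actual axolotl code: tool-call arguments may arrive as a dict or as a JSON-encoded string, and decode failures should raise a clear error as the commit describes:

    import json

    def normalize_tool_arguments(arguments):
        # Dicts pass through untouched; strings must be valid JSON.
        if isinstance(arguments, dict):
            return arguments
        try:
            return json.loads(arguments)
        except json.JSONDecodeError as err:
            raise ValueError(
                "tool call arguments must be a dict or a JSON-decodable string"
            ) from err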
---------
Co-authored-by: caoqinping <caoqinping@lixiang.com >
Co-authored-by: gamersover-blog <1611885128@qq.com >
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-09-25 12:06:21 +07:00
NanoCode012
856ff12171
feat(doc): add optimizations table of content to our improvements ( #3175 ) [skip ci]
...
* chore: format
* feat: add usage for alst
* chore: wording
* feat: add optimizations doc
* Apply suggestion from @SalmanMohammadi
Co-authored-by: salman <salman.mohammadi@outlook.com >
* Update docs/dataset-formats/index.qmd
Co-authored-by: salman <salman.mohammadi@outlook.com >
* feat: add alst, act offloading, nd parallelism, use relative links, and fix format
* chore: comments
---------
Co-authored-by: salman <salman.mohammadi@outlook.com >
2025-09-24 16:13:49 -04:00
Dan Saunders
6bc959342b
remove unused dep ( #3180 )
2025-09-24 13:18:44 -04:00
NanoCode012
b3b92687c4
chore: rename gemma3 270m config ( #3174 )
2025-09-24 13:48:38 +07:00
NanoCode012
55d1be2ae6
fix: unify default for conversations_field [skip-e2e] ( #3070 )
...
* fix: unify default for conversations_field
* fix: suggestion to remove defaults
2025-09-23 21:22:15 +07:00
NanoCode012
08d831c3d5
Feat: add qwen3-next (w packing+cce) ( #3150 )
...
* feat: upgrade cce for qwen3-next
* feat: add sample qwen3 config
* feat: add packing patch for chunk_gated_delta_rule
* feat: add qwen3 link
* fix: tuple name
* feat: add tested qwen3 config
* fix: improve log
* feat: add patch for fla without packing
* fix: remove fla patch for standard mode
* feat: enable packing
* feat: add qwen3-next tests
* chore: move tests
2025-09-23 11:31:15 +07:00
AlexHT Hung
7be8740c5c
fix(rl): pass max_prompt_len to training args as max_prompt_length ( #3113 )
...
* pass max_prompt_len to training args as max_prompt_length
* Update rl.py
* refactor
* format
* fix: default for max_prompt_length
* fix: defaults for trainer
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-09-19 17:34:28 +07:00
NanoCode012
c51d6b06c3
feat: add apertus model and cce ( #3144 ) [skip ci]
...
* feat: add apertus, glm4v, glm4v_moe cce
* fix: arcee docs
* feat: add apertus
* feat: added vram usage
* fix: add apertus note
* feat: update doc on apertus xielu
* fix: add monkeypatch for xielu activation issue
* fix: simplify env
* feat: pin commit
* feat: add packing
* chore: move patch calling
* Update examples/apertus/README.md
Co-authored-by: salman <salman.mohammadi@outlook.com >
* Update examples/apertus/README.md
Co-authored-by: salman <salman.mohammadi@outlook.com >
* Update examples/apertus/README.md
Co-authored-by: salman <salman.mohammadi@outlook.com >
---------
Co-authored-by: salman <salman.mohammadi@outlook.com >
2025-09-19 17:34:04 +07:00
NanoCode012
09959fac70
Feat: add Magistral Small 2509 and native mistral3 tokenizer support ( #3165 )
...
* feat: update mistral common
* feat: add mistral3processor
* fix: loading
* fix: cast pixel_values to fp32
* fix: image tensor conversion
* feat: add FA2 support for pixtral based models
* fix: update mistral small 3.1 to use native tokenizer
* fix: install tips
* fix: improve info on sample dataset files
* chore: move mistral configs into subfolders
* fix: remove unneeded patch
* fix: indent
* feat: add integration tests
* chore: move
* feat: add magistral 2509 docs and example
* fix: convert tensor to bool
* feat: expand tests
* chore: move tests
2025-09-18 15:42:20 +07:00
Dan Saunders
4065bc14c6
Debug log, logging improvements ( #3159 )
...
* simplify logging
* remove comment
* progress on debug.log
* add debug-level logger for file log
* simplify
* case insensitivity; 3rd party logging improvements
* simplify
* fix
* tests
* lint
* nits
* nit
* Update tests/test_utils_tee.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* cleanup / comments
* fix
* oops
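A generic sketch of the "debug-level logger for file log" idea using the standard library; this is not the axolotl-specific logging setup:

    import logging

    def setup_logging(log_file: str, console_level: str = "info") -> None:
        # Console stays at the user-selected level (accepted case-insensitively),
        # while the file handler records everything at DEBUG for later inspection.
        root = logging.getLogger()
        root.setLevel(logging.DEBUG)

        console = logging.StreamHandler()
        console.setLevel(getattr(logging, console_level.upper()))
        root.addHandler(console)

        file_handler = logging.FileHandler(log_file)
        file_handler.setLevel(logging.DEBUG)
        root.addHandler(file_handler)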
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-09-17 13:27:03 -04:00
salman
e5c427f6de
qat doc updates ( #3162 ) [skip-ci]
2025-09-17 10:38:15 +01:00
Wing Lian
86d6ee7c05
upgrade trl and accelerate ( #3161 )
...
* upgrade trl==0.23.0
* upgrade accelerate patch fix
* add hints when using gradient_checkpointing with DPO
* set gradient-checkpointing properly
2025-09-16 14:53:01 -04:00
Wing Lian
d4cff1b7bb
improve setting of NCCL_P2P_DISABLE on runpod ( #3132 ) [skip ci]
...
* improve setting of NCCL_P2P_DISABLE on runpod
* use recs from review
2025-09-16 14:52:45 -04:00
Wing Lian
1ef6c196f7
setup env vars for ray train for FSDP ( #3130 ) [skip ci]
2025-09-16 14:52:29 -04:00
salman
58d67bf98d
Migrate QAT API; fix axolotl quantize for QAT-ed models; add NVFP4 ( #3107 )
2025-09-12 10:55:50 +01:00
salman
0401a15888
SEO go brrr ( #3153 ) [skip-ci]
2025-09-12 10:55:11 +01:00
NanoCode012
fcfc13d710
feat(doc): update thinking and chat_template notes ( #3114 ) [skip ci]
...
* feat: update thinking and chat_template notes
* fix: grammar
2025-09-12 14:45:18 +07:00
salman
9406c0c488
log before eval step ( #3148 ) [skip-ci]
2025-09-11 11:19:30 +01:00
Dan Saunders
1b53c49e1a
text diffusion training plugin ( #3067 )
...
* diffusion training plugin
* cleanup
* nits
* fixes + improvements
* add back in reinit_weights (clobbered?); masking / pretrain fixes
* nits
* cleanup; tests draft
* sample generation, tests fixes
* fixes
* nits
* add inference support; add auto-mask token support
* nits
* nits
* progress
* simplify logging
* lint
* prefix args with diffusion_
* coderabbito
* tests fix
* nit
* nits
* cleanup + nits
* nits
* fix SFT sample gen
* fixes
* fix
* comments
* comments
* lint
* reward model lora fix
* cleanup; fix pretraining_dataset case
* gradio inference
* update cfgs
* update cfgs
* train, generation parity, cleanup
* fix
* simplify
* test
* test fix
2025-09-10 20:27:00 -04:00
NanoCode012
b71482cec5
Feat: add hunyuan v1 ( #3016 )
...
* feat: add hunyuan cce support
* feat: update cce docs
* feat: add multipack support for granite and hunyuan
* feat: add hunyuan docs and example config
* feat: update readme instructions to include CCE installation
* fix: chat template log appearing despite tokenizer already having template
* feat: add vram usage
* fix: remove duplicate cce install
* fix: use latest commit of PR in case rebased/pushed
* Revert "fix: use latest commit of PR in case rebased/pushed"
This reverts commit 8b60aa00de .
* feat: update doc as upstream merged
2025-09-10 09:03:30 +07:00