Wing Lian
383f220cfd
build torch 2.9.0 base images ( #3221 )
2025-10-20 08:53:49 -04:00
NanoCode012
8bb871b5cf
fix: deepspeed with context parallel ( #3220 )
2025-10-20 14:06:58 +07:00
Leonard
87565ecc05
Add chat_template.argilla_chat support for DPO datasets ( #3202 )
...
* Add chat_template.argilla_chat support for DPO datasets
Creates a new chat_template.argilla_chat prompt strategy for handling
DPO datasets where chosen/rejected fields contain full conversations
(messages + final response), following the pattern of chatml.argilla_chat
and llama3.argilla_chat.
- Add argilla_chat() function to chat_template.py
- Add chat_template.argilla_chat to RLHF documentation
- Add test coverage for argilla_chat with multiple tokenizers
Dataset format:
{
  "chosen": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ],
  "rejected": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ]
}
* Fix chat_template.argilla_chat return value contract and add docstring
- Return (transform_fn, dataset_kwargs) tuple instead of bare transform_fn
- Add remove_columns specification for field_chosen and field_rejected
- Add comprehensive docstring with Args/Returns sections
- Update tests to unpack tuple return value
Addresses PR feedback to maintain consistency with chat_template.default()
and properly specify columns to remove after dataset transformation.
* Update tests/prompt_strategies/test_dpo_chat_templates.py
Co-authored-by: Wing Lian <wing.lian@gmail.com >
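A minimal, hypothetical sketch of how the dataset format above and the (transform_fn, dataset_kwargs) contract described in this commit could fit together; the function body, signature, and default field names are assumptions, not the actual chat_template.argilla_chat implementation:

    # Hypothetical sketch only; names and body are inferred from the commit
    # message, not copied from the axolotl source.
    def argilla_chat(cfg, **kwargs):
        field_chosen = kwargs.get("field_chosen", "chosen")
        field_rejected = kwargs.get("field_rejected", "rejected")

        def transform_fn(sample, tokenizer=None):
            # chosen/rejected each hold a full conversation; the shared turns
            # form the prompt and the final assistant message is the response.
            chosen = sample[field_chosen]
            rejected = sample[field_rejected]
            return {
                "prompt": chosen[:-1],
                "chosen": chosen[-1]["content"],
                "rejected": rejected[-1]["content"],
            }

        # Returned alongside the transform so the loader can drop the raw
        # conversation columns afterwards (the remove_columns part).
        return transform_fn, {"remove_columns": [field_chosen, field_rejected]}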
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2025-10-17 17:00:26 +07:00
NanoCode012
93ba57396f
fix: qwen3_vl attention config ( #3216 )
2025-10-17 10:35:03 +07:00
NanoCode012
aa1240acd8
fix: transformers deprecate load_in_Xbit in model_kwargs ( #3205 )
...
* fix: transformers deprecate load_in_Xbit in model_kwargs
* fix: test to read from quantization_config kwarg
* fix: test
* fix: access
* fix: test weirdly entering incorrect config
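For context, the deprecation this commit works around looks roughly like the following; the model id is only an example and not taken from the commit:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Newer transformers releases reject load_in_4bit/load_in_8bit passed as
    # bare model kwargs; quantization settings go through quantization_config.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.2-1B",  # example model id
        quantization_config=quant_config,
    )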
2025-10-16 16:07:27 +07:00
Wing Lian
4cdfdfebb5
upgrade transformers==4.57.1 and peft==0.23.1 ( #3214 )
2025-10-14 15:54:05 -04:00
github-actions[bot]
6e2f5ccf9f
chore: update pre-commit hooks ( #3211 ) [skip ci]
...
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com >
2025-10-14 10:21:49 -04:00
NanoCode012
8c7f63cf97
fix: unpack cce imported incorrectly ( #3212 ) [skip ci]
2025-10-13 17:19:15 +07:00
VED
cd856b45b1
feat: add support for dataset_num_processes ( #3129 ) [skip ci]
...
* feat: add support for dataset_num_processes
* chore
* required changes
* requested changes
* required changes
* required changes
* required changes
* elif get_default_process_count()
* add:del data
* Update cicd/Dockerfile.jinja
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* Update cicd/single_gpu.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
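A rough sketch of the idea behind dataset_num_processes: honor an explicit setting, otherwise fall back to a machine-derived default. get_default_process_count is the helper named in the commit, but its body and the config attribute access here are assumptions:

    import os

    def get_default_process_count() -> int:
        # Assumed fallback: roughly half the available CPUs, at least one.
        return max(1, (os.cpu_count() or 2) // 2)

    def resolve_dataset_num_proc(cfg) -> int:
        # Use the user-provided dataset_num_processes when set.
        num_proc = getattr(cfg, "dataset_num_processes", None)
        if num_proc:
            return num_proc
        return get_default_process_count()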
---------
Co-authored-by: salman <salman.mohammadi@outlook.com >
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2025-10-13 17:18:12 +07:00
salman
143dea4753
FSDPConfig (#3170 )
2025-10-10 14:44:25 +01:00
Hitesh Sagtani
bc2ffb8204
fix: Enable KD plugin support for PEFT/LoRA adapters ( #3207 )
...
- Fix _loss_function attribute not found on base model with PEFT
- Fix mismatched attribute name (loss_function vs _loss_function)
- Set _loss_function on unwrapped base model for PEFT
- Enable previously skipped test_llama_lora_kd test
- Add test config fixes for LoRA kernel compatibility
Fixes https://github.com/axolotl-ai-cloud/axolotl/issues/3206
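The gist of the fix, sketched under the assumption that the loss function only needs to be attached to the unwrapped base model (attribute name taken from the commit message, helper name is illustrative):

    def attach_kd_loss(model, loss_fn):
        # With PEFT/LoRA the trainer sees a PeftModel wrapper, so the
        # attribute must be set on the underlying base model instead.
        base_model = model.get_base_model() if hasattr(model, "get_base_model") else model
        base_model._loss_function = loss_fn  # attribute name from the commit message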
2025-10-10 08:57:00 -04:00
NanoCode012
153edcfe79
fix(doc): add act checkpointing migration to fsdp2 docs ( #3193 ) [skip ci]
2025-10-10 10:57:50 +07:00
Wing Lian
08b8fa62cc
only calculate packed ds length once if using a large world size ( #3210 )
2025-10-09 14:18:46 -04:00
Wing Lian
3a5c97e6e5
use can_device_access_peer for P2P checks ( #3209 ) [skip ci]
...
* use can_device_access_peer for P2P checks
* also log warn when automatically setting NCCL_P2P_DISABLE=1
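A simplified sketch of what a can_device_access_peer based check might look like; the exact placement and warning mechanism inside axolotl are not shown here:

    import logging
    import os

    import torch

    LOG = logging.getLogger(__name__)

    def maybe_disable_nccl_p2p() -> None:
        # If any GPU pair cannot reach each other via P2P, disable NCCL P2P
        # and warn, mirroring the behavior described in the commit above.
        num_devices = torch.cuda.device_count()
        for i in range(num_devices):
            for j in range(num_devices):
                if i != j and not torch.cuda.can_device_access_peer(i, j):
                    os.environ["NCCL_P2P_DISABLE"] = "1"
                    LOG.warning(
                        "No P2P access between cuda:%d and cuda:%d; "
                        "setting NCCL_P2P_DISABLE=1",
                        i,
                        j,
                    )
                    return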
2025-10-09 14:17:31 -04:00
VED
37f78c8592
add chat_template_jinja to wandb ( #3192 ) [skip ci]
...
* add chat_template_jinja to wandb
* temp_ct_file.flush()
* Update src/axolotl/utils/callbacks/__init__.py
Co-authored-by: Wing Lian <wing.lian@gmail.com >
* Update src/axolotl/utils/callbacks/__init__.py
Co-authored-by: Wing Lian <wing.lian@gmail.com >
* Apply suggestion from @winglian
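Roughly what the temp-file-plus-flush approach mentioned above amounts to; a sketch, not the actual callback code:

    import tempfile

    import wandb

    def log_chat_template_to_wandb(chat_template_jinja: str) -> None:
        # Write the resolved Jinja template to a temp file and attach it to
        # the active W&B run; flush() ensures the contents are on disk
        # before wandb picks the file up.
        with tempfile.NamedTemporaryFile(
            mode="w", suffix=".jinja", delete=False
        ) as temp_ct_file:
            temp_ct_file.write(chat_template_jinja)
            temp_ct_file.flush()
            wandb.save(temp_ct_file.name, policy="now")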
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2025-10-09 12:05:54 -04:00
NanoCode012
ab63b92c38
feat: add lfm2 family and latest moe model ( #3208 )
...
* feat: add lfm2 family and latest moe model
* fix: use ml-cross-entropy for lfm2 examples
2025-10-09 10:47:41 -04:00
Manh Nguyen
6f8ce024d1
Remove check_torch_compile_deepspeed ( #3195 ) [skip ci]
...
Signed-off-by: nguyen599 <pnvmanh2123@gmail.com >
2025-10-08 11:27:01 -04:00
Wing Lian
d0e9c3c1c5
When using Ray, use prepare for dataloader fixes ( #3198 )
...
* make sure to use ray prepare for dataloader fixes
* ray tests use 2.7.0+
* don't call init_distributed w ray and deepspeed
* handle dict deepspeed config
* better handling of dict deepspeed config
* use json.dumps
* guard to_dict
* wrap import for optional ray
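A hedged sketch of the "handle dict deepspeed config" idea: accept either a JSON path or an in-memory config and normalize to a file path, guarding the optional to_dict as the commit notes. The names are illustrative, not the actual axolotl helpers:

    import json
    import tempfile

    def normalize_deepspeed_config(deepspeed):
        # Already a path to a JSON file: nothing to do.
        if isinstance(deepspeed, str):
            return deepspeed
        # Guard to_dict: some config objects expose it, plain dicts do not.
        cfg = deepspeed.to_dict() if hasattr(deepspeed, "to_dict") else deepspeed
        with tempfile.NamedTemporaryFile(
            mode="w", suffix=".json", delete=False
        ) as tmp:
            tmp.write(json.dumps(cfg, indent=2))
            return tmp.name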
2025-10-08 10:43:41 -04:00
github-actions[bot]
4c3488cc9f
chore: update pre-commit hooks ( #3160 ) [skip ci]
...
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com >
2025-10-08 08:58:02 -04:00
Wing Lian
130637a3fa
upgrade transformers to 4.57.0 ( #3201 )
...
* upgrade transformers to 4.57.0
* remove deprecated autoawq and use latest peft
* remove autoawq from setuptools script
* fix imports
* make sure torchvision is installed
* remove support for BetterTransformer
* skip fsdp_qlora_prequant test
* more robust error reporting
2025-10-08 08:43:46 -04:00
VED
377c510e95
sleep model support ( #3135 )
...
Co-authored-by: salman <salman.mohammadi@outlook.com >
2025-10-08 12:39:21 +01:00
Wing Lian
409cfb8a87
deprecate torch 2.6.0 support ( #3197 ) [skip ci]
2025-10-07 11:23:41 -04:00
Wing Lian
ce74c20109
don't cache pip install ( #3194 )
...
* don't cache pip install
* no cache dir for disk space for sdist too
2025-10-01 11:11:39 -04:00
VED
a6bfbe3400
torch_dtype -> dtype ( #3177 )
...
* torch_dtype -> dtype
* torch_dtype -> dtype
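The rename in a nutshell; the model id is only an example and not taken from the commit:

    import torch
    from transformers import AutoModelForCausalLM

    # Recent transformers versions prefer the `dtype` keyword over the older
    # `torch_dtype` name in from_pretrained; the old name is deprecated.
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.2-1B",  # example model id
        dtype=torch.bfloat16,       # previously: torch_dtype=torch.bfloat16
    )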
2025-10-01 15:02:51 +07:00
Dan Saunders
f4376748f3
debug log: multiprocess race condition fix ( #3188 )
2025-09-26 15:07:39 -04:00
Dan Saunders
740d5a1d31
doc fix ( #3187 )
2025-09-26 09:55:15 -04:00
Grant Holmes (Ren)
850c1a5f8d
Add FSDP v2 swap memory support + QLoRA compatibility fixes ( #3167 )
...
Co-authored-by: salman <salman.mohammadi@outlook.com >
2025-09-26 10:23:59 +01:00
NanoCode012
7fa8ac40cd
Feat(cce): add qwen3_vl, qwen3_vl_moe, granitemoeshared, granitemoehybrid, and upgraded all cce patches ( #3178 )
...
* feat: upgrade cce with patches for transformers 4.56
* feat: add missing models to cce readme
2025-09-26 12:11:29 +07:00
Dan Saunders
f9748c4dc5
Cp fix ( #3182 )
...
* patch transformers to allow CP + FA2
* nits
* only patch in CP > 1 case
2025-09-25 12:03:50 -04:00
miketung
33975ce4bc
feat(qwen3-next): Adds targeting of shared expert and attention modules ( #3183 )
...
* Adds targeting of shared expert and attention modules in each layer
* Update VRAM usage
---------
Co-authored-by: Mike Tung <mike@diffbot.com >
2025-09-25 17:06:16 +07:00
陈华杰
e8b962d47f
feat: support training with JSON string tool arguments ( #3136 )
...
* feat: support training with JSON string tool arguments; fix inconsistent PyArrow data type error
* feat: raise error for tool call arguments decode
* Add test_chat_templates_tool_call_string_arguments.py
Add test for string arguments
* fix: change to correct qwen3 tokenizer
* fix: update docs to clarify arguments json
* chore: lint
* fix: duplicate
* chore: revert
* feat: add error to faq
* fix: remove duplicate fixture
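The core idea, sketched independently of the actual axolotl code: tool-call arguments may arrive as a dict or as a JSON-encoded string, and decode failures should raise a clear error as the commit describes:

    import json

    def normalize_tool_arguments(arguments):
        # Dicts pass through untouched; strings must be valid JSON.
        if isinstance(arguments, dict):
            return arguments
        try:
            return json.loads(arguments)
        except json.JSONDecodeError as err:
            raise ValueError(
                "tool call arguments must be a dict or a JSON-decodable string"
            ) from err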
---------
Co-authored-by: caoqinping <caoqinping@lixiang.com >
Co-authored-by: gamersover-blog <1611885128@qq.com >
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-09-25 12:06:21 +07:00
NanoCode012
856ff12171
feat(doc): add optimizations table of content to our improvements ( #3175 ) [skip ci]
...
* chore: format
* feat: add usage for alst
* chore: wording
* feat: add optimizations doc
* Apply suggestion from @SalmanMohammadi
Co-authored-by: salman <salman.mohammadi@outlook.com >
* Update docs/dataset-formats/index.qmd
Co-authored-by: salman <salman.mohammadi@outlook.com >
* feat: add alst, act offloading, nd parallelism, use relative links, and fix format
* chore: comments
---------
Co-authored-by: salman <salman.mohammadi@outlook.com >
2025-09-24 16:13:49 -04:00
Dan Saunders
6bc959342b
remove unused dep ( #3180 )
2025-09-24 13:18:44 -04:00
NanoCode012
b3b92687c4
chore: rename gemma3 270m config ( #3174 )
2025-09-24 13:48:38 +07:00
NanoCode012
55d1be2ae6
fix: unify default for conversations_field [skip-e2e] ( #3070 )
...
* fix: unify default for conversations_field
* fix: suggestion to remove defaults
2025-09-23 21:22:15 +07:00
NanoCode012
08d831c3d5
Feat: add qwen3-next (w packing+cce) ( #3150 )
...
* feat: upgrade cce for qwen3-next
* feat: add sample qwen3 config
* feat: add packing patch for chunk_gated_delta_rule
* feat: add qwen3 link
* fix: tuple name
* feat: add tested qwen3 config
* fix: improve log
* feat: add patch for fla without packing
* fix: remove fla patch for standard mode
* feat: enable packing
* feat: add qwen3-next tests
* chore: move tests
2025-09-23 11:31:15 +07:00
AlexHT Hung
7be8740c5c
fix(rl): pass max_prompt_len to training args as max_prompt_length ( #3113 )
...
* pass max_prompt_len to training args as max_prompt_length
* Update rl.py
* refactor
* format
* fix: default for max_prompt_length
* fix: defaults for trainer
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-09-19 17:34:28 +07:00
NanoCode012
c51d6b06c3
feat: add apertus model and cce ( #3144 ) [skip ci]
...
* feat: add apertus, glm4v, glm4v_moe cce
* fix: arcee docs
* feat: add apertus
* feat: added vram usage
* fix: add apertus note
* feat: update doc on apertus xielu
* fix: add monkeypatch for xielu activation issue
* fix: simplify env
* feat: pin commit
* feat: add packing
* chore: move patch calling
* Update examples/apertus/README.md
Co-authored-by: salman <salman.mohammadi@outlook.com >
* Update examples/apertus/README.md
Co-authored-by: salman <salman.mohammadi@outlook.com >
* Update examples/apertus/README.md
Co-authored-by: salman <salman.mohammadi@outlook.com >
---------
Co-authored-by: salman <salman.mohammadi@outlook.com >
2025-09-19 17:34:04 +07:00
NanoCode012
09959fac70
Feat: add Magistral Small 2509 and native mistral3 tokenizer support ( #3165 )
...
* feat: update mistral common
* feat: add mistral3processor
* fix: loading
* fix: cast pixel_values to fp32
* fix: image tensor conversion
* feat: add FA2 support for pixtral based models
* fix: update mistral small 3.1 to use native tokenizer
* fix: install tips
* fix: improve info on sample dataset files
* chore: move mistral configs into subfolders
* fix: remove unneeded patch
* fix: indent
* feat: add integration tests
* chore: move
* feat: add magistral 2509 docs and example
* fix: convert tensor to bool
* feat: expand tests
* chore: move tests
2025-09-18 15:42:20 +07:00
Dan Saunders
4065bc14c6
Debug log, logging improvements ( #3159 )
...
* simplify logging
* remove comment
* progress on debug.log
* add debug-level logger for file log
* simplify
* case insensitivity; 3rd party logging improvements
* simplify
* fix
* tests
* lint
* nits
* nit
* Update tests/test_utils_tee.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* cleanup / comments
* fix
* oops
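A generic sketch of the "debug-level logger for file log" idea using the standard library; this is not the axolotl-specific logging setup:

    import logging

    def setup_logging(log_file: str, console_level: str = "info") -> None:
        # Console stays at the user-selected level (accepted case-insensitively),
        # while the file handler records everything at DEBUG for later inspection.
        root = logging.getLogger()
        root.setLevel(logging.DEBUG)

        console = logging.StreamHandler()
        console.setLevel(getattr(logging, console_level.upper()))
        root.addHandler(console)

        file_handler = logging.FileHandler(log_file)
        file_handler.setLevel(logging.DEBUG)
        root.addHandler(file_handler)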
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-09-17 13:27:03 -04:00
salman
e5c427f6de
qat doc updates ( #3162 ) [skip-ci]
2025-09-17 10:38:15 +01:00
Wing Lian
86d6ee7c05
upgrade trl and accelerate ( #3161 )
...
* upgrade trl==0.23.0
* upgrade accelerate patch fix
* add hints when using gradient_checkpointing with DPO
* set gradient-checkpointing properly
2025-09-16 14:53:01 -04:00
Wing Lian
d4cff1b7bb
improve setting of NCCL_P2P_DISABLE on runpod ( #3132 ) [skip ci]
...
* improve setting of NCCL_P2P_DISABLE on runpod
* use recs from review
2025-09-16 14:52:45 -04:00
Wing Lian
1ef6c196f7
setup env vars for ray train for FSDP ( #3130 ) [skip ci]
2025-09-16 14:52:29 -04:00
salman
58d67bf98d
Migrate QAT API; fix axolotl quantize for QAT-ed models; add NVFP4 ( #3107 )
2025-09-12 10:55:50 +01:00
salman
0401a15888
SEO go brrr ( #3153 ) [skip-ci]
2025-09-12 10:55:11 +01:00
NanoCode012
fcfc13d710
feat(doc): update thinking and chat_template notes ( #3114 ) [skip ci]
...
* feat: update thinking and chat_template notes
* fix: grammar
2025-09-12 14:45:18 +07:00
salman
9406c0c488
log before eval step ( #3148 ) [skip-ci]
2025-09-11 11:19:30 +01:00
Dan Saunders
1b53c49e1a
text diffusion training plugin ( #3067 )
...
* diffusion training plugin
* cleanup
* nits
* fixes + improvements
* add back in reinit_weights (clobbered?); masking / pretrain fixes
* nits
* cleanup; tests draft
* sample generation, tests fixes
* fixes
* nits
* add inference support; add auto-mask token support
* nits
* nits
* progress
* simplify logging
* lint
* prefix args with diffusion_
* coderabbito
* tests fix
* nit
* nits
* cleanup + nits
* nits
* fix SFT sample gen
* fixes
* fix
* comments
* comments
* lint
* reward model lora fix
* cleanup; fix pretraining_dataset case
* gradio inference
* update cfgs
* update cfgs
* train, generation parity, cleanup
* fix
* simplify
* test
* test fix
2025-09-10 20:27:00 -04:00
NanoCode012
b71482cec5
Feat: add hunyuan v1 ( #3016 )
...
* feat: add hunyuan cce support
* feat: update cce docs
* feat: add multipack support for granite and hunyuan
* feat: add hunyuan docs and example config
* feat: update readme instructions to include CCE installation
* fix: chat template log appearing despite tokenizer already having template
* feat: add vram usage
* fix: remove duplicate cce install
* fix: use latest commit of PR in case rebased/pushed
* Revert "fix: use latest commit of PR in case rebased/pushed"
This reverts commit 8b60aa00de .
* feat: update doc as upstream merged
2025-09-10 09:03:30 +07:00