VED
37f78c8592
add chat_template_jinja to wandb (#3192) [skip ci]
* add chat_template_jinja to wandb
* temp_ct_file.flush()
* Update src/axolotl/utils/callbacks/__init__.py
* Apply suggestion from @winglian
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2025-10-09 12:05:54 -04:00
NanoCode012
ab63b92c38
feat: add lfm2 family and latest moe model (#3208)
* feat: add lfm2 family and latest moe model
* fix: use ml-cross-entropy for lfm2 examples
2025-10-09 10:47:41 -04:00
Manh Nguyen
6f8ce024d1
Remove check_torch_compile_deepspeed (#3195) [skip ci]
Signed-off-by: nguyen599 <pnvmanh2123@gmail.com>
2025-10-08 11:27:01 -04:00
Wing Lian
d0e9c3c1c5
When using Ray, use prepare for dataloader fixes (#3198)
* make sure to use ray prepare for dataloader fixes
* ray tests use 2.7.0+
* don't call init_distributed with ray and deepspeed
* handle dict deepspeed config
* better handling of dict deepspeed config
* use json.dumps
* guard to_dict
* wrap import for optional ray
2025-10-08 10:43:41 -04:00
github-actions[bot]
4c3488cc9f
chore: update pre-commit hooks (#3160) [skip ci]
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>
2025-10-08 08:58:02 -04:00
Wing Lian
130637a3fa
upgrade transformers to 4.57.0 (#3201)
* upgrade transformers to 4.57.0
* remove deprecated autoawq and use latest peft
* remove autoawq from setuptools script
* fix imports
* make sure torchvision is installed
* remove support for BetterTransformer
* skip fsdp_qlora_prequant test
* more robust error reporting
2025-10-08 08:43:46 -04:00
VED
377c510e95
sleep model support (#3135)
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-10-08 12:39:21 +01:00
Wing Lian
409cfb8a87
deprecate torch 2.6.0 support (#3197) [skip ci]
2025-10-07 11:23:41 -04:00
Wing Lian
ce74c20109
don't cache pip install (#3194)
* don't cache pip install
* no cache dir for disk space for sdist too
2025-10-01 11:11:39 -04:00
VED
a6bfbe3400
torch_dtype -> dtype (#3177)
* torch_dtype -> dtype
2025-10-01 15:02:51 +07:00
Dan Saunders
f4376748f3
debug log: multiprocess race condition fix (#3188)
2025-09-26 15:07:39 -04:00
Dan Saunders
740d5a1d31
doc fix (#3187)
2025-09-26 09:55:15 -04:00
Grant Holmes (Ren)
850c1a5f8d
Add FSDP v2 swap memory support + QLoRA compatibility fixes (#3167)
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-09-26 10:23:59 +01:00
NanoCode012
7fa8ac40cd
Feat(cce): add qwen3_vl, qwen3_vl_moe, granitemoeshared, granitemoehybrid, and upgraded all cce patches (#3178)
* feat: upgrade cce with patches for transformers 4.56
* feat: add missing models to cce readme
2025-09-26 12:11:29 +07:00
Dan Saunders
f9748c4dc5
CP fix (#3182)
* patch transformers to allow CP + FA2
* nits
* only patch in CP > 1 case
2025-09-25 12:03:50 -04:00
miketung
33975ce4bc
feat(qwen3-next): Adds targeting of shared expert and attention modules (#3183)
* Adds targeting of shared expert and attention modules in each layer
* Update VRAM usage
Co-authored-by: Mike Tung <mike@diffbot.com>
2025-09-25 17:06:16 +07:00
陈华杰
e8b962d47f
feat: support training with JSON string tool arguments (#3136)
* feat: support training with JSON string tool arguments; fix inconsistent PyArrow data type error
* feat: raise error for tool call arguments decode
* Add test_chat_templates_tool_call_string_arguments.py
Add test for string arguments
* fix: change to correct qwen3 tokenizer
* fix: update docs to clarify arguments json
* chore: lint
* fix: duplicate
* chore: revert
* feat: add error to faq
* fix: remove duplicate fixture
Co-authored-by: caoqinping <caoqinping@lixiang.com>
Co-authored-by: gamersover-blog <1611885128@qq.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-09-25 12:06:21 +07:00
NanoCode012
856ff12171
feat(doc): add optimizations table of contents to our improvements (#3175) [skip ci]
* chore: format
* feat: add usage for alst
* chore: wording
* feat: add optimizations doc
* Apply suggestion from @SalmanMohammadi
Co-authored-by: salman <salman.mohammadi@outlook.com>
* Update docs/dataset-formats/index.qmd
Co-authored-by: salman <salman.mohammadi@outlook.com>
* feat: add alst, act offloading, nd parallelism, use relative links, and fix format
* chore: comments
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-09-24 16:13:49 -04:00
Dan Saunders
6bc959342b
remove unused dep (#3180)
2025-09-24 13:18:44 -04:00
NanoCode012
b3b92687c4
chore: rename gemma3 270m config (#3174)
2025-09-24 13:48:38 +07:00
NanoCode012
55d1be2ae6
fix: unify default for conversations_field [skip-e2e] (#3070)
* fix: unify default for conversations_field
* fix: suggestion to remove defaults
2025-09-23 21:22:15 +07:00
NanoCode012
08d831c3d5
Feat: add qwen3-next (w/ packing+cce) (#3150)
* feat: upgrade cce for qwen3-next
* feat: add sample qwen3 config
* feat: add packing patch for chunk_gated_delta_rule
* feat: add qwen3 link
* fix: tuple name
* feat: add tested qwen3 config
* fix: improve log
* feat: add patch for fla without packing
* fix: remove fla patch for standard mode
* feat: enable packing
* feat: add qwen3-next tests
* chore: move tests
2025-09-23 11:31:15 +07:00
AlexHT Hung
7be8740c5c
fix(rl): pass max_prompt_len to training args as max_prompt_length (#3113)
* pass max_prompt_len to training args as max_prompt_length
* Update rl.py
* refactor
* format
* fix: default for max_prompt_length
* fix: defaults for trainer
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-09-19 17:34:28 +07:00
NanoCode012
c51d6b06c3
feat: add apertus model and cce (#3144) [skip ci]
* feat: add apertus, glm4v, glm4v_moe cce
* fix: arcee docs
* feat: add apertus
* feat: added vram usage
* fix: add apertus note
* feat: update doc on apertus xielu
* fix: add monkeypatch for xielu activation issue
* fix: simplify env
* feat: pin commit
* feat: add packing
* chore: move patch calling
* Update examples/apertus/README.md
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-09-19 17:34:04 +07:00
NanoCode012
09959fac70
Feat: add Magistral Small 2509 and native mistral3 tokenizer support (#3165)
* feat: update mistral common
* feat: add mistral3processor
* fix: loading
* fix: cast pixel_values to fp32
* fix: image tensor conversion
* feat: add FA2 support for pixtral based models
* fix: update mistral small 3.1 to use native tokenizer
* fix: install tips
* fix: improve info on sample dataset files
* chore: move mistral configs into subfolders
* fix: remove unneeded patch
* fix: indent
* feat: add integration tests
* chore: move
* feat: add magistral 2509 docs and example
* fix: convert tensor to bool
* feat: expand tests
* chore: move tests
2025-09-18 15:42:20 +07:00
Dan Saunders
4065bc14c6
Debug log, logging improvements (#3159)
* simplify logging
* remove comment
* progress on debug.log
* add debug-level logger for file log
* simplify
* case insensitivity; 3rd party logging improvements
* simplify
* fix
* tests
* lint
* nits
* nit
* Update tests/test_utils_tee.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* cleanup / comments
* fix
* oops
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-09-17 13:27:03 -04:00
salman
e5c427f6de
qat doc updates (#3162) [skip-ci]
2025-09-17 10:38:15 +01:00
Wing Lian
86d6ee7c05
upgrade trl and accelerate (#3161)
* upgrade trl==0.23.0
* upgrade accelerate patch fix
* add hints when using gradient_checkpointing with DPO
* set gradient-checkpointing properly
2025-09-16 14:53:01 -04:00
Wing Lian
d4cff1b7bb
improve setting of NCCL_P2P_DISABLE on runpod (#3132) [skip ci]
* improve setting of NCCL_P2P_DISABLE on runpod
* use recs from review
2025-09-16 14:52:45 -04:00
Wing Lian
1ef6c196f7
setup env vars for ray train for FSDP (#3130) [skip ci]
2025-09-16 14:52:29 -04:00
salman
58d67bf98d
Migrate QAT API; fix axolotl quantize for QAT-ed models; add NVFP4 (#3107)
2025-09-12 10:55:50 +01:00
salman
0401a15888
SEO go brrr (#3153) [skip-ci]
2025-09-12 10:55:11 +01:00
NanoCode012
fcfc13d710
feat(doc): update thinking and chat_template notes (#3114) [skip ci]
* feat: update thinking and chat_template notes
* fix: grammar
2025-09-12 14:45:18 +07:00
salman
9406c0c488
log before eval step (#3148) [skip-ci]
2025-09-11 11:19:30 +01:00
Dan Saunders
1b53c49e1a
text diffusion training plugin (#3067)
* diffusion training plugin
* cleanup
* nits
* fixes + improvements
* add back in reinit_weights (clobbered?); masking / pretrain fixes
* nits
* cleanup; tests draft
* sample generation, tests fixes
* fixes
* nits
* add inference support; add auto-mask token support
* nits
* nits
* progress
* simplify logging
* lint
* prefix args with diffusion_
* coderabbito
* tests fix
* nit
* nits
* cleanup + nits
* nits
* fix SFT sample gen
* fixes
* fix
* comments
* comments
* lint
* reward model lora fix
* cleanup; fix pretraining_dataset case
* gradio inference
* update cfgs
* update cfgs
* train, generation parity, cleanup
* fix
* simplify
* test
* test fix
2025-09-10 20:27:00 -04:00
NanoCode012
b71482cec5
Feat: add hunyuan v1 (#3016)
* feat: add hunyuan cce support
* feat: update cce docs
* feat: add multipack support for granite and hunyuan
* feat: add hunyuan docs and example config
* feat: update readme instructions to include CCE installation
* fix: chat template log appearing despite tokenizer already having template
* feat: add vram usage
* fix: remove duplicate cce install
* fix: use latest commit of PR in case rebased/pushed
* Revert "fix: use latest commit of PR in case rebased/pushed"
This reverts commit 8b60aa00de.
* feat: update doc as upstream merged
2025-09-10 09:03:30 +07:00
NanoCode012
79103b01ca
Feat: add seedoss (#3104) [skip ci]
* feat: add seedoss cce
* feat: add seedoss config and docs
* fix: shouldn't have target modules with target linear
* feat: add vram numbers
* fix: hf link
* fix: name
* fix: support multipack seedoss
* fix: merge error
* feat: update seedoss instructions for transformers release
2025-09-10 09:01:02 +07:00
salman
9640338d37
Default include_tkps to true (#3134)
* default true
* force e2e
* causal trainer only
* fix eval logging [skip-ci]
* revert setup.py
* force tests
* guarding
* guarding
* fix test case
* use evaluate [skip-e2e]
* use evaluate [skip-e2e]
* kick off ci
* fixing
* reverting
2025-09-09 10:50:21 -04:00
Wing Lian
b5d4c7ff54
allow 1% deviation for codecov (#3138) [skip ci]
2025-09-07 11:01:03 -04:00
Seungduk Kim
8fd9221f13
Add ipo as an rl type that shares DPODataset config (#3128)
* Add `ipo` as an `rl` type that shares DPODataset config
* chore: lint
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-09-07 10:49:10 -04:00
github-actions[bot]
bf00f29f3a
chore: update pre-commit hooks (#3137) [skip ci]
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>
2025-09-07 10:33:20 -04:00
NanoCode012
1d32278755
feat: upgrade transformers to v4.56.1 (#3127)
* feat: upgrade transformers to v4.56
* fix handling of CP/SP now that position_ids are default even for unpacked sequences
* feat: monkeypatch list_repo_templates
* fix: apply patch for tests only
* see if updated main works at least
* fix: update to patch release and remove monkeypatch
* remove fsdp2 eval patch
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-09-05 11:00:54 -04:00
NanoCode012
c6ae5c43cb
fix: chat template jinja file not being loaded during inference (#3112)
* fix: chat template jinja file not being loaded during inference
* fix: bot comment
2025-09-03 16:25:09 -04:00
yardenhoch
efa1da52d5
Center rewards coefficient (#3124)
* feat: add center_rewards_coefficient for reward modeling
- Add center_rewards_coefficient parameter to Pydantic schema with paper reference
- Pass parameter through base builder and causal builder to training args
- Add documentation section with usage examples and theoretical background
- Enable parameter in reward modeling example configs with recommended value
- Enables reward centering for improved training stability in RLHF workflows
Implements auxiliary loss from Eisenstein et al. 2023 (https://huggingface.co/papers/2312.09244)
to incentivize mean-zero reward outputs without post-training normalization.
* Update description
* test: add unit tests for center_rewards_coefficient integration
* Update src/axolotl/core/builders/base.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update docs/reward_modelling.qmd
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* reference to TRL documentation.
* add new reward model configuration for qwen3 with comprehensive parameters
* Verified center_rewards_coefficient is correctly passed through the trainer builder to training arguments.
* Refactor reward modeling documentation to consolidate information on center_rewards_coefficient
* Remove unit tests for center_rewards_coefficient integration as part of codebase cleanup.
* linting
* nit
* Apply suggestions from code review
* lint
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>
2025-09-03 16:22:37 -04:00
mhenrichsen
48db520d92
Create 270m-qlora.yml (#3075) [skip ci]
Adds 270m gemma3 qlora
2025-09-03 16:20:32 -04:00
NanoCode012
53a0c1f39c
feat: add peft_trainable_token_indices (#3062)
* feat: add peft_trainable_token_indices
* feat: add warning compat with fix_untrained_tokens
2025-09-03 01:48:01 -04:00
github-actions[bot]
4cc6038d52
chore: update pre-commit hooks (#3122) [skip ci]
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>
2025-09-03 01:41:34 -04:00
NanoCode012
e48aa8a5b1
feat(doc): improve visibility for colab notebooks (#3110) [skip ci]
* feat: improve visibility for colab notebooks
* fix: link to GH colab
* feat: change to badge and move higher
2025-09-03 01:40:53 -04:00
xuyifann
24aba5caca
Clamping the dataloader length to a minimum of 1 (#3100) [skip ci]
* Clamping the len of dataloader to minimum of 1
* linter reformat
2025-09-03 01:40:27 -04:00
Wing Lian
06bebcb65f
run cu128-2.8.0 e2e tests on B200 (#3126)
* run cu128-2.8.0 e2e tests on B200
* not an int 🤦
* fix yaml
2025-09-02 13:13:23 -04:00