Dan Saunders
f6ed8ddc01
fix
2025-09-17 13:44:26 -04:00
Dan Saunders
556d6448fe
fix
2025-09-17 13:44:26 -04:00
Dan Saunders
5c2229721d
diag
2025-09-17 13:44:26 -04:00
Dan Saunders
d7de6b0e96
grouped_mm
2025-09-17 13:44:26 -04:00
Dan Saunders
3c6648678f
numerics
2025-09-17 13:44:26 -04:00
Dan Saunders
5b19a1ea9c
improve
2025-09-17 13:44:26 -04:00
Dan Saunders
cfefad1eea
fix
2025-09-17 13:44:26 -04:00
Dan Saunders
125e7b5fe6
fast path
2025-09-17 13:44:26 -04:00
Dan Saunders
479b6144df
tflops
2025-09-17 13:44:26 -04:00
Dan Saunders
68da65cba2
update
2025-09-17 13:44:26 -04:00
Dan Saunders
0d689bb421
cache, example
2025-09-17 13:44:26 -04:00
Dan Saunders
43ada1278a
moe kernels init scaffold
2025-09-17 13:44:26 -04:00
Dan Saunders
4065bc14c6
Debug log, logging improvements (#3159)
* simplify logging
* remove comment
* progress on debug.log
* add debug-level logger for file log
* simplify
* case insensitivity; 3rd party logging improvements
* simplify
* fix
* tests
* lint
* nits
* nit
* Update tests/test_utils_tee.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* cleanup / comments
* fix
* oops
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-09-17 13:27:03 -04:00
salman
e5c427f6de
qat doc updates (#3162) [skip-ci]
2025-09-17 10:38:15 +01:00
Wing Lian
86d6ee7c05
upgrade trl and accelerate (#3161)
* upgrade trl==0.23.0
* upgrade accelerate patch fix
* add hints when using gradient_checkpointing with DPO
* set gradient-checkpointing properly
2025-09-16 14:53:01 -04:00
Wing Lian
d4cff1b7bb
improve setting of NCCL_P2P_DISABLE on runpod (#3132) [skip ci]
* improve setting of NCCL_P2P_DISABLE on runpod
* use recs from review
2025-09-16 14:52:45 -04:00
Wing Lian
1ef6c196f7
setup env vars for ray train for FSDP (#3130) [skip ci]
2025-09-16 14:52:29 -04:00
salman
58d67bf98d
Migrate QAT API; fix axolotl quantize for QAT-ed models; add NVFP4 (#3107)
2025-09-12 10:55:50 +01:00
salman
0401a15888
SEO go brrr (#3153) [skip-ci]
2025-09-12 10:55:11 +01:00
NanoCode012
fcfc13d710
feat(doc): update thinking and chat_template notes (#3114) [skip ci]
* feat: update thinking and chat_template notes
* fix: grammar
2025-09-12 14:45:18 +07:00
salman
9406c0c488
log before eval step (#3148) [skip-ci]
2025-09-11 11:19:30 +01:00
Dan Saunders
1b53c49e1a
text diffusion training plugin (#3067)
* diffusion training plugin
* cleanup
* nits
* fixes + improvements
* add back in reinit_weights (clobbered?); masking / pretrain fixes
* nits
* cleanup; tests draft
* sample generation, tests fixes
* fixes
* nits
* add inference support; add auto-mask token support
* nits
* nits
* progress
* simplify logging
* lint
* prefix args with diffusion_
* coderabbito
* tests fix
* nit
* nits
* cleanup + nits
* nits
* fix SFT sample gen
* fixes
* fix
* comments
* comments
* lint
* reward model lora fix
* cleanup; fix pretraining_dataset case
* gradio inference
* update cfgs
* update cfgs
* train, generation parity, cleanup
* fix
* simplify
* test
* test fix
2025-09-10 20:27:00 -04:00
NanoCode012
b71482cec5
Feat: add hunyuan v1 (#3016)
* feat: add hunyuan cce support
* feat: update cce docs
* feat: add multipack support for granite and hunyuan
* feat: add hunyuan docs and example config
* feat: update readme instructions to include CCE installation
* fix: chat template log appearing despite tokenizer already having template
* feat: add vram usage
* fix: remove duplicate cce install
* fix: use latest commit of PR in case rebased/pushed
* Revert "fix: use latest commit of PR in case rebased/pushed"
This reverts commit 8b60aa00de.
* feat: update doc as upstream merged
2025-09-10 09:03:30 +07:00
NanoCode012
79103b01ca
Feat: add seedoss (#3104) [skip ci]
* feat: add seedoss cce
* feat: add seedoss config and docs
* fix: shouldn't have target modules with target linear
* feat: add vram numbers
* fix: hf link
* fix: name
* fix: support multipack seedoss
* fix: merge error
* feat: update seedoss instructions for transformers release
2025-09-10 09:01:02 +07:00
salman
9640338d37
Default include_tkps to true (#3134)
* default true
* force e2e
* causal trainer only
* fix eval logging [skip-ci]
* revert setup.py
* force tests
* guarding
* guarding
* fix test case
* use evaluate [skip-e2e]
* use evaluate [skip-e2e]
* kick off ci
* fixing
* reverting
2025-09-09 10:50:21 -04:00
Wing Lian
b5d4c7ff54
allow 1% deviation for codecov (#3138) [skip ci]
2025-09-07 11:01:03 -04:00
Seungduk Kim
8fd9221f13
Add ipo as an rl type that shares DPODataset config (#3128)
* Add `ipo` as an `rl` type that shares DPODataset config
* chore: lint
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-09-07 10:49:10 -04:00
github-actions[bot]
bf00f29f3a
chore: update pre-commit hooks (#3137) [skip ci]
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>
2025-09-07 10:33:20 -04:00
NanoCode012
1d32278755
feat: upgrade transformers to v4.56.1 (#3127)
* feat: upgrade transformers to v4.56
* fix handling of CP/SP now that position_ids are default even for unpacked sequences
* feat: monkeypatch list_repo_templates
* fix: apply patch for tests only
* see if updated main works at least
* fix: update to patch release and remove monkeypatch
* remove fsdp2 eval patch
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-09-05 11:00:54 -04:00
NanoCode012
c6ae5c43cb
fix: chat template jinja file not being loaded during inference (#3112)
* fix: chat template jinja file not being loaded during inference
* fix: bot comment
2025-09-03 16:25:09 -04:00
yardenhoch
efa1da52d5
Center rewards coefficient (#3124)
* feat: add center_rewards_coefficient for reward modeling
- Add center_rewards_coefficient parameter to Pydantic schema with paper reference
- Pass parameter through base builder and causal builder to training args
- Add documentation section with usage examples and theoretical background
- Enable parameter in reward modeling example configs with recommended value
- Enables reward centering for improved training stability in RLHF workflows
Implements auxiliary loss from Eisenstein et al. 2023 (https://huggingface.co/papers/2312.09244)
to incentivize mean-zero reward outputs without post-training normalization.
* Update description
* test: add unit tests for center_rewards_coefficient integration
* Update src/axolotl/core/builders/base.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update docs/reward_modelling.qmd
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update docs/reward_modelling.qmd
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* reference to TRL documentation.
* add new reward model configuration for qwen3 with comprehensive parameters
* Verified center_rewards_coefficient is correctly passed through the trainer builder to training arguments.
* Refactor reward modeling documentation to consolidate information on center_rewards_coefficient
* Remove unit tests for center_rewards_coefficient integration as part of codebase cleanup.
* linting
* nit
* Apply suggestions from code review
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* lint
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>
2025-09-03 16:22:37 -04:00
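The commit body above describes the reward-centering auxiliary loss from Eisenstein et al. 2023: a penalty that nudges the reward model toward mean-zero outputs. A minimal sketch of the idea, not Axolotl's or TRL's actual implementation; the function name and shapes are illustrative, and `center_coef` stands in for the `center_rewards_coefficient` config option:

```python
import math

def centered_reward_loss(chosen, rejected, center_coef=0.01):
    """Pairwise (Bradley-Terry) reward loss plus a mean-zero auxiliary
    penalty, as in Eisenstein et al. 2023. Sketch only; names and
    shapes are illustrative."""
    def logsigmoid(x):
        # Numerically stable log(sigmoid(x)).
        return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

    # Standard pairwise term: the chosen response should outscore the rejected one.
    base = -sum(logsigmoid(c - r) for c, r in zip(chosen, rejected)) / len(chosen)
    # Auxiliary term: penalize the squared mean reward, incentivizing
    # mean-zero outputs so no post-training normalization is needed.
    all_rewards = list(chosen) + list(rejected)
    center = center_coef * (sum(all_rewards) / len(all_rewards)) ** 2
    return base + center
```

With `center_coef=0`, this reduces to the plain pairwise loss; a small positive coefficient trades a little ranking loss for centered reward outputs.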
mhenrichsen
48db520d92
Create 270m-qlora.yml (#3075) [skip ci]
Adds 270m gemma3 qlora
2025-09-03 16:20:32 -04:00
NanoCode012
53a0c1f39c
feat: add peft_trainable_token_indices (#3062)
* feat: add peft_trainable_token_indices
* feat: add warning compat with fix_untrained_tokens
2025-09-03 01:48:01 -04:00
github-actions[bot]
4cc6038d52
chore: update pre-commit hooks (#3122) [skip ci]
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>
2025-09-03 01:41:34 -04:00
NanoCode012
e48aa8a5b1
feat(doc): improve visibility for colab notebooks (#3110) [skip ci]
* feat: improve visibility for colab notebooks
* fix: link to GH colab
* feat: change to badge and move higher
2025-09-03 01:40:53 -04:00
xuyifann
24aba5caca
Clamping the len of dataloader to minimum of 1 (#3100) [skip ci]
* Clamping the len of dataloader to minimum of 1
* linter reformat
2025-09-03 01:40:27 -04:00
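The clamp in the commit above guards against a dataloader whose computed length rounds down to zero (e.g. a tiny dataset after batching or packing). A one-line sketch of the pattern; the helper name is hypothetical, not Axolotl's actual code:

```python
def clamped_dataloader_len(computed_len: int) -> int:
    """Clamp a computed dataloader length to a minimum of 1, so
    degenerate cases still yield at least one training step."""
    return max(1, computed_len)
```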
Wing Lian
06bebcb65f
run cu128-2.8.0 e2e tests on B200 (#3126)
* run cu128-2.8.0 e2e tests on B200
* not an int 🤦
* fix yaml
2025-09-02 13:13:23 -04:00
Dan Saunders
231a67e70b
Streaming SFT support (#3101)
* working
* fixes
* deprecate --iterable; cleanup
* pretrain_multipack_buffer_size -> streaming_multipack_buffer_size
* improvements
* tests
* remove unused
* docs, examples
* nit
* nit
* add val_set_size validation
* val
* nit
* min
* coderabbito
* cleanup
* nit
* add depr warning, cleanup
* nit
* fix test, fix quarto
* fix
* review comments
* review comments
* fix
2025-09-02 12:08:44 -04:00
Wing Lian
0094a2d744
support for tiledmlp for GPT-OSS (#3116)
* fix use of flex attn kwargs and add support for tiledmlp for GPT-OSS
* add logging back
* update deps
2025-08-29 13:52:49 -04:00
Wing Lian
7ed40f1d70
automatically set env vars for single gpu deepspeed zero3 (#3118) [skip ci]
* automatically set env vars for single gpu deepspeed zero3
* use setdefault
2025-08-29 13:36:47 -04:00
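The "use setdefault" bullet above refers to setting environment variables only when the user has not already set them, so explicit user configuration always wins over auto-detected defaults. A small sketch of the pattern; the helper and the specific variable names/values are illustrative, not the exact ones the commit sets:

```python
import os

def apply_default_env(defaults):
    """Set each environment variable only if it is not already set.
    os.environ.setdefault() leaves existing values untouched, so a
    user's explicit configuration is never overwritten."""
    for key, value in defaults.items():
        os.environ.setdefault(key, value)

# Hypothetical single-process distributed defaults (illustrative only):
apply_default_env({"MASTER_ADDR": "127.0.0.1", "MASTER_PORT": "29500"})
```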
VED
5b6ec2820f
patch for ds_grads_remaining in deepspeed (#3102) [skip ci]
* patch deepspeed
* deepspeed patch for ds_grads_remaining
* patch in Patchmanager
* chore: lint
* deepspeed utils
* chore2
* patch ds_grads_remaining chore
* chore lint
* chore lint
* remove torch.nn patch
* lint
* Update src/axolotl/monkeypatch/utils.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* patched with checkpoint_wrapper
* lint
* only apply deepspeed patch when using activation offloading
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-08-29 12:12:09 -04:00
Wing Lian
6afba3871d
Add support for PyTorch 2.8.0 (#3106)
* Add support for PyTorch 2.8.0
* loosen triton requirements
* handle torch 2.8.0 in setup.py
* fix versions
* no vllm for torch 2.8.0
* remove comment
Co-authored-by: NanoCode012 <nano@axolotl.ai>
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-08-28 09:10:40 -04:00
Dan Saunders
dc338c3b0e
Update .coderabbit.yaml (#3109) [skip ci]
Oops, should be false.
2025-08-27 09:50:52 -04:00
salman
d0d2fc5606
Tokens per second logging [skip-e2e] (#3072)
2025-08-27 09:10:14 +01:00
Wing Lian
e1131e9619
make always skip_move_to_device default as true (#3084)
2025-08-26 09:30:22 -04:00
Wing Lian
c4c4b90638
add tokenizer_save_jinja_files to keep legacy behavior of including chat template in tokenizer_config.json (#3093)
* add tokenizer_save_jinja_files to keep legacy behavior of including chat template in tokenizer_config.json
* fix test import
2025-08-26 09:30:04 -04:00
Wing Lian
0e9945e3b9
deploy training jobs to baseten w truss in axolotl cli (#3086) [skip ci]
* deploy training jobs to baseten w truss in axolotl cli
* cleanup
2025-08-26 09:29:50 -04:00
NanoCode012
0de254a0d0
feat: add gemma3_text attention handling for lora kernels (#3103)
2025-08-26 16:47:26 +07:00
Dan Saunders
79ddaebe9a
Add ruff, remove black, isort, flake8, pylint (#3092)
* black, isort, flake8 -> ruff
* remove unused
* add back needed import
* fix
2025-08-23 23:37:33 -04:00
Dan Saunders
eea7a006e1
make multipack sampler patch explicit (#3096)
* make multipack sampler patch explicit
* combining
2025-08-22 14:29:10 -04:00