Dan Saunders
0d60046d08
Update .github/workflows/pypi.yml
...
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-09-26 10:26:21 -04:00
Dan Saunders
c110e3eb48
remove setup.py, requirements.txt and refs
2025-09-26 10:26:21 -04:00
Dan Saunders
95c259b3fb
depr warning
2025-09-26 10:26:21 -04:00
Dan Saunders
d1fd505813
update
2025-09-26 10:26:21 -04:00
Dan Saunders
1334281d50
docker fix
2025-09-26 10:26:21 -04:00
Dan Saunders
98f230d864
cleanup
2025-09-26 10:26:21 -04:00
Dan Saunders
02f308351c
fix
2025-09-26 10:25:58 -04:00
Dan Saunders
3b91e8174d
fix
2025-09-26 10:25:58 -04:00
Dan Saunders
40d906fb33
lint
2025-09-26 10:25:58 -04:00
Dan Saunders
89d5323c13
fix
2025-09-26 10:25:58 -04:00
Dan Saunders
df870f6a8f
fix
2025-09-26 10:24:59 -04:00
Dan Saunders
f500aaa490
fix
2025-09-26 10:24:59 -04:00
Dan Saunders
9ec33f52e3
wip
2025-09-26 10:24:59 -04:00
Dan Saunders
b453562c01
fixes
2025-09-26 10:24:59 -04:00
Dan Saunders
367f7eb3a6
fix
2025-09-26 10:24:59 -04:00
Dan Saunders
e888e38ce7
fix
2025-09-26 10:24:59 -04:00
Dan Saunders
400120af2d
wip
2025-09-26 10:24:59 -04:00
Dan Saunders
459e5f9b16
lint
2025-09-26 10:24:59 -04:00
Dan Saunders
43f6f84269
wip
2025-09-26 10:24:59 -04:00
Dan Saunders
36c4ab11f9
wip
2025-09-26 10:24:59 -04:00
Dan Saunders
2f4e4ef604
wip
2025-09-26 10:24:59 -04:00
Dan Saunders
aee03fc636
wip
2025-09-26 10:24:59 -04:00
Dan Saunders
255b818fbc
rebase
2025-09-26 10:24:59 -04:00
Dan Saunders
332ee74f32
rebase
2025-09-26 10:24:07 -04:00
Dan Saunders
3b0d2ac5c0
rebase
2025-09-26 10:21:49 -04:00
Dan Saunders
9462a1bf79
wip
2025-09-26 10:21:49 -04:00
Dan Saunders
8e9386c799
go uv first
2025-09-26 09:57:09 -04:00
Dan Saunders
740d5a1d31
doc fix (#3187)
2025-09-26 09:55:15 -04:00
Grant Holmes (Ren)
850c1a5f8d
Add FSDP v2 swap memory support + QLoRA compatibility fixes (#3167)
...
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-09-26 10:23:59 +01:00
NanoCode012
7fa8ac40cd
Feat(cce): add qwen3_vl, qwen3_vl_moe, granitemoeshared, granitemoehybrid, and upgraded all cce patches (#3178)
...
* feat: upgrade cce with patches for transformers 4.56
* feat: add missing models to cce readme
2025-09-26 12:11:29 +07:00
Dan Saunders
f9748c4dc5
Cp fix (#3182)
...
* patch transformers to allow CP + FA2
* nits
* only patch in CP > 1 case
2025-09-25 12:03:50 -04:00
miketung
33975ce4bc
feat(qwen3-next): Adds targeting of shared expert and attention modules (#3183)
...
* Adds targeting of shared expert and attention modules in each layer
* Update VRAM usage
---------
Co-authored-by: Mike Tung <mike@diffbot.com>
2025-09-25 17:06:16 +07:00
陈华杰
e8b962d47f
feat: support training with JSON string tool arguments (#3136)
...
* feat: support training with JSON string tool arguments; fix PyArrow data type inconsistent error
* feat: raise error for tool call arguments decode
* Add test_chat_templates_tool_call_string_arguments.py
Add test for string arguments
* fix: change to correct qwen3 tokenizer
* fix: update docs to clarify arguments json
* chore: lint
* fix: duplicate
* chore: revert
* feat: add error to faq
* fix: remove duplicate fixture
---------
Co-authored-by: caoqinping <caoqinping@lixiang.com>
Co-authored-by: gamersover-blog <1611885128@qq.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-09-25 12:06:21 +07:00
NanoCode012
856ff12171
feat(doc): add optimizations table of content to our improvements (#3175) [skip ci]
...
* chore: format
* feat: add usage for alst
* chore: wording
* feat: add optimizations doc
* Apply suggestion from @SalmanMohammadi
Co-authored-by: salman <salman.mohammadi@outlook.com>
* Update docs/dataset-formats/index.qmd
Co-authored-by: salman <salman.mohammadi@outlook.com>
* feat: add alst, act offloading, nd parallelism, use relative links, and fix format
* chore: comments
---------
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-09-24 16:13:49 -04:00
Dan Saunders
6bc959342b
remove unused dep (#3180)
2025-09-24 13:18:44 -04:00
NanoCode012
b3b92687c4
chore: rename gemma3 270m config (#3174)
2025-09-24 13:48:38 +07:00
NanoCode012
55d1be2ae6
fix: unify default for conversations_field [skip-e2e] (#3070)
...
* fix: unify default for conversations_field
* fix: suggestion to remove defaults
2025-09-23 21:22:15 +07:00
NanoCode012
08d831c3d5
Feat: add qwen3-next (w packing+cce) (#3150)
...
* feat: upgrade cce for qwen3-next
* feat: add sample qwen3 config
* feat: add packing patch for chunk_gated_delta_rule
* feat: add qwen3 link
* fix: tuple name
* feat: add tested qwen3 config
* fix: improve log
* feat: add patch for fla without packing
* fix: remove fla patch for standard mode
* feat: enable packing
* feat: add qwen3-next tests
* chore: move tests
2025-09-23 11:31:15 +07:00
AlexHT Hung
7be8740c5c
fix(rl): pass max_prompt_len to training args as max_prompt_length (#3113)
...
* pass max_prompt_len to training args as max_prompt_length
* Update rl.py
* refactor
* format
* fix: default for max_prompt_length
* fix: defaults for trainer
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-09-19 17:34:28 +07:00
NanoCode012
c51d6b06c3
feat: add apertus model and cce (#3144) [skip ci]
...
* feat: add apertus, glm4v, glm4v_moe cce
* fix: arcee docs
* feat: add apertus
* feat: added vram usage
* fix: add apertus note
* feat: update doc on apertus xielu
* fix: add monkeypatch for xielu activation issue
* fix: simplify env
* feat: pin commit
* feat: add packing
* chore: move patch calling
* Update examples/apertus/README.md
Co-authored-by: salman <salman.mohammadi@outlook.com>
* Update examples/apertus/README.md
Co-authored-by: salman <salman.mohammadi@outlook.com>
* Update examples/apertus/README.md
Co-authored-by: salman <salman.mohammadi@outlook.com>
---------
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-09-19 17:34:04 +07:00
NanoCode012
09959fac70
Feat: add Magistral Small 2509 and native mistral3 tokenizer support (#3165)
...
* feat: update mistral common
* feat: add mistral3processor
* fix: loading
* fix: cast pixel_values to fp32
* fix: image tensor conversion
* feat: add FA2 support for pixtral based models
* fix: update mistral small 3.1 to use native tokenizer
* fix: install tips
* fix: improve info on sample dataset files
* chore: move mistral configs into subfolders
* fix: remove unneeded patch
* fix: indent
* feat: add integration tests
* chore: move
* feat: add magistral 2509 docs and example
* fix: convert tensor to bool
* feat: expand tests
* chore: move tests
2025-09-18 15:42:20 +07:00
Dan Saunders
4065bc14c6
Debug log, logging improvements (#3159)
...
* simplify logging
* remove comment
* progress on debug.log
* add debug-level logger for file log
* simplify
* case insensitivity; 3rd party logging improvements
* simplify
* fix
* tests
* lint
* nits
* nit
* Update tests/test_utils_tee.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* cleanup / comments
* fix
* oops
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-09-17 13:27:03 -04:00
salman
e5c427f6de
qat doc updates (#3162) [skip-ci]
2025-09-17 10:38:15 +01:00
Wing Lian
86d6ee7c05
upgrade trl and accelerate (#3161)
...
* upgrade trl==0.23.0
* upgrade accelerate patch fix
* add hints when using gradient_checkpointing with DPO
* set gradient-checkpointing properly
2025-09-16 14:53:01 -04:00
Wing Lian
d4cff1b7bb
improve setting of NCCL_P2P_DISABLE on runpod (#3132) [skip ci]
...
* improve setting of NCCL_P2P_DISABLE on runpod
* use recs from review
2025-09-16 14:52:45 -04:00
Wing Lian
1ef6c196f7
setup env vars for ray train for FSDP (#3130) [skip ci]
2025-09-16 14:52:29 -04:00
salman
58d67bf98d
Migrate QAT API; fix axolotl quantize for QAT-ed models; add NVFP4 (#3107)
2025-09-12 10:55:50 +01:00
salman
0401a15888
SEO go brrr (#3153) [skip-ci]
2025-09-12 10:55:11 +01:00
NanoCode012
fcfc13d710
feat(doc): update thinking and chat_template notes (#3114) [skip ci]
...
* feat: update thinking and chat_template notes
* fix: grammar
2025-09-12 14:45:18 +07:00
salman
9406c0c488
log before eval step (#3148) [skip-ci]
2025-09-11 11:19:30 +01:00