Dan Saunders
e003a05177
narrow sweep; compare both backends
2025-09-25 14:54:03 -04:00
Dan Saunders
91393c4dc8
allocator
2025-09-25 14:27:34 -04:00
Dan Saunders
d578c53603
fix
2025-09-25 14:27:34 -04:00
Dan Saunders
4db7a21ff7
fix
2025-09-25 14:27:34 -04:00
Dan Saunders
3b2e05c563
update to new api
2025-09-25 14:27:34 -04:00
Dan Saunders
1037ca3a97
update to new api
2025-09-25 14:27:34 -04:00
Dan Saunders
6369dcd7b8
fix
2025-09-25 14:27:34 -04:00
Dan Saunders
a81612305c
fix?
2025-09-25 14:27:34 -04:00
Dan Saunders
d0da67eb17
add mg kernel backend
2025-09-25 14:27:34 -04:00
Dan Saunders
8a1f5ae940
fix
2025-09-25 14:27:34 -04:00
Dan Saunders
146ca48cba
vram
2025-09-25 14:27:34 -04:00
Dan Saunders
fd312f6058
dtype
2025-09-25 14:27:34 -04:00
Dan Saunders
ab8fa56b16
dtype
2025-09-25 14:27:34 -04:00
Dan Saunders
1640cd4006
delete config
2025-09-25 14:27:34 -04:00
Dan Saunders
3277d44d71
cfg value
2025-09-25 14:27:34 -04:00
Dan Saunders
d3e1b0ef1a
small deepseek script
2025-09-25 14:27:34 -04:00
Dan Saunders
5b97633faa
Fix
2025-09-25 14:27:34 -04:00
Dan Saunders
94cbc6d42d
log device, dtype
2025-09-25 14:27:34 -04:00
Dan Saunders
493616fc3d
reprod tt table
2025-09-25 14:27:34 -04:00
Dan Saunders
d2b25c7327
grid sweep
2025-09-25 14:27:34 -04:00
Dan Saunders
b670c45276
fix
2025-09-25 14:27:34 -04:00
Dan Saunders
61faf4cbe4
fix
2025-09-25 14:27:34 -04:00
Dan Saunders
8d8fa834a2
sweep
2025-09-25 14:27:34 -04:00
Dan Saunders
9d69c6fb3e
Fix
2025-09-25 14:27:34 -04:00
Dan Saunders
92f2f6e73c
dtype fix
2025-09-25 14:27:34 -04:00
Dan Saunders
e5d2aebe16
uniform routing:
2025-09-25 14:27:34 -04:00
Dan Saunders
4ab9e3f58b
add logs
2025-09-25 14:27:34 -04:00
Dan Saunders
5788832812
simplify
2025-09-25 14:27:34 -04:00
Dan Saunders
db782430f8
fix
2025-09-25 14:27:34 -04:00
Dan Saunders
5c74edeefe
token shuffle kernel
2025-09-25 14:27:34 -04:00
Dan Saunders
18269ee6a9
fix
2025-09-25 14:27:34 -04:00
Dan Saunders
6a45d804f9
glue
2025-09-25 14:27:34 -04:00
Dan Saunders
95e607574a
vendor torchtitan moe kernels
2025-09-25 14:27:34 -04:00
Dan Saunders
f9748c4dc5
Cp fix (#3182)
...
* patch transformers to allow CP + FA2
* nits
* only patch in CP > 1 case
2025-09-25 12:03:50 -04:00
miketung
33975ce4bc
feat(qwen3-next): Adds targeting of shared expert and attention modules (#3183)
...
* Adds targeting of shared expert and attention modules in each layer
* Update VRAM usage
---------
Co-authored-by: Mike Tung <mike@diffbot.com>
2025-09-25 17:06:16 +07:00
陈华杰
e8b962d47f
feat: support training with JSON string tool arguments (#3136)
...
* feat: support training with JSON string tool arguments; fix PyArrow data type inconsistent error
* feat: raise error for tool call arguments decode
* Add test_chat_templates_tool_call_string_arguments.py
Add test for string arguments
* fix: change to correct qwen3 tokenizer
* fix: update docs to clarify arguments json
* chore: lint
* fix: duplicate
* chore: revert
* feat: add error to faq
* fix: remove duplicate fixture
---------
Co-authored-by: caoqinping <caoqinping@lixiang.com>
Co-authored-by: gamersover-blog <1611885128@qq.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-09-25 12:06:21 +07:00
NanoCode012
856ff12171
feat(doc): add optimizations table of contents to our improvements (#3175) [skip ci]
...
* chore: format
* feat: add usage for alst
* chore: wording
* feat: add optimizations doc
* Apply suggestion from @SalmanMohammadi
Co-authored-by: salman <salman.mohammadi@outlook.com>
* Update docs/dataset-formats/index.qmd
Co-authored-by: salman <salman.mohammadi@outlook.com>
* feat: add alst, act offloading, nd parallelism, use relative links, and fix format
* chore: comments
---------
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-09-24 16:13:49 -04:00
Dan Saunders
6bc959342b
remove unused dep (#3180)
2025-09-24 13:18:44 -04:00
NanoCode012
b3b92687c4
chore: rename gemma3 270m config (#3174)
2025-09-24 13:48:38 +07:00
NanoCode012
55d1be2ae6
fix: unify default for conversations_field [skip-e2e] (#3070)
...
* fix: unify default for conversations_field
* fix: suggestion to remove defaults
2025-09-23 21:22:15 +07:00
NanoCode012
08d831c3d5
Feat: add qwen3-next (w packing+cce) (#3150)
...
* feat: upgrade cce for qwen3-next
* feat: add sample qwen3 config
* feat: add packing patch for chunk_gated_delta_rule
* feat: add qwen3 link
* fix: tuple name
* feat: add tested qwen3 config
* fix: improve log
* feat: add patch for fla without packing
* fix: remove fla patch for standard mode
* feat: enable packing
* feat: add qwen3-next tests
* chore: move tests
2025-09-23 11:31:15 +07:00
AlexHT Hung
7be8740c5c
fix(rl): pass max_prompt_len to training args as max_prompt_length (#3113)
...
* pass max_prompt_len to training args as max_prompt_length
* Update rl.py
* refactor
* format
* fix: default for max_prompt_length
* fix: defaults for trainer
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-09-19 17:34:28 +07:00
NanoCode012
c51d6b06c3
feat: add apertus model and cce (#3144) [skip ci]
...
* feat: add apertus, glm4v, glm4v_moe cce
* fix: arcee docs
* feat: add apertus
* feat: added vram usage
* fix: add apertus note
* feat: update doc on apertus xielu
* fix: add monkeypatch for xielu activation issue
* fix: simplify env
* feat: pin commit
* feat: add packing
* chore: move patch calling
* Update examples/apertus/README.md
Co-authored-by: salman <salman.mohammadi@outlook.com>
* Update examples/apertus/README.md
Co-authored-by: salman <salman.mohammadi@outlook.com>
* Update examples/apertus/README.md
Co-authored-by: salman <salman.mohammadi@outlook.com>
---------
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-09-19 17:34:04 +07:00
NanoCode012
09959fac70
Feat: add Magistral Small 2509 and native mistral3 tokenizer support (#3165)
...
* feat: update mistral common
* feat: add mistral3processor
* fix: loading
* fix: cast pixel_values to fp32
* fix: image tensor conversion
* feat: add FA2 support for pixtral based models
* fix: update mistral small 3.1 to use native tokenizer
* fix: install tips
* fix: improve info on sample dataset files
* chore: move mistral configs into subfolders
* fix: remove unneeded patch
* fix: indent
* feat: add integration tests
* chore: move
* feat: add magistral 2509 docs and example
* fix: convert tensor to bool
* feat: expand tests
* chore: move tests
2025-09-18 15:42:20 +07:00
Dan Saunders
4065bc14c6
Debug log, logging improvements (#3159)
...
* simplify logging
* remove comment
* progress on debug.log
* add debug-level logger for file log
* simplify
* case insensitivity; 3rd party logging improvements
* simplify
* fix
* tests
* lint
* nits
* nit
* Update tests/test_utils_tee.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* cleanup / comments
* fix
* oops
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-09-17 13:27:03 -04:00
salman
e5c427f6de
qat doc updates (#3162) [skip-ci]
2025-09-17 10:38:15 +01:00
Wing Lian
86d6ee7c05
upgrade trl and accelerate (#3161)
...
* upgrade trl==0.23.0
* upgrade accelerate patch fix
* add hints when using gradient_checkpointing with DPO
* set gradient-checkpointing properly
2025-09-16 14:53:01 -04:00
Wing Lian
d4cff1b7bb
improve setting of NCCL_P2P_DISABLE on runpod (#3132) [skip ci]
...
* improve setting of NCCL_P2P_DISABLE on runpod
* use recs from review
2025-09-16 14:52:45 -04:00
Wing Lian
1ef6c196f7
setup env vars for ray train for FSDP (#3130) [skip ci]
2025-09-16 14:52:29 -04:00
salman
58d67bf98d
Migrate QAT API; fix axolotl quantize for QAT-ed models; add NVFP4 (#3107)
2025-09-12 10:55:50 +01:00