Dan Saunders
7935dc0911
dtype fix
2025-09-17 18:36:22 -04:00
Dan Saunders
d2b49b2670
error msg
2025-09-17 18:29:30 -04:00
Dan Saunders
b5cb345ca4
fix test
2025-09-17 18:24:00 -04:00
Dan Saunders
03d4c2683e
fix perf degradation
2025-09-17 18:20:37 -04:00
Dan Saunders
fd87eed501
minify
2025-09-17 16:42:35 -04:00
Dan Saunders
129db67705
fix
2025-09-17 16:24:29 -04:00
Dan Saunders
38b890a36b
fix
2025-09-17 16:16:41 -04:00
Dan Saunders
180920c7bf
simplify
2025-09-17 19:49:18 +00:00
Dan Saunders
d024048d74
logs + fix
2025-09-17 14:50:49 -04:00
Dan Saunders
98dc945838
fix
2025-09-17 14:42:53 -04:00
Dan Saunders
108600cd69
update config
2025-09-17 14:36:24 -04:00
Dan Saunders
0e9387c395
fix
2025-09-17 14:35:36 -04:00
Dan Saunders
db61e0d4ff
fix
2025-09-17 14:26:25 -04:00
Dan Saunders
51e565f60a
logs
2025-09-17 14:15:51 -04:00
Dan Saunders
c774dd0409
refactor + fix
2025-09-17 14:01:39 -04:00
Dan Saunders
7289e0cb55
more logs
2025-09-17 13:44:26 -04:00
Dan Saunders
8d483c11f7
more logs
2025-09-17 13:44:26 -04:00
Dan Saunders
9c1829cf57
more logs
2025-09-17 13:44:26 -04:00
Dan Saunders
135b09d1de
logs, qwen2 support
2025-09-17 13:44:26 -04:00
Dan Saunders
de4344a56e
patch
2025-09-17 13:44:26 -04:00
Dan Saunders
7d572b58d1
just grouped_mm for now
2025-09-17 13:44:26 -04:00
Dan Saunders
773d7e4291
update
2025-09-17 13:44:26 -04:00
Dan Saunders
fef47a5b7c
hardening
2025-09-17 13:44:26 -04:00
Dan Saunders
f6ed8ddc01
fix
2025-09-17 13:44:26 -04:00
Dan Saunders
556d6448fe
fix
2025-09-17 13:44:26 -04:00
Dan Saunders
5c2229721d
diag
2025-09-17 13:44:26 -04:00
Dan Saunders
d7de6b0e96
grouped_mm
2025-09-17 13:44:26 -04:00
Dan Saunders
3c6648678f
numerics
2025-09-17 13:44:26 -04:00
Dan Saunders
5b19a1ea9c
improve
2025-09-17 13:44:26 -04:00
Dan Saunders
cfefad1eea
fix
2025-09-17 13:44:26 -04:00
Dan Saunders
125e7b5fe6
fast path
2025-09-17 13:44:26 -04:00
Dan Saunders
479b6144df
tflops
2025-09-17 13:44:26 -04:00
Dan Saunders
68da65cba2
update
2025-09-17 13:44:26 -04:00
Dan Saunders
0d689bb421
cache, example
2025-09-17 13:44:26 -04:00
Dan Saunders
43ada1278a
moe kernels init scaffold
2025-09-17 13:44:26 -04:00
Dan Saunders
4065bc14c6
Debug log, logging improvements (#3159)
* simplify logging
* remove comment
* progress on debug.log
* add debug-level logger for file log
* simplify
* case insensitivity; 3rd party logging improvements
* simplify
* fix
* tests
* lint
* nits
* nit
* Update tests/test_utils_tee.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* cleanup / comments
* fix
* oops
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-09-17 13:27:03 -04:00
salman
e5c427f6de
qat doc updates (#3162) [skip-ci]
2025-09-17 10:38:15 +01:00
Wing Lian
86d6ee7c05
upgrade trl and accelerate (#3161)
* upgrade trl==0.23.0
* upgrade accelerate patch fix
* add hints when using gradient_checkpointing with DPO
* set gradient-checkpointing properly
2025-09-16 14:53:01 -04:00
Wing Lian
d4cff1b7bb
improve setting of NCCL_P2P_DISABLE on runpod (#3132) [skip ci]
* improve setting of NCCL_P2P_DISABLE on runpod
* use recs from review
2025-09-16 14:52:45 -04:00
Wing Lian
1ef6c196f7
setup env vars for ray train for FSDP (#3130) [skip ci]
2025-09-16 14:52:29 -04:00
salman
58d67bf98d
Migrate QAT API; fix axolotl quantize for QAT-ed models; add NVFP4 (#3107)
2025-09-12 10:55:50 +01:00
salman
0401a15888
SEO go brrr (#3153) [skip-ci]
2025-09-12 10:55:11 +01:00
NanoCode012
fcfc13d710
feat(doc): update thinking and chat_template notes (#3114) [skip ci]
* feat: update thinking and chat_template notes
* fix: grammar
2025-09-12 14:45:18 +07:00
salman
9406c0c488
log before eval step (#3148) [skip-ci]
2025-09-11 11:19:30 +01:00
Dan Saunders
1b53c49e1a
text diffusion training plugin (#3067)
* diffusion training plugin
* cleanup
* nits
* fixes + improvements
* add back in reinit_weights (clobbered?); masking / pretrain fixes
* nits
* cleanup; tests draft
* sample generation, tests fixes
* fixes
* nits
* add inference support; add auto-mask token support
* nits
* nits
* progress
* simplify logging
* lint
* prefix args with diffusion_
* coderabbito
* tests fix
* nit
* nits
* cleanup + nits
* nits
* fix SFT sample gen
* fixes
* fix
* comments
* comments
* lint
* reward model lora fix
* cleanup; fix pretraining_dataset case
* gradio inference
* update cfgs
* update cfgs
* train, generation parity, cleanup
* fix
* simplify
* test
* test fix
2025-09-10 20:27:00 -04:00
NanoCode012
b71482cec5
Feat: add hunyuan v1 (#3016)
* feat: add hunyuan cce support
* feat: update cce docs
* feat: add multipack support for granite and hunyuan
* feat: add hunyuan docs and example config
* feat: update readme instructions to include CCE installation
* fix: chat template log appearing despite tokenizer already having template
* feat: add vram usage
* fix: remove duplicate cce install
* fix: use latest commit of PR in case rebased/pushed
* Revert "fix: use latest commit of PR in case rebased/pushed"
This reverts commit 8b60aa00de.
* feat: update doc as upstream merged
2025-09-10 09:03:30 +07:00
NanoCode012
79103b01ca
Feat: add seedoss (#3104) [skip ci]
* feat: add seedoss cce
* feat: add seedoss config and docs
* fix: shouldn't have target modules with target linear
* feat: add vram numbers
* fix: hf link
* fix: name
* fix: support multipack seedoss
* fix: merge error
* feat: update seedoss instructions for transformers release
2025-09-10 09:01:02 +07:00
salman
9640338d37
Default include_tkps to true (#3134)
* default true
* force e2e
* causal trainer only
* fix eval logging [skip-ci]
* revert setup.py
* force tests
* guarding
* guarding
* fix test case
* use evaluate [skip-e2e]
* use evaluate [skip-e2e]
* kick off ci
* fixing
* reverting
2025-09-09 10:50:21 -04:00
Wing Lian
b5d4c7ff54
allow 1% deviation for codecov (#3138) [skip ci]
2025-09-07 11:01:03 -04:00
Seungduk Kim
8fd9221f13
Add ipo as an rl type that shares DPODataset config (#3128)
* Add `ipo` as an `rl` type that shares DPODataset config
* chore: lint
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-09-07 10:49:10 -04:00