Dan Saunders
|
1fa0a98e38
|
update lock
|
2025-09-26 15:44:46 +00:00 |
|
Dan Saunders
|
8d542d9d63
|
deps up to date
|
2025-09-26 10:39:34 -04:00 |
|
Dan Saunders
|
a4565476e0
|
find-links for wheels, auto-gptq -> gptqmodel
|
2025-09-26 10:26:44 -04:00 |
|
Dan Saunders
|
02dc263338
|
updates
|
2025-09-26 10:26:44 -04:00 |
|
Dan Saunders
|
2acd3e1242
|
dep
|
2025-09-26 10:26:44 -04:00 |
|
Dan Saunders
|
0437c1a4ba
|
auto-gptq -> gptqmodel
|
2025-09-26 10:26:44 -04:00 |
|
Dan Saunders
|
ef150fd973
|
updates
|
2025-09-26 10:26:44 -04:00 |
|
Dan Saunders
|
47ad92c6b9
|
fix
|
2025-09-26 10:26:44 -04:00 |
|
Dan Saunders
|
f0fee9c56c
|
req
|
2025-09-26 10:26:44 -04:00 |
|
Dan Saunders
|
37d07bd7f7
|
coderabbito, improvements
|
2025-09-26 10:26:44 -04:00 |
|
Dan Saunders
|
4c81172917
|
coderabbito
|
2025-09-26 10:26:21 -04:00 |
|
Dan Saunders
|
cd8c769e84
|
Update cicd/Dockerfile.jinja
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
2025-09-26 10:26:21 -04:00 |
|
Dan Saunders
|
0d60046d08
|
Update .github/workflows/pypi.yml
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
2025-09-26 10:26:21 -04:00 |
|
Dan Saunders
|
c110e3eb48
|
remove setup.py, requirements.txt and refs
|
2025-09-26 10:26:21 -04:00 |
|
Dan Saunders
|
95c259b3fb
|
depr warning
|
2025-09-26 10:26:21 -04:00 |
|
Dan Saunders
|
d1fd505813
|
update
|
2025-09-26 10:26:21 -04:00 |
|
Dan Saunders
|
1334281d50
|
docker fix
|
2025-09-26 10:26:21 -04:00 |
|
Dan Saunders
|
98f230d864
|
cleanup
|
2025-09-26 10:26:21 -04:00 |
|
Dan Saunders
|
02f308351c
|
fix
|
2025-09-26 10:25:58 -04:00 |
|
Dan Saunders
|
3b91e8174d
|
fix
|
2025-09-26 10:25:58 -04:00 |
|
Dan Saunders
|
40d906fb33
|
lint
|
2025-09-26 10:25:58 -04:00 |
|
Dan Saunders
|
89d5323c13
|
fix
|
2025-09-26 10:25:58 -04:00 |
|
Dan Saunders
|
df870f6a8f
|
fix
|
2025-09-26 10:24:59 -04:00 |
|
Dan Saunders
|
f500aaa490
|
fix
|
2025-09-26 10:24:59 -04:00 |
|
Dan Saunders
|
9ec33f52e3
|
wip
|
2025-09-26 10:24:59 -04:00 |
|
Dan Saunders
|
b453562c01
|
fixes
|
2025-09-26 10:24:59 -04:00 |
|
Dan Saunders
|
367f7eb3a6
|
fix
|
2025-09-26 10:24:59 -04:00 |
|
Dan Saunders
|
e888e38ce7
|
fix
|
2025-09-26 10:24:59 -04:00 |
|
Dan Saunders
|
400120af2d
|
wip
|
2025-09-26 10:24:59 -04:00 |
|
Dan Saunders
|
459e5f9b16
|
lint
|
2025-09-26 10:24:59 -04:00 |
|
Dan Saunders
|
43f6f84269
|
wip
|
2025-09-26 10:24:59 -04:00 |
|
Dan Saunders
|
36c4ab11f9
|
wip
|
2025-09-26 10:24:59 -04:00 |
|
Dan Saunders
|
2f4e4ef604
|
wip
|
2025-09-26 10:24:59 -04:00 |
|
Dan Saunders
|
aee03fc636
|
wip
|
2025-09-26 10:24:59 -04:00 |
|
Dan Saunders
|
255b818fbc
|
rebase
|
2025-09-26 10:24:59 -04:00 |
|
Dan Saunders
|
332ee74f32
|
rebase
|
2025-09-26 10:24:07 -04:00 |
|
Dan Saunders
|
3b0d2ac5c0
|
rebase
|
2025-09-26 10:21:49 -04:00 |
|
Dan Saunders
|
9462a1bf79
|
wip
|
2025-09-26 10:21:49 -04:00 |
|
Dan Saunders
|
8e9386c799
|
go uv first
|
2025-09-26 09:57:09 -04:00 |
|
Dan Saunders
|
740d5a1d31
|
doc fix (#3187)
|
2025-09-26 09:55:15 -04:00 |
|
Grant Holmes (Ren)
|
850c1a5f8d
|
Add FSDP v2 swap memory support + QLoRA compatibility fixes (#3167)
Co-authored-by: salman <salman.mohammadi@outlook.com>
|
2025-09-26 10:23:59 +01:00 |
|
NanoCode012
|
7fa8ac40cd
|
Feat(cce): add qwen3_vl, qwen3_vl_moe, granitemoeshared, granitemoehybrid, and upgraded all cce patches (#3178)
* feat: upgrade cce with patches for transformers 4.56
* feat: add missing models to cce readme
|
2025-09-26 12:11:29 +07:00 |
|
Dan Saunders
|
f9748c4dc5
|
Cp fix (#3182)
* patch transformers to allow CP + FA2
* nits
* only patch in CP > 1 case
|
2025-09-25 12:03:50 -04:00 |
|
miketung
|
33975ce4bc
|
feat(qwen3-next): Adds targeting of shared expert and attention modules (#3183)
* Adds targeting of shared expert and attention modules in each layer
* Update VRAM usage
---------
Co-authored-by: Mike Tung <mike@diffbot.com>
|
2025-09-25 17:06:16 +07:00 |
|
陈华杰
|
e8b962d47f
|
feat: support training with JSON string tool arguments (#3136)
* feat: support training with JSON string tool arguments; fix PyArrow inconsistent data type error
* feat: raise error for tool call arguments decode
* Add test_chat_templates_tool_call_string_arguments.py
Add test for string arguments
* fix: change to correct qwen3 tokenizer
* fix: update docs to clarify arguments json
* chore: lint
* fix: duplicate
* chore: revert
* feat: add error to faq
* fix: remove duplicate fixture
---------
Co-authored-by: caoqinping <caoqinping@lixiang.com>
Co-authored-by: gamersover-blog <1611885128@qq.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
|
2025-09-25 12:06:21 +07:00 |
|
NanoCode012
|
856ff12171
|
feat(doc): add optimizations table of contents to our improvements (#3175) [skip ci]
* chore: format
* feat: add usage for alst
* chore: wording
* feat: add optimizations doc
* Apply suggestion from @SalmanMohammadi
Co-authored-by: salman <salman.mohammadi@outlook.com>
* Update docs/dataset-formats/index.qmd
Co-authored-by: salman <salman.mohammadi@outlook.com>
* feat: add alst, act offloading, nd parallelism, use relative links, and fix format
* chore: comments
---------
Co-authored-by: salman <salman.mohammadi@outlook.com>
|
2025-09-24 16:13:49 -04:00 |
|
Dan Saunders
|
6bc959342b
|
remove unused dep (#3180)
|
2025-09-24 13:18:44 -04:00 |
|
NanoCode012
|
b3b92687c4
|
chore: rename gemma3 270m config (#3174)
|
2025-09-24 13:48:38 +07:00 |
|
NanoCode012
|
55d1be2ae6
|
fix: unify default for conversations_field [skip-e2e] (#3070)
* fix: unify default for conversations_field
* fix: suggestion to remove defaults
|
2025-09-23 21:22:15 +07:00 |
|
NanoCode012
|
08d831c3d5
|
Feat: add qwen3-next (w packing+cce) (#3150)
* feat: upgrade cce for qwen3-next
* feat: add sample qwen3 config
* feat: add packing patch for chunk_gated_delta_rule
* feat: add qwen3 link
* fix: tuple name
* feat: add tested qwen3 config
* fix: improve log
* feat: add patch for fla without packing
* fix: remove fla patch for standard mode
* feat: enable packing
* feat: add qwen3-next tests
* chore: move tests
|
2025-09-23 11:31:15 +07:00 |
|