NanoCode012
|
e672d37f33
|
fix: qwen3-next to use fla causal-conv1d to support packing (#3437)
* fix: qwen3-next to use fla causal-conv1d to support packing
* fix: causal import and update doc for v5
* fix: hard fail for packing without fla
|
2026-03-03 09:26:46 -05:00 |
|
Wing Lian
|
a531e9d946
|
upgrade vllm to v0.14.0 (#3345)
|
2026-01-21 20:00:18 -05:00 |
|
miketung
|
33975ce4bc
|
feat(qwen3-next): Adds targeting of shared expert and attention modules (#3183)
* Adds targeting of shared expert and attention modules in each layer
* Update VRAM usage
---------
Co-authored-by: Mike Tung <mike@diffbot.com>
|
2025-09-25 17:06:16 +07:00 |
|
NanoCode012
|
08d831c3d5
|
Feat: add qwen3-next (w packing+cce) (#3150)
* feat: upgrade cce for qwen3-next
* feat: add sample qwen3 config
* feat: add packing patch for chunk_gated_delta_rule
* feat: add qwen3 link
* fix: tuple name
* feat: add tested qwen3 config
* fix: improve log
* feat: add patch for fla without packing
* fix: remove fla patch for standard mode
* feat: enable packing
* feat: add qwen3-next tests
* chore: move tests
|
2025-09-23 11:31:15 +07:00 |
|