feat(qwen3-next): Adds targeting of shared expert and attention modules (#3183)

* Adds targeting of shared expert and attention modules in each layer
* Update VRAM usage

---------

Co-authored-by: Mike Tung <mike@diffbot.com>
2025-09-25 19:06:16 +09:00
parent e8b962d47f
commit 33975ce4bc
2 changed files with 9 additions and 1 deletion
--- a/examples/qwen3-next/README.md
+++ b/examples/qwen3-next/README.md
@@ -38,7 +38,7 @@ pip3 uninstall -y causal-conv1d && pip3 install flash-linear-attention==0.3.2
 axolotl train examples/qwen3-next/qwen3-next-80b-a3b-qlora.yaml
 ```

-This config uses about 41.7 GiB VRAM.
+This config uses about 45.62 GiB VRAM.

 Let us know how it goes. Happy finetuning! 🚀