feat(qwen3-next): Adds targeting of shared expert and attention modules (#3183)

* Adds targeting of shared expert and attention modules in each layer
* Update VRAM usage

---------

Co-authored-by: Mike Tung <mike@diffbot.com>
@@ -38,7 +38,7 @@ pip3 uninstall -y causal-conv1d && pip3 install flash-linear-attention==0.3.2
 axolotl train examples/qwen3-next/qwen3-next-80b-a3b-qlora.yaml
 ```
 
-This config uses about 41.7 GiB VRAM.
+This config uses about 45.62 GiB VRAM.
 
 Let us know how it goes. Happy finetuning! 🚀
 
@@ -27,6 +27,14 @@ lora_r: 16
 lora_alpha: 8
 lora_dropout: 0.05
 lora_target_modules:
+  - linear_attn.in_proj_ba
+  - linear_attn.in_proj_qkvz
+  - linear_attn.out_proj
+  - shared_expert.up_proj
+  - shared_expert.down_proj
+  - shared_expert.gate_proj
+  - shared_expert_gate
+  - mlp.gate
   - q_proj
   - v_proj
   - k_proj
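
The added entries are dotted names such as `shared_expert.up_proj` rather than bare `up_proj`. The sketch below shows why that matters, assuming the targets are matched by name suffix (a module is selected when its full name equals a target or ends with `.` plus the target), which is how PEFT is understood to treat a list of target modules. The helper `matches_target` and the module paths in `EXAMPLE_MODULES` are hypothetical illustrations, not names taken from this commit, and the target list covers only the entries visible in this hunk.

```python
def matches_target(module_name: str, targets: list[str]) -> bool:
    """Assumed suffix rule: exact match, or name ends with '.' + target."""
    return any(
        module_name == target or module_name.endswith("." + target)
        for target in targets
    )


# Targets visible in this hunk; the real config may list more after k_proj.
LORA_TARGET_MODULES = [
    "linear_attn.in_proj_ba",
    "linear_attn.in_proj_qkvz",
    "linear_attn.out_proj",
    "shared_expert.up_proj",
    "shared_expert.down_proj",
    "shared_expert.gate_proj",
    "shared_expert_gate",
    "mlp.gate",
    "q_proj",
    "v_proj",
    "k_proj",
]

# Hypothetical module paths, for illustration only (assumed layout).
EXAMPLE_MODULES = [
    "model.layers.0.linear_attn.in_proj_qkvz",
    "model.layers.0.mlp.shared_expert.up_proj",
    "model.layers.0.mlp.shared_expert_gate",
    "model.layers.0.mlp.gate",
    "model.layers.0.mlp.experts.3.up_proj",  # routed expert, not targeted
    "model.layers.3.self_attn.q_proj",
]

# Keep only the modules that would receive LoRA adapters under this rule.
selected = [name for name in EXAMPLE_MODULES if matches_target(name, LORA_TARGET_MODULES)]

for name in EXAMPLE_MODULES:
    status = "LoRA" if name in selected else "skipped"
    print(f"{status:8s}{name}")
```

Under this assumed rule, the qualified `shared_expert.*` entries select the shared expert's projections while leaving the routed experts' `up_proj`, `down_proj`, and `gate_proj` untouched, which would be consistent with the modest VRAM increase reported in the README change.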