qwen docs + new config (#3499) [skip ci]

* qwen docs + new config
* docss lint
* simplify comments
* read me
* lint comments
* Update docs/multimodal.qmd
* Update docs/multimodal.qmd
* Update examples/qwen3.5/9b-fft-vision.yaml
* chore: fix link and incorrect points

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2026-03-20 14:43:34 +05:30
parent 7920fe74ec
commit 113d275bd9
5 changed files with 142 additions and 16 deletions
--- a/examples/qwen3.5/README.md
+++ b/examples/qwen3.5/README.md
@@ -1,15 +1,20 @@
 # Finetune Qwen3.5 with Axolotl

-[Qwen3.5](https://huggingface.co/collections/Qwen/qwen35-68452f3bc6e4b7cfb4e1c803) is a hybrid architecture model series combining Gated DeltaNet linear attention with standard Transformer attention. Models from 7B onwards are early-fusion vision-language models (`Qwen3_5ForConditionalGeneration`), meaning vision and text tokens are processed through the same transformer stack. The 2B variant is text-only.
+[Qwen3.5](https://huggingface.co/collections/Qwen/qwen35) is a hybrid architecture model series combining Gated DeltaNet linear attention with standard Transformer attention. All Qwen3.5 models are early-fusion vision-language models: dense variants use `Qwen3_5ForConditionalGeneration` and MoE variants use `Qwen3_5MoeForConditionalGeneration`.
+
+Vision and text tokens are processed through the same transformer stack. The configs below train on text-only data unless noted otherwise. See `9b-lora-vision.yaml` for a multimodal example.

 Available configs:

-| Config | Model | Type |
-|---|---|---|
-| `27b-qlora.yaml` | Qwen3.5-27B | Dense VLM, text-only path |
-| `35b-a3b-moe-qlora.yaml` | Qwen3.5-35B-A3B | MoE, text-only path |
-| `122b-a10b-moe-qlora.yaml` | Qwen3.5-122B-A10B | MoE, text-only path |
-| `7b-lora-vision.yaml` | Qwen3.5-7B | Vision+text (multimodal) |
+| Config | Model | Type | Peak VRAM |
+|---|---|---|---|
+| `27b-qlora.yaml` | Qwen3.5-27B | Dense VLM, text-only QLoRA | ~47 GiB |
+| `27b-fft.yaml` | Qwen3.5-27B | Dense VLM, text-only FFT (vision frozen) | ~53 GiB |
+| `35b-a3b-moe-qlora.yaml` | Qwen3.5-35B-A3B | MoE, text-only QLoRA | — |
+| `122b-a10b-moe-qlora.yaml` | Qwen3.5-122B-A10B | MoE, text-only QLoRA | — |
+| `9b-lora-vision.yaml` | Qwen3.5-9B | Vision+text LoRA, single GPU | — |
+| `9b-fft-vision.yaml` | Qwen3.5-9B | Vision+text FFT, single GPU | ~61 GiB |
+

 ## Getting started

@@ -29,23 +34,31 @@ pip3 uninstall -y causal-conv1d && pip3 install flash-linear-attention==0.4.1
 # Dense 27B text-only (QLoRA, ~47 GiB VRAM with sample packing)
 axolotl train examples/qwen3.5/27b-qlora.yaml

+# Dense 27B text-only FFT with vision encoder frozen (~53 GiB, single 80 GiB GPU)
+axolotl train examples/qwen3.5/27b-fft.yaml
+
 # MoE 35B-A3B text-only (QLoRA)
 axolotl train examples/qwen3.5/35b-a3b-moe-qlora.yaml

 # MoE 122B-A10B text-only (QLoRA)
 axolotl train examples/qwen3.5/122b-a10b-moe-qlora.yaml

-# 7B vision+text (LoRA, multimodal dataset)
-axolotl train examples/qwen3.5/7b-lora-vision.yaml
+# 9B vision+text (LoRA, multimodal dataset)
+axolotl train examples/qwen3.5/9b-lora-vision.yaml
+
+# 9B vision+text FFT, single 80 GiB GPU (~61 GiB peak)
+axolotl train examples/qwen3.5/9b-fft-vision.yaml
+
 ```

 ### TIPS

 - For inference, you can experiment with `temperature: 0.7`, `top_p: 0.8`, `top_k: 20`, and `min_p: 0`.
- You can run a full finetuning by removing `adapter: qlora` and `load_in_4bit: true`. See [Multi-GPU](#optimization-guides) below.
+- For **text-only FFT** on 27B, use `27b-fft.yaml`, which sets `unfrozen_parameters` so that the vision encoder (`model.visual.*`) stays frozen. This avoids wasting optimizer state on parameters that receive no gradient from text-only data.
+- You can run full finetuning with the smaller configs by removing `adapter: qlora` and `load_in_4bit: true`. See [Multi-GPU](#optimization-guides) below.
 - Read more on loading your own dataset at [docs](https://docs.axolotl.ai/docs/dataset_loading.html).
 - The dataset format follows the OpenAI Messages format as seen [here](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#chat_template).
- For **multimodal** finetuning, set `processor_type: AutoProcessor`, `skip_prepare_dataset: true`, and `remove_unused_columns: false` as shown in `7b-lora-vision.yaml`.
+- For **multimodal** finetuning, set `processor_type: AutoProcessor`, `skip_prepare_dataset: true`, and `remove_unused_columns: false` as shown in `9b-lora-vision.yaml`.
 - The Gated DeltaNet linear attention layers (`linear_attn.*`) can optionally be added to `lora_target_modules` — they are commented out by default.

 ## Optimization Guides