# Finetune Qwen3.5 with Axolotl
Qwen3.5 is a hybrid-architecture model series combining Gated DeltaNet linear attention with standard Transformer attention. All Qwen3.5 models are early-fusion vision-language models: dense variants use `Qwen3_5ForConditionalGeneration` and MoE variants use `Qwen3_5MoeForConditionalGeneration`.

Vision and text tokens are processed through the same transformer stack. The configs below train on text-only data unless noted otherwise. See `9b-lora-vision.yaml` for a multimodal example.
Available configs:
| Config | Model | Type | Peak VRAM |
|---|---|---|---|
| `27b-qlora.yaml` | Qwen3.5-27B | Dense VLM, text-only QLoRA | ~47 GiB |
| `27b-fft.yaml` | Qwen3.5-27B | Dense VLM, text-only FFT (vision frozen) | ~53 GiB |
| `35b-a3b-moe-qlora.yaml` | Qwen3.5-35B-A3B | MoE, text-only QLoRA | — |
| `122b-a10b-moe-qlora.yaml` | Qwen3.5-122B-A10B | MoE, text-only QLoRA | — |
| `9b-lora-vision.yaml` | Qwen3.5-9B | Vision+text LoRA, single GPU | — |
| `9b-fft-vision.yaml` | Qwen3.5-9B | Vision+text FFT, single GPU | ~61 GiB |
## Getting started
- Install Axolotl following the installation guide.
- Install Cut Cross Entropy to reduce training VRAM usage.
- Install FLA for sample packing support with the Gated DeltaNet linear attention layers:

  ```bash
  pip3 uninstall -y causal-conv1d && pip3 install flash-linear-attention==0.4.1
  ```

  FLA is required when `sample_packing: true`; without it, training raises a `RuntimeError` on packed sequences. Vision configs use `sample_packing: false`, so FLA is optional there. A config sketch showing how these options fit together follows the examples below.
- Run a finetuning example:

  ```bash
  # Dense 27B text-only (QLoRA, ~47 GiB VRAM with sample packing)
  axolotl train examples/qwen3.5/27b-qlora.yaml

  # Dense 27B text-only FFT with vision encoder frozen (~53 GiB, single 80 GiB GPU)
  axolotl train examples/qwen3.5/27b-fft.yaml

  # MoE 35B-A3B text-only (QLoRA)
  axolotl train examples/qwen3.5/35b-a3b-moe-qlora.yaml

  # MoE 122B-A10B text-only (QLoRA)
  axolotl train examples/qwen3.5/122b-a10b-moe-qlora.yaml

  # 9B vision+text (LoRA, multimodal dataset)
  axolotl train examples/qwen3.5/9b-lora-vision.yaml

  # 9B vision+text FFT, single 80 GiB GPU (~61 GiB peak)
  axolotl train examples/qwen3.5/9b-fft-vision.yaml
  ```
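The snippet below is a minimal, hypothetical sketch of how the pieces above (QLoRA, the Cut Cross Entropy plugin, and sample packing) combine in a text-only config. The model id and dataset path are placeholders; the shipped YAML files are the reference.

```yaml
# Hypothetical text-only QLoRA sketch; see 27b-qlora.yaml for the real settings.
base_model: Qwen/Qwen3.5-27B          # placeholder model id
load_in_4bit: true
adapter: qlora

# Cut Cross Entropy plugin lowers peak VRAM from the loss computation.
plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
cut_cross_entropy: true

# Packing requires FLA; the vision configs set this to false instead.
sample_packing: true

datasets:
  - path: your-org/your-text-dataset  # placeholder dataset
    type: chat_template
```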
## TIPS
- For inference, you can experiment with `temperature: 0.7`, `top_p: 0.8`, `top_k: 20`, and `min_p: 0`.
- For text-only FFT on 27B, use `27b-fft.yaml`, which sets `unfrozen_parameters` to freeze the vision encoder (`model.visual.*`). This avoids wasting optimizer state on parameters that receive no gradient from text-only data; a sketch follows this list.
- You can run a full finetuning of smaller configs by removing `adapter: qlora` and `load_in_4bit: true`. See Multi-GPU below.
- Read more on loading your own dataset at docs.
- The dataset format follows the OpenAI Messages format as seen here.
- For multimodal finetuning, set `processor_type: AutoProcessor`, `skip_prepare_dataset: true`, and `remove_unused_columns: false` as shown in `9b-lora-vision.yaml` (also sketched after this list).
- The Gated DeltaNet linear attention layers (`linear_attn.*`) can optionally be added to `lora_target_modules`; they are commented out by default.
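A hypothetical sketch of the vision-freezing approach mentioned in the tips. The module prefixes are assumptions for illustration only; `27b-fft.yaml` holds the actual patterns.

```yaml
# Hypothetical excerpt: only parameters matching these regexes stay trainable,
# so the vision encoder (model.visual.*) gets no gradients or optimizer state.
unfrozen_parameters:
  - ^model.language_model.   # assumed prefix for the text stack
  - ^lm_head.
```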
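Likewise, a hypothetical sketch of the multimodal settings above. The LoRA module names are illustrative; `9b-lora-vision.yaml` is the reference.

```yaml
# Hypothetical vision+text LoRA excerpt; see 9b-lora-vision.yaml.
processor_type: AutoProcessor
skip_prepare_dataset: true
remove_unused_columns: false
sample_packing: false        # vision configs don't pack, so FLA is optional here

adapter: lora
lora_target_modules:
  - q_proj                   # illustrative attention projection names
  - k_proj
  - v_proj
  - o_proj
  # - linear_attn.*          # Gated DeltaNet layers, commented out by default
```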