Finetune Qwen3.5 with Axolotl

Qwen3.5 is a hybrid architecture model series combining Gated DeltaNet linear attention with standard Transformer attention. Models from 7B onwards are early-fusion vision-language models (Qwen3_5ForConditionalGeneration), meaning vision and text tokens are processed through the same transformer stack. The 2B variant is text-only.

Available configs:

Config                     Model               Type
27b-qlora.yaml             Qwen3.5-27B         Dense VLM, text-only path
35b-a3b-moe-qlora.yaml     Qwen3.5-35B-A3B     MoE, text-only path
122b-a10b-moe-qlora.yaml   Qwen3.5-122B-A10B   MoE, text-only path
7b-lora-vision.yaml        Qwen3.5-7B          Vision+text (multimodal)
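
The text-only QLoRA configs share a common shape. The skeleton below is a sketch, not a copy of any shipped file; the field values are illustrative assumptions, so consult the YAML files above for the real settings.

# Sketch of a text-only QLoRA config; values are assumptions
base_model: Qwen/Qwen3.5-27B   # assumed Hub model ID
load_in_4bit: true             # QLoRA: 4-bit quantized base weights
adapter: qlora
lora_r: 32                     # assumed adapter rank
lora_alpha: 64
sample_packing: true           # requires flash-linear-attention (step 3 below)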

Getting started

  1. Install Axolotl following the installation guide.

  2. Install Cut Cross Entropy to reduce training VRAM usage.
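
In the YAML, Cut Cross Entropy is enabled through Axolotl's plugin system. A minimal sketch, assuming the standard integration path:

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
cut_cross_entropy: true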

  3. Install FLA for sample packing support with the Gated DeltaNet linear attention layers:

pip3 uninstall -y causal-conv1d && pip3 install flash-linear-attention==0.4.1

FLA is required when sample_packing: true. Without it, training raises a RuntimeError on packed sequences. Vision configs use sample_packing: false so FLA is optional there.
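
The switch itself is a single config key:

# Text-only configs (27B, 35B-A3B, 122B-A10B) ship with packing on:
sample_packing: true
# The vision config (7b-lora-vision.yaml) ships with packing off:
# sample_packing: false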

  4. Run a finetuning example:
# Dense 27B text-only (QLoRA, ~47 GiB VRAM with sample packing)
axolotl train examples/qwen3.5/27b-qlora.yaml

# MoE 35B-A3B text-only (QLoRA)
axolotl train examples/qwen3.5/35b-a3b-moe-qlora.yaml

# MoE 122B-A10B text-only (QLoRA)
axolotl train examples/qwen3.5/122b-a10b-moe-qlora.yaml

# 7B vision+text (LoRA, multimodal dataset)
axolotl train examples/qwen3.5/7b-lora-vision.yaml

TIPS

  • For inference, you can experiment with temperature: 0.7, top_p: 0.8, top_k: 20, and min_p: 0.
  • You can run a full finetune by removing adapter: qlora and load_in_4bit: true (see the sketch after this list). See Multi-GPU below.
  • Read more on loading your own dataset in the docs.
  • The dataset format follows the OpenAI Messages format as seen here.
  • For multimodal finetuning, set processor_type: AutoProcessor, skip_prepare_dataset: true, and remove_unused_columns: false as shown in 7b-lora-vision.yaml (snippet after this list).
  • The Gated DeltaNet linear attention layers (linear_attn.*) can optionally be added to lora_target_modules; they are commented out by default (see the sketch after this list).
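
For the full-finetune tip, the change is to delete the two QLoRA keys; a sketch against the text-only configs:

adapter: qlora       # remove this line for a full finetune
load_in_4bit: true   # remove this line as well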
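
The multimodal keys from the tip above, as they appear in the vision setup described in this README:

processor_type: AutoProcessor
skip_prepare_dataset: true
remove_unused_columns: false
sample_packing: false   # the vision path trains without packing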
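
Targeting the linear attention layers could look like the sketch below. The standard projection names and the exact linear_attn sub-module names are assumptions; check the commented-out entries in the shipped configs for the real names.

lora_target_modules:
  - q_proj                 # assumed standard attention projections
  - k_proj
  - v_proj
  - o_proj
  # Gated DeltaNet layers, commented out by default:
  # - linear_attn.in_proj  # sub-module name is an assumption
  # - linear_attn.out_proj # sub-module name is an assumption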

Optimization Guides