# Finetune Qwen3.5 with Axolotl
Qwen3.5 is a hybrid-architecture model series that combines Gated DeltaNet linear attention with standard Transformer attention. Models from 7B upwards are early-fusion vision-language models (`Qwen3_5ForConditionalGeneration`), meaning vision and text tokens are processed through the same transformer stack. The 2B variant is text-only.
Available configs:
| Config | Model | Type |
|---|---|---|
| `27b-qlora.yaml` | Qwen3.5-27B | Dense VLM, text-only path |
| `35b-a3b-moe-qlora.yaml` | Qwen3.5-35B-A3B | MoE, text-only path |
| `122b-a10b-moe-qlora.yaml` | Qwen3.5-122B-A10B | MoE, text-only path |
| `7b-lora-vision.yaml` | Qwen3.5-7B | Vision+text (multimodal) |
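For orientation, the text-only QLoRA configs share a common shape. The sketch below is illustrative only, assuming a typical Axolotl QLoRA layout and a hypothetical Hugging Face model ID; the shipped `examples/qwen3.5/*.yaml` files are authoritative.

```yaml
# Illustrative sketch, not a shipped config. The base_model ID and all
# hyperparameter values are assumptions; see examples/qwen3.5/ for the
# real files.
base_model: Qwen/Qwen3.5-27B   # hypothetical model ID

adapter: qlora                 # remove this and load_in_4bit for a full finetune
load_in_4bit: true
lora_r: 32
lora_alpha: 64
lora_target_linear: true

sample_packing: true           # requires FLA (see Getting started)
sequence_len: 4096

micro_batch_size: 1
gradient_accumulation_steps: 4
learning_rate: 0.0002
bf16: true
```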
## Getting started
- Install Axolotl following the installation guide.
- Install Cut Cross Entropy to reduce training VRAM usage.
- Install FLA for sample packing support with the Gated DeltaNet linear attention layers:

  ```bash
  pip3 uninstall -y causal-conv1d && pip3 install flash-linear-attention==0.4.1
  ```

  FLA is required when `sample_packing: true`; without it, training raises a `RuntimeError` on packed sequences. Vision configs use `sample_packing: false`, so FLA is optional there. A quick import check is sketched after this list.
- Run a finetuning example:

  ```bash
  # Dense 27B text-only (QLoRA, ~47 GiB VRAM with sample packing)
  axolotl train examples/qwen3.5/27b-qlora.yaml

  # MoE 35B-A3B text-only (QLoRA)
  axolotl train examples/qwen3.5/35b-a3b-moe-qlora.yaml

  # MoE 122B-A10B text-only (QLoRA)
  axolotl train examples/qwen3.5/122b-a10b-moe-qlora.yaml

  # 7B vision+text (LoRA, multimodal dataset)
  axolotl train examples/qwen3.5/7b-lora-vision.yaml
  ```
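As referenced in step 3, here is a minimal sanity check that FLA installed correctly. This assumes the package is importable as `fla`, the module name the upstream project uses:

```bash
# Sanity check: assumes flash-linear-attention is importable as `fla`
# (the upstream module name). A clean exit means FLA is available.
python3 -c "import fla" && echo "FLA OK"
```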
## TIPS
- For inference, you can experiment with `temperature: 0.7`, `top_p: 0.8`, `top_k: 20`, and `min_p: 0`.
- You can run a full finetuning by removing `adapter: qlora` and `load_in_4bit: true` (a sketch follows this list). See Multi-GPU below.
- Read more on loading your own dataset in the docs.
- The dataset format follows the OpenAI Messages format as seen here.
- For multimodal finetuning, set `processor_type: AutoProcessor`, `skip_prepare_dataset: true`, and `remove_unused_columns: false`, as shown in `7b-lora-vision.yaml` and in the sketch after this list.
- The Gated DeltaNet linear attention layers (`linear_attn.*`) can optionally be added to `lora_target_modules`; they are commented out by default (see the sketch after this list).
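For the full-finetune tip above, the change amounts to dropping the QLoRA-specific lines from an existing config. A minimal sketch; the remaining hyperparameters may need retuning for full finetuning:

```yaml
# Full finetune: delete (or comment out) the QLoRA-specific lines.
# adapter: qlora        # <- remove for full finetune
# load_in_4bit: true    # <- remove for full finetune
```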
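The multimodal settings from the tip above, collected in one place. This is a sketch of the relevant excerpt, not the full shipped config; see `7b-lora-vision.yaml` for the complete file:

```yaml
# Multimodal finetuning excerpt (sketch; values mirror the tip above).
processor_type: AutoProcessor   # use the model's processor for image+text inputs
skip_prepare_dataset: true      # let the multimodal collator handle preparation
remove_unused_columns: false    # keep image columns the Trainer would otherwise drop
sample_packing: false           # vision configs do not pack samples, so FLA is optional
```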
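And a sketch of extending `lora_target_modules` to cover the Gated DeltaNet layers. The standard projection names below are typical for Qwen-style models, and the exact `linear_attn` submodule names are assumptions here; the commented-out entries in the shipped configs are authoritative:

```yaml
# Sketch: adding Gated DeltaNet (linear_attn.*) layers to the LoRA targets.
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
  # Optional: Gated DeltaNet linear attention layers (commented out by default).
  # Submodule names below are assumptions; check the shipped configs.
  # - linear_attn.q_proj
  # - linear_attn.k_proj
```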