# Finetune Z.ai's GLM-4.5-Air with Axolotl
GLM-4.5-Air is a MoE model by Z.ai.
This guide shows how to fine-tune it with Axolotl.
## Getting started

1. Install Axolotl following the installation guide.
2. Install Cut Cross Entropy to reduce training VRAM usage.
3. Run the finetuning example:

   ```bash
   # QLoRA (1x80GB @ ~63.4GiB/GPU)
   axolotl train examples/glm45/glm-45-air-qlora.yaml
   ```
## Dataset
In addition to the standard OpenAI Messages format, GLM-4.5 supports an extra parameter for thinking in the assistant section.
```json
{
    "role": "assistant",
    "reasoning_content": "...", // or have <think>...</think> in `content`
    "content": "..."
}
```
Make sure you set the extra attributes below if needed:
```yaml
datasets:
  - path: ...
    type: chat_template
    message_property_mappings:
      role: role
      content: content
      # tool_calls: tool_calls # uncomment if using tools
      # reasoning_content: reasoning_content # uncomment if you have reasoning

# Uncomment if training on the tool role (you would rarely if ever need this)
# eot_tokens:
#   - <|observation|>
```
## Tips

- The role name for tools in this template is `tool`.
- You will see the following Axolotl warning. This is expected, as the template does not use an EOS token:

  ```text
  EOS token '<|endoftext|>' not found in chat_template. Please check if your template/EOS token is correct.
  ```

- You can run a full finetune by removing `adapter: qlora`, `load_in_4bit: true`, and `quantize_moe_experts: true` from the config.
- LoRA kernels are incompatible with this model and must be explicitly disabled (`lora_*_kernel: false`).
- Read more on how to load your own dataset in the docs.
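Putting the last two configuration tips together, the relevant portion of the QLoRA config could look like the sketch below (the three `lora_*_kernel` flag names are an assumption about which keys the wildcard covers; check your Axolotl version's config reference):

```yaml
adapter: qlora
load_in_4bit: true
quantize_moe_experts: true

# LoRA kernels are incompatible with GLM-4.5 and must be disabled explicitly
lora_qkv_kernel: false
lora_o_kernel: false
lora_mlp_kernel: false

# For a full finetune, delete adapter, load_in_4bit, and
# quantize_moe_experts above, keeping the kernel flags disabled.
```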
## GGUF / llama.cpp loading error (missing tensors)
If you see `missing tensor 'blk.X.attn_norm.weight'` when loading a GLM-4 / GLM4-MoE model in llama.cpp, the likely cause is `num_nextn_predict_layers` being set to `1` in `config.json` while the MTP (multi-token prediction) weights were not exported (possible after PEFT/QLoRA training).

Fix: set `"num_nextn_predict_layers": 0` in your `config.json` before converting to GGUF.
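The fix can be scripted as part of your conversion pipeline. A minimal sketch, assuming the merged model lives in a `merged-model/` directory (the path and the setup step writing a dummy config are illustrative only):

```python
import json
from pathlib import Path

config_path = Path("merged-model") / "config.json"

# Setup for illustration only: a config that still claims one MTP layer.
config_path.parent.mkdir(exist_ok=True)
config_path.write_text(json.dumps({"num_nextn_predict_layers": 1}))

# The actual fix: load config.json, zero out the MTP layer count so
# llama.cpp does not look for unexported weights, and write it back.
config = json.loads(config_path.read_text())
config["num_nextn_predict_layers"] = 0
config_path.write_text(json.dumps(config, indent=2))
```

Run this before invoking the GGUF conversion script so the converted file matches the tensors that were actually exported.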
## Optimization Guides
Please check the Optimizations doc.