# Finetune Z.ai's GLM-4.5-Air with Axolotl
GLM-4.5-Air is a Mixture-of-Experts (MoE) model by Z.ai. This guide shows how to fine-tune it with Axolotl.
## Getting started

1. Install Axolotl following the installation guide.

2. Install Cut Cross Entropy to reduce training VRAM usage.

3. Run the fine-tuning example:

   ```bash
   # QLoRA (1x80GB @ ~63.4GiB/GPU)
   axolotl train examples/glm45/glm-45-air-qlora.yaml
   ```
## Dataset

In addition to the standard OpenAI Messages format, GLM-4.5 supports an extra parameter for thinking in the assistant turn:

```json
{
    "role": "assistant",
    "reasoning_content": "...", // or wrap the trace in <think>...</think> inside `content`
    "content": "..."
}
```
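To make the layout concrete, here is a minimal sketch that writes one such sample as a JSONL record (the file name and message contents are hypothetical; only the `role`/`content`/`reasoning_content` keys come from the format above):

```python
import json

# Hypothetical sample illustrating the assistant "reasoning_content" field.
sample = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {
            "role": "assistant",
            # Thinking trace kept separate from the final reply
            "reasoning_content": "The user asks for 2 + 2, which is 4.",
            "content": "2 + 2 = 4.",
        },
    ]
}

# One JSON object per line (JSONL) is a common local dataset layout.
with open("glm45_sample.jsonl", "w") as f:
    f.write(json.dumps(sample) + "\n")
```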
Make sure you set the extra attributes below if needed:

```yaml
datasets:
  - path: ...
    type: chat_template
    message_property_mappings:
      role: role
      content: content
      # tool_calls: tool_calls  # uncomment if using tools
      # reasoning_content: reasoning_content  # uncomment if your data has reasoning traces

# Uncomment if training on the tool role (you would rarely, if ever, need this)
# eot_tokens:
#   - <|observation|>
```
## Tips

- The role name for tools in this template is `tool`.
- You will see the following Axolotl warning. This is expected, as the template does not use EOS:

  ```text
  EOS token '<|endoftext|>' not found in chat_template. Please check if your template/EOS token is correct.
  ```

- You can run a full fine-tune by removing `adapter: qlora`, `load_in_4bit: true`, and `quantize_moe_experts: true` from the config.
- LoRA kernels are incompatible with this model and must be explicitly disabled (`lora_*_kernel: false`).
- Read more on how to load your own dataset in the dataset docs.
## Optimization Guides

Please check the Optimizations doc.