
# Finetune Z.ai's GLM-4.7-Flash with Axolotl

GLM-4.7-Flash is a 30B-A3B MoE model by Z.ai.

This guide shows how to fine-tune it with Axolotl.

## Getting started

  1. Install Axolotl following the installation guide.

  2. Install Cut Cross Entropy to reduce training VRAM usage.

  3. Run the finetuning example:

```bash
# QLoRA
# - no target experts (1x48GB @ ~24GiB/GPU)
# - target experts (1x48GB @ ~34GiB/GPU)
axolotl train examples/glm47-flash/qlora.yaml

# QLoRA FSDP2, no target experts (2x48GB @ ~29GiB/GPU)
axolotl train examples/glm47-flash/qlora_fsdp.yaml

# LoRA
# - no target experts (1x48GB @ ~35GiB/GPU)
# - target experts (1x48GB @ OOM; projected ~45-50GiB/GPU)
axolotl train examples/glm47-flash/lora.yaml

# LoRA FSDP2, no target experts (2x48GB @ ~43GiB/GPU)
axolotl train examples/glm47-flash/lora_fsdp.yaml
```

## MoE Expert Quantization & Expert LoRA

This model quantizes its expert weights on load. To learn about expert quantization, expert LoRA targeting, and related limitations, see the MoE Expert Quantization docs.
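As a sketch, the quantization-related knobs referenced in this guide look roughly like the following in a config (key names are taken from this README's Tips section; verify against the shipped `examples/glm47-flash/qlora.yaml`):

```yaml
# Sketch of the quantization-related keys mentioned in this guide.
# Verify names and defaults against the shipped example config.
adapter: qlora              # train low-rank adapters on a quantized base
load_in_4bit: true          # quantize base weights on load
quantize_moe_experts: true  # also quantize the MoE expert weights
```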

## Limitations

- `lora_target_linear`: not supported for this model.
- LoRA kernels: incompatible with this model due to its non-standard attention projections (DSA); they must be explicitly disabled (`lora_*_kernel: false`).
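Concretely, the `lora_*_kernel` flags expand to something like the following config fragment (flag names assume axolotl's standard LoRA kernel options; check them against the shipped example YAML for this model):

```yaml
# Explicitly disable the fused LoRA kernels, which assume standard
# q/k/v/o attention projections and do not work with this model's DSA layout.
lora_mlp_kernel: false
lora_qkv_kernel: false
lora_o_kernel: false
```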

## Tips

- For inference, the official Z.ai team recommends these default settings for most tasks:
  - `temperature`: 1.0
  - `top_p`: 0.95
  - `max_new_tokens`: 131072
- You can run a full finetune by removing `adapter: qlora`, `load_in_4bit: true`, and `quantize_moe_experts: true` from the config. This is memory-heavy, and we have not tested it.
- Read more on how to load your own dataset in the docs.
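As a minimal sketch, the recommended inference settings above map onto Hugging Face `transformers` generation kwargs like this (the `model.generate(...)` usage line is illustrative; the model and inputs are not shown here):

```python
# Recommended default inference settings from the Z.ai team (per this guide),
# expressed as Hugging Face `transformers` generation kwargs.
gen_kwargs = {
    "do_sample": True,         # sampling must be enabled for temperature/top_p to apply
    "temperature": 1.0,
    "top_p": 0.95,
    "max_new_tokens": 131072,  # full output budget; lower this for most tasks
}

# Usage (sketch): outputs = model.generate(**inputs, **gen_kwargs)
```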

## Optimization Guides

Please check the Optimizations doc.