Finetune Z.ai's GLM-4.7-Flash with Axolotl
GLM-4.7-Flash is a 30B-A3B MoE model.
This guide shows how to fine-tune it with Axolotl.
Getting started
- Install Axolotl following the installation guide.
- Install Cut Cross Entropy to reduce training VRAM usage.
- Run the finetuning example:

      axolotl train examples/glm4.7-flash/glm4.7-flash-qlora.yaml
This config uses about X GiB VRAM.
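The exact contents live in the example file above; as a rough orientation, here is a minimal sketch of the kind of QLoRA settings an Axolotl config like this typically contains. The model id and all hyperparameter values below are illustrative assumptions, not the shipped config:

```yaml
base_model: zai-org/GLM-4.7-Flash  # assumed model id, check the example file
load_in_4bit: true                 # quantize base weights to 4-bit (the "Q" in QLoRA)
adapter: qlora                     # train low-rank adapters instead of full weights

# illustrative LoRA hyperparameters
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true           # attach adapters to all linear layers

# illustrative training settings
micro_batch_size: 1
gradient_accumulation_steps: 4
learning_rate: 0.0002
```

Lowering `micro_batch_size` and raising `gradient_accumulation_steps` trades throughput for VRAM while keeping the effective batch size constant.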
Let us know how it goes. Happy finetuning! 🚀
Tips
- For inference, the official Z.ai team recommends `top_p: 0.95`, `temperature: 1.0`, and `max_new_tokens: 131072`.
- You can run a full finetuning by removing `adapter: qlora` and `load_in_4bit: true` from the config.
- Read more on how to load your own dataset in the docs.
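To point the run at your own data, you edit the `datasets` section of the config. A hedged sketch, assuming a local JSONL file in Alpaca-style instruction format (the path and format are hypothetical placeholders for your own data):

```yaml
datasets:
  - path: data/my_dataset.jsonl   # hypothetical local file with your examples
    type: alpaca                  # assumes Alpaca-style instruction/input/output records

dataset_prepared_path: last_run_prepared  # cache tokenized data between runs
val_set_size: 0.05                        # hold out 5% for evaluation
```

See the dataset docs for the full list of supported formats and loaders.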
Optimization Guides
See the Optimizations doc for further memory and throughput tuning.