feat: add doc for expert quantization, glm45 air example configs, and update readme for release (#3452) [skip ci]

* chore: rename without period * feat: add glm45 air * feat: add doc on expert quantization * feat: update base readme with new changes * chore: cleanup * chore: cleanup * chore: cleanup * fix: disable quantize_moe_expert on merge per comment * chore: add kernel info to optimizations doc
2026-03-05 21:58:09 +07:00
parent b6b8db805a
commit 753906cfc7
13 changed files with 248 additions and 29 deletions
--- a/examples/glm45/README.md
+++ b/examples/glm45/README.md
@@ -0,0 +1,72 @@
+# Finetune Z.ai's GLM-4.5-Air with Axolotl
+
+[GLM-4.5-Air](https://huggingface.co/zai-org/GLM-4.5-Air) is a MoE model by Z.ai.
+
+This guide shows how to fine-tune it with Axolotl.
+
+## Getting started
+
+1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html).
+
+2. Install [Cut Cross Entropy](https://docs.axolotl.ai/docs/custom_integrations.html#cut-cross-entropy) to reduce training VRAM usage.
+
+3. Run the finetuning example:
+
+```bash
+# QLoRA (1x80GB @ ~63.4GiB/GPU)
+axolotl train examples/glm45/glm-45-air-qlora.yaml
+```
+
+### Dataset
+
+In addition to the standard OpenAI Messages format, GLM-4.5 supports an extra parameter for thinking in the assistant section.
+
+```json
+{
+    "role": "assistant",
+    "reasoning_content": "...",  // or have </think>...</think> in `content`
+    "content": "..."
+}
+```
+
+Make sure you set the below extra attributes if needed:
+
+```yaml
+datasets:
+  - path: ...
+    type: chat_template
+    message_property_mappings:
+      role: role
+      content: content
+
+    #   tool_calls: tool_calls  # uncomment if using tools
+    #   reasoning_content: reasoning_content  # uncomment if have reasoning
+
+# Uncomment if training on tool role (you would rarely if ever need this)
+# eot_tokens:
+#   - <|observation|>
+```
+
+### Tips
+
+- The role name for tools in this template is `tool`.
+- You will see this Axolotl WARNING — this is expected as the template does not use EOS:
+  ```
+  EOS token '<|endoftext|>' not found in chat_template. Please check if your template/EOS token is correct.
+  ```
+- You can run a full finetuning by removing `adapter: qlora`, `load_in_4bit: true`, and `quantize_moe_experts: true` from the config.
+- **LoRA kernels**: Incompatible with this model. Must be explicitly disabled (`lora_*_kernel: false`).
+- Read more on how to load your own dataset at [docs](https://docs.axolotl.ai/docs/dataset_loading.html).
+
+## Optimization Guides
+
+Please check the [Optimizations doc](https://docs.axolotl.ai/docs/optimizations.html).
+
+## Related Resources
+
+- [GLM-4.5-Air on HuggingFace](https://huggingface.co/zai-org/GLM-4.5-Air)
+- [GLM-4.5 Blog](https://z.ai/blog/glm-4.5)
+- [Axolotl Docs](https://docs.axolotl.ai)
+- [Axolotl Website](https://axolotl.ai)
+- [Axolotl GitHub](https://github.com/axolotl-ai-cloud/axolotl)
+- [Axolotl Discord](https://discord.gg/7m9sfhzaf3)