Finetune GLM-4.6V with Axolotl

GLM-4.6V is a family of vision-language models from ZhipuAI, available on Hugging Face. This guide shows how to fine-tune it with Axolotl for vision-language tasks.

Getting started

Install Axolotl from source following the installation guide.
Install Cut Cross Entropy to reduce training VRAM usage.

Run the fine-tuning:

# glm-4-6v-flash (9B)
axolotl train examples/glm46v/glm-4-6v-flash-qlora.yaml

Let us know how it goes. Happy finetuning! 🚀

Tips

Vision datasets should follow the format described in the multimodal docs.
You can run a full fine-tune by removing the adapter: qlora and load_in_4bit: true settings from the config.
Read more on how to load your own dataset in the dataset loading docs.
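
The full fine-tuning tip above can be sketched as a small script that derives a config without the two QLoRA settings. This is a minimal illustration, not part of Axolotl: the helper name strip_qlora_keys and the three-line sample are hypothetical, and the sketch assumes both keys sit at the top level of the YAML file. Other adapter-related settings in the real example config may also need review.

```python
import re

# Keys that the tip above says to remove for a full fine-tune.
QLORA_KEYS = ("adapter", "load_in_4bit")


def strip_qlora_keys(config_text: str) -> str:
    """Return config_text without top-level `adapter:` and `load_in_4bit:` lines."""
    pattern = re.compile(r"^(%s)\s*:" % "|".join(QLORA_KEYS))
    kept = [line for line in config_text.splitlines() if not pattern.match(line)]
    return "\n".join(kept) + "\n"


# Hypothetical excerpt; the real example config contains many more settings.
sample = """base_model: zai-org/GLM-4.6V-Flash
adapter: qlora
load_in_4bit: true
"""

full_ft = strip_qlora_keys(sample)
print(full_ft)
```

To apply it to the real file, read examples/glm46v/glm-4-6v-flash-qlora.yaml, pass its text through the helper, write the result to a new file, and point axolotl train at that file.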

Supported Models

GLM-4.6V: Full vision-language model (zai-org/GLM-4.6V)
GLM-4.6V-Flash: Faster variant (zai-org/GLM-4.6V-Flash)

Optimization Guides

Please check the Optimizations doc.
