Files
axolotl/examples/glm46v

Finetune GLM-4.6V with Axolotl

GLM-4.6V is a family of vision-language models from ZhipuAI found on HuggingFace. This guide shows how to fine-tune it with Axolotl for vision-language tasks.

Getting started

  1. Install Axolotl from source following the installation guide.

  2. Install Cut Cross Entropy to reduce training VRAM usage.

  3. Run the fine-tuning:

    glm-4-6v-flash(9B)

    axolotl train examples/glm46v/glm-4-6v-flash-qlora.yaml
    

Let us know how it goes. Happy finetuning! 🚀

Tips

  • Vision datasets should follow the format described in the multimodal docs
  • You can run a full finetuning by removing the adapter: qlora and load_in_4bit: true from the config.
  • Read more on how to load your own dataset in the dataset loading docs.

Supported Models

  • GLM-4.6V: Full vision-language model (zai-org/GLM-4.6V)
  • GLM-4.6V-Flash: Faster variant (zai-org/GLM-4.6V-Flash)

Optimization Guides

Please check the Optimizations doc.