Files
Wing Lian 66a3de3629 build examples readmes with quarto (#3046)
* build examples readmes with quarto

* chore: formatting

* feat: dynamic build docs

* feat: add more model guides

* chore: format

* fix: collapse sidebar completely to have space for model guides

* fix: security protection for generated qmd

* fix: adjust collapse level, add new models, update links

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-12-25 19:17:25 +07:00
..

Mistral Small 3.1/3.2 Fine-tuning

This guide covers fine-tuning Mistral Small 3.1 and Mistral Small 3.2 with vision capabilities using Axolotl.

Prerequisites

Before starting, ensure you have:

Getting Started

  1. Install the required vision lib:

    pip install 'mistral-common[opencv]==1.8.5'
    
  2. Download the example dataset image:

    wget https://huggingface.co/datasets/Nanobit/text-vision-2k-test/resolve/main/African_elephant.jpg
    
  3. Run the fine-tuning:

    axolotl train examples/mistral/mistral-small/mistral-small-3.1-24B-lora.yml
    

This config uses about 29.4 GiB VRAM.

Dataset Format

The vision model requires multi-modal dataset format as documented here.

One exception is that, passing "image": PIL.Image is not supported. MistralTokenizer only supports path, url, and base64 for now.

Example:

{
    "messages": [
        {"role": "system", "content": [{ "type": "text", "text": "{SYSTEM_PROMPT}"}]},
        {"role": "user", "content": [
            { "type": "text", "text": "What's in this image?"},
            {"type": "image", "path": "path/to/image.jpg" }
        ]},
        {"role": "assistant", "content": [{ "type": "text", "text": "..." }]},
    ],
}

Limitations

  • Sample Packing is not supported for multi-modality training currently.