Files
axolotl/examples/mistral/mistral-small/README.md
NanoCode012 243620394a fix: force train split for json,csv,txt for test_datasets and misc doc changes (#3226)
* fix: force train split for json,csv,txt for test_datasets

* feat(doc): add info on mixing datasets for VLM

* feat(doc): max memory

* fix(doc): clarify lr groups

* fix: add info on vision not being dropped

* feat: add qwen3-vl to multimodal docs

* fix: add moe blocks to arch list

* feat(doc): improve mistral docs

* chore: add helpful link [skip-e2e]

* fix: add vram usage for mistral small

* Update link in docs/faq.qmd

Co-authored-by: salman <salman.mohammadi@outlook.com>

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-10-22 15:23:20 -07:00

1.6 KiB

Mistral Small 3.1/3.2 Fine-tuning

This guide covers fine-tuning Mistral Small 3.1 and Mistral Small 3.2 with vision capabilities using Axolotl.

Prerequisites

Before starting, ensure you have:

Getting Started

  1. Install the required vision lib:

    pip install 'mistral-common[opencv]==1.8.5'
    
  2. Download the example dataset image:

    wget https://huggingface.co/datasets/Nanobit/text-vision-2k-test/resolve/main/African_elephant.jpg
    
  3. Run the fine-tuning:

    axolotl train examples/mistral/mistral-small/mistral-small-3.1-24B-lora.yml
    

This config uses about 29.4 GiB VRAM.

Dataset Format

The vision model requires multi-modal dataset format as documented here.

One exception is that, passing "image": PIL.Image is not supported. MistralTokenizer only supports path, url, and base64 for now.

Example:

{
    "messages": [
        {"role": "system", "content": [{ "type": "text", "text": "{SYSTEM_PROMPT}"}]},
        {"role": "user", "content": [
            { "type": "text", "text": "What's in this image?"},
            {"type": "image", "path": "path/to/image.jpg" }
        ]},
        {"role": "assistant", "content": [{ "type": "text", "text": "..." }]},
    ],
}

Limitations

  • Sample Packing is not supported for multi-modality training currently.