# Mistral Small 3.1/3.2 Fine-tuning
This guide covers fine-tuning Mistral Small 3.1 and Mistral Small 3.2 with vision capabilities using Axolotl.
## Prerequisites
Before starting, ensure you have:
- Installed Axolotl (see Installation docs)
## Getting Started

1. Install the required vision lib:

   ```bash
   pip install 'mistral-common[opencv]==1.8.5'
   ```

2. Download the example dataset image:

   ```bash
   wget https://huggingface.co/datasets/Nanobit/text-vision-2k-test/resolve/main/African_elephant.jpg
   ```

3. Run the fine-tuning:

   ```bash
   axolotl train examples/mistral/mistral-small/mistral-small-3.1-24B-lora.yml
   ```

   This config uses about 29.4 GiB VRAM.
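If you want to train on your own images instead of the prepackaged example dataset, you can write a small JSONL file in the multi-modal chat format described in the next section and point the `datasets` entry of your config at it. The sketch below is illustrative only; the `vision_dataset.jsonl` filename, the prompt, and the answer text are not part of the shipped example config:

```python
import json

# One conversation per line, following the multi-modal chat format
# documented in the "Dataset Format" section below.
sample = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                # Reference the image by path (or url/base64); see the note below.
                {"type": "image", "path": "African_elephant.jpg"},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": "An African elephant standing in grassland."}],
        },
    ]
}

# Illustrative output file; reference it from the `datasets` section of your config.
with open("vision_dataset.jsonl", "w") as f:
    f.write(json.dumps(sample) + "\n")
```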
## Dataset Format
The vision model requires the multi-modal dataset format as documented here.
One exception: passing `"image": PIL.Image` is not supported, as `MistralTokenizer` currently only accepts `path`, `url`, and `base64`.
Example:

```json
{
  "messages": [
    {"role": "system", "content": [{"type": "text", "text": "{SYSTEM_PROMPT}"}]},
    {"role": "user", "content": [
      {"type": "text", "text": "What's in this image?"},
      {"type": "image", "path": "path/to/image.jpg"}
    ]},
    {"role": "assistant", "content": [{"type": "text", "text": "..."}]}
  ]
}
```
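If your images only exist as in-memory `PIL.Image` objects, a minimal workaround is to serialize them to a base64 string first. This is a sketch only, assuming Pillow is installed and that the `base64` field mentioned above is the one your dataset uses:

```python
import base64
import io

from PIL import Image

# MistralTokenizer does not accept a PIL.Image directly, so serialize it first.
image = Image.open("African_elephant.jpg")

buffer = io.BytesIO()
image.save(buffer, format="JPEG")
b64_image = base64.b64encode(buffer.getvalue()).decode("utf-8")

# Use the encoded string in place of the PIL object in the dataset entry.
image_entry = {"type": "image", "base64": b64_image}
```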
## Limitations

- Sample packing is not currently supported for multi-modal training.