# Finetune Mistral Medium 3.5 with Axolotl
Mistral Medium 3.5 is a 128B-parameter dense multimodal model from MistralAI that unifies instruct, reasoning, and agentic capabilities in a single model.

It shares the `mistral3` architecture (dense, YaRN RoPE, 256k context) with Ministral 3 and supports the same `reasoning_effort` toggle as Mistral Small 4.
Thanks to the team at MistralAI for giving us early access to prepare for this release.
## Getting started
- Install Axolotl following the installation guide.
- Install Cut Cross Entropy to reduce training VRAM usage.
- (Text config only) Install Flash Attention 4 on Hopper/Blackwell.
- Run one of the example configs:

  ```sh
  # text-only
  axolotl train examples/mistral-medium-3_5/qlora-text.yml    # ~83.1 GiB

  # text + vision
  # wget https://huggingface.co/datasets/Nanobit/text-vision-2k-test/resolve/main/African_elephant.jpg
  axolotl train examples/mistral-medium-3_5/qlora-vision.yml  # ~80.3 GiB
  ```
Note: vision training does not currently work with Flash Attention 4.
## Reasoning Effort
The chat template supports a reasoning_effort variable to control the model's reasoning depth:
"none"— instruct mode (default)"high"— reasoning mode with explicit thinking steps
Pass it via `chat_template_kwargs` under your dataset config:

```yaml
datasets:
  - path: your/dataset
    type: chat_template
    chat_template_kwargs:
      reasoning_effort: high
```
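Conceptually, the template uses this flag to decide whether the assistant turn opens an explicit thinking phase. A toy sketch of that branching, assuming a simplified turn layout (the real logic lives in the model's Jinja chat template; `build_generation_prompt` is a hypothetical stand-in, not an Axolotl or transformers API):

```python
# Toy sketch: how a reasoning_effort toggle can gate an explicit thinking
# phase when rendering a prompt. Hypothetical helper, illustrative only.
def build_generation_prompt(messages, reasoning_effort="none"):
    lines = [f"<{m['role']}> {m['content']}" for m in messages]
    if reasoning_effort == "high":
        # Reasoning mode: prime the model to emit its reasoning inside a
        # [THINK] block before the final answer.
        lines.append("<assistant> [THINK]")
    else:
        # Instruct mode: the model answers directly.
        lines.append("<assistant>")
    return "\n".join(lines)

msgs = [{"role": "user", "content": "Why is the sky blue?"}]
print(build_generation_prompt(msgs, reasoning_effort="high"))
```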
## Thinking Support
The chat template supports a `thinking` content type in assistant messages for training on reasoning traces (rendered as `[THINK]...[/THINK]` blocks).

To use thinking datasets, add the `thinking` mapping via `message_property_mappings`:
```yaml
datasets:
  - path: your/thinking-dataset
    type: chat_template
    message_property_mappings:
      role: role
      content: content
      thinking: thinking
    chat_template_kwargs:
      reasoning_effort: high
```
See the Magistral thinking guide for dataset format details.
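What the mapping does can be sketched in a few lines: each entry renames a field on the raw dataset message to the property name the chat template expects. Axolotl performs this internally; the helper below is only a conceptual illustration, not its actual implementation:

```python
# Conceptual sketch of message_property_mappings: keys are the message
# properties the template expects, values are the dataset field names.
# Hypothetical helper for illustration; Axolotl does this internally.
def apply_property_mappings(message, mappings):
    return {prop: message[field]
            for prop, field in mappings.items()
            if field in message}

raw = {
    "role": "assistant",
    "content": "The answer is 4.",
    "thinking": "2 + 2 = 4, so the answer is 4.",
}
mappings = {"role": "role", "content": "content", "thinking": "thinking"}
mapped = apply_property_mappings(raw, mappings)
# mapped["thinking"] is what the template renders inside [THINK]...[/THINK]
```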
## Tips
- For smaller experiments on the same architecture, see `examples/ministral3` (Ministral 3, 3B).
- Read more on how to load your own dataset in the docs.
- The text dataset format follows the OpenAI Messages format as seen here.
- The vision model requires the multi-modal dataset format as documented here.
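To make the two formats concrete, here is a hedged sketch of one text-only row and one vision row written as JSONL. The field names follow the OpenAI messages convention for the text row and the content-parts shape for the vision row; the exact keys accepted for images (e.g. `path`) are an assumption here, so check the multimodal docs linked above for the authoritative schema:

```python
import json

# Text-only row: plain string content per message (OpenAI messages format).
text_row = {
    "messages": [
        {"role": "user", "content": "What sound does a cow make?"},
        {"role": "assistant", "content": "Moo."},
    ]
}

# Vision row: content is a list of typed parts; the "path" key for the
# image part is an assumed field name for this illustration.
vision_row = {
    "messages": [
        {"role": "user", "content": [
            {"type": "image", "path": "African_elephant.jpg"},
            {"type": "text", "text": "What animal is this?"},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": "An African elephant."},
        ]},
    ]
}

# Write one example per line (JSONL), the usual on-disk layout.
with open("train.jsonl", "w") as f:
    for row in (text_row, vision_row):
        f.write(json.dumps(row) + "\n")
```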