
Finetune Mistral Small 4 with Axolotl

Mistral Small 4 is a 119B-parameter (6.5B active) multimodal MoE model from MistralAI that unifies instruct, reasoning, and coding capabilities in a single model. It is available on HuggingFace as Mistral-Small-4-119B-2603.

Thanks to the team at MistralAI for giving us early access to prepare for this release.

Getting started

  1. Install Axolotl following the installation guide.

  2. Install Cut Cross Entropy to reduce training VRAM usage.

  3. Install transformers from main:

pip install git+https://github.com/huggingface/transformers.git

  4. Run one of the example configs:
# text-only
axolotl train examples/mistral4/qlora-text.yml  # ~69 GiB without training experts, ~93 GiB with experts
axolotl train examples/mistral4/fft-text.yml

# text + vision
# run: wget https://huggingface.co/datasets/Nanobit/text-vision-2k-test/resolve/main/African_elephant.jpg
axolotl train examples/mistral4/qlora-vision.yml  # ~68 GiB without training experts
axolotl train examples/mistral4/fft-vision.yml

Note: The FFT configs are provided as a reference; adjust hyperparameters as needed for your setup.

Reasoning Effort

The chat template supports a reasoning_effort variable to control the model's reasoning depth:

  • "none" — instruct mode (default)
  • "high" — reasoning mode with explicit thinking steps

Pass it via chat_template_kwargs under your dataset config:

datasets:
  - path: your/dataset
    type: chat_template
    chat_template_kwargs:
      reasoning_effort: high
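To see what this variable does mechanically, here is a toy template function that branches on reasoning_effort the way a chat template's Jinja logic might. This is purely illustrative: render_prompt is a made-up name, and the real Mistral chat template is more involved.

```python
# Toy sketch (NOT Mistral's actual chat template): how a template
# might branch on a reasoning_effort kwarg passed via chat_template_kwargs.
def render_prompt(messages, reasoning_effort="none"):
    parts = []
    if reasoning_effort == "high":
        # Hypothetical system hint enabling explicit thinking steps.
        parts.append("[SYSTEM] reason step by step inside [THINK]...[/THINK]")
    for m in messages:
        parts.append(f"[{m['role'].upper()}] {m['content']}")
    return "\n".join(parts)

messages = [{"role": "user", "content": "What is 2 + 2?"}]
print(render_prompt(messages))                           # instruct mode (default)
print(render_prompt(messages, reasoning_effort="high"))  # reasoning mode
```

With "none" the prompt is a plain instruct conversation; with "high" the template injects the reasoning preamble before the messages.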

Thinking Support

The chat template supports a thinking content type in assistant messages for training on reasoning traces (rendered as [THINK]...[/THINK] blocks).
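A minimal sketch of that rendering, assuming the reasoning trace sits in a thinking field on the assistant message (render_assistant is a hypothetical helper; the actual template handles more content types and formatting):

```python
# Toy sketch of serializing an assistant message that carries a reasoning
# trace: the trace is wrapped in [THINK]...[/THINK] before the final answer.
def render_assistant(message):
    out = ""
    if "thinking" in message:
        out += f"[THINK]{message['thinking']}[/THINK]"
    return out + message["content"]

msg = {"role": "assistant", "thinking": "2 + 2 = 4", "content": "The answer is 4."}
print(render_assistant(msg))  # [THINK]2 + 2 = 4[/THINK]The answer is 4.
```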

To use thinking datasets, add the thinking mapping via message_property_mappings:

datasets:
  - path: your/thinking-dataset
    type: chat_template
    message_property_mappings:
      role: role
      content: content
      thinking: thinking
    chat_template_kwargs:
      reasoning_effort: high
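
Conceptually, the mapping renames raw dataset columns to the field names the chat template expects. A rough sketch, assuming each target field is filled from a source column (apply_property_mappings is a made-up helper, not Axolotl's actual implementation):

```python
# Toy sketch of what a property mapping does: rename raw dataset fields
# (source columns) to the names the chat template expects (target fields).
def apply_property_mappings(message, mappings):
    return {target: message[source]
            for target, source in mappings.items()
            if source in message}

# Identity mapping, matching the YAML config above.
mappings = {"role": "role", "content": "content", "thinking": "thinking"}
raw = {"role": "assistant", "content": "4", "thinking": "2 + 2 = 4"}
print(apply_property_mappings(raw, mappings))
```

Messages without a thinking column (e.g. user turns) simply pass through with only role and content.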

See the Magistral thinking guide for dataset format details.

Tips

  • Read more about loading your own dataset in the docs.
  • The text dataset format follows the OpenAI Messages format, as shown here.
  • The vision model requires the multi-modal dataset format documented here.
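
For orientation, a minimal text-only row in the OpenAI Messages format might look like the following (the field values are placeholders, not from a real dataset):

```python
import json

# A minimal chat row in the OpenAI Messages format expected by type: chat_template.
row = {
    "messages": [
        {"role": "user", "content": "Name an animal with tusks."},
        {"role": "assistant", "content": "The African elephant."},
    ]
}
print(json.dumps(row))  # one JSON object per line in a JSONL file
```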