Finetune Mistral Small 4 with Axolotl

Mistral Small 4 is a 119B-parameter (6.5B active) multimodal mixture-of-experts (MoE) model from MistralAI that unifies instruct, reasoning, and coding capabilities into a single model. It is available on HuggingFace as Mistral-Small-4-119B-2603.

Thanks to the team at MistralAI for giving us early access to prepare for this release.

Getting started

Note: Training this model requires the weights in BF16, which we will link to later. In the meantime, users interested in training can convert / descale the existing FP8 weights themselves.

  1. Install Axolotl following the installation guide.

  2. Install Cut Cross Entropy to reduce training VRAM usage.

  3. Install transformers from main:

pip install git+https://github.com/huggingface/transformers.git
  4. Run one of the example configs:
# text-only
axolotl train examples/mistral4/qlora-text.yml  # ~69 GiB without expert blocks targeted, ~93 GiB with experts
axolotl train examples/mistral4/fft-text.yml

# text + vision
# run: wget https://huggingface.co/datasets/Nanobit/text-vision-2k-test/resolve/main/African_elephant.jpg
axolotl train examples/mistral4/qlora-vision.yml  # ~68 GiB without expert blocks targeted
axolotl train examples/mistral4/fft-vision.yml

Note: The FFT configs are provided as a reference; please adjust hyperparameters as needed.
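For orientation, the sketch below shows roughly what a QLoRA text config for this model could look like. It is not a copy of the shipped file: the base_model id, LoRA ranks, batch sizes, and the Cut Cross Entropy plugin path are assumptions based on common Axolotl settings, so defer to examples/mistral4/qlora-text.yml for the actual values.

# Hedged sketch of a QLoRA text config -- see examples/mistral4/qlora-text.yml for the real settings.
base_model: mistralai/Mistral-Small-4-119B-2603   # assumed HF id; BF16 weights required

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin  # assumed plugin path
cut_cross_entropy: true

load_in_4bit: true        # QLoRA: frozen base weights quantized to 4-bit
adapter: qlora
lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
lora_target_linear: true  # attach adapters to the linear projections

datasets:
  - path: your/dataset
    type: chat_template

sequence_len: 4096
micro_batch_size: 1
gradient_accumulation_steps: 4
learning_rate: 2e-4
num_epochs: 1
gradient_checkpointing: true   # activation checkpointing keeps memory in check
output_dir: ./outputs/mistral4-qlora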

Reasoning Effort

The chat template supports a reasoning_effort variable to control the model's reasoning depth:

  • "none" — instruct mode (default)
  • "high" — reasoning mode with explicit thinking steps

Pass it via chat_template_kwargs under your dataset config:

datasets:
  - path: your/dataset
    type: chat_template
    chat_template_kwargs:
      reasoning_effort: high

Thinking Support

The chat template supports a thinking content type in assistant messages for training on reasoning traces (rendered as [THINK]...[/THINK] blocks).

To use thinking datasets, add the thinking mapping via message_property_mappings:

datasets:
  - path: your/thinking-dataset
    type: chat_template
    message_property_mappings:
      role: role
      content: content
      thinking: thinking
    chat_template_kwargs:
      reasoning_effort: high

See the Magistral thinking guide for dataset format details.
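For a concrete picture of what the mapping above expects, here is a hypothetical conversation row (shown as YAML for readability; datasets are typically stored as JSONL). The field names follow the message_property_mappings in the config above; the actual schema is described in the Magistral thinking guide.

# Hypothetical dataset row -- field names follow the message_property_mappings above.
messages:
  - role: user
    content: "Is 97 a prime number?"
  - role: assistant
    thinking: "Check divisibility by 2, 3, 5, and 7; none divide 97, and 11^2 exceeds 97."
    content: "Yes, 97 is prime."

The thinking field is rendered inside the [THINK]...[/THINK] block by the chat template, while the content field becomes the visible reply.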

Tips

  • Read more on how to load your own datasets in the docs.
  • The text dataset format follows the OpenAI Messages format, as seen here.
  • The vision configs require the multi-modal dataset format documented here.