# Finetune Mistral Small 4 with Axolotl
Mistral Small 4 is a 119B-parameter (6.5B active) multimodal mixture-of-experts (MoE) model from MistralAI that unifies instruct, reasoning, and coding capabilities in a single model. It is available on HuggingFace as Mistral-Small-4-119B-2603.
Thanks to the team at MistralAI for giving us early access to prepare for this release.
## Getting started
Note: Training this model requires the weights in BF16, which we will link to later. In the meantime, users interested in training can convert / descale the existing FP8 weights.
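For those converting ahead of the BF16 release, the descaling itself is mechanical. Below is a minimal, unofficial sketch: the `.weight_scale` key name and per-tensor scale granularity are assumptions, so inspect the actual FP8 checkpoint's tensor names before relying on it.

```python
# Minimal FP8 -> BF16 descaling sketch, NOT the official conversion script.
# Assumes each FP8 "*.weight" tensor has a matching "*.weight_scale" entry;
# verify key names and scale granularity against the released checkpoint.
import torch
from safetensors.torch import load_file, save_file

state = load_file("model.safetensors")
converted = {}
for name, tensor in state.items():
    if name.endswith(".weight_scale"):
        continue  # folded into its matching weight below
    scale_key = name.replace(".weight", ".weight_scale")
    if tensor.dtype == torch.float8_e4m3fn and scale_key in state:
        scale = state[scale_key].to(torch.float32)
        # upcast, re-apply the stored scale, then store in BF16
        converted[name] = (tensor.to(torch.float32) * scale).to(torch.bfloat16)
    else:
        converted[name] = tensor
save_file(converted, "model-bf16.safetensors")
```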
- Install Axolotl following the installation guide.
- Install Cut Cross Entropy to reduce training VRAM usage.
- Install transformers from main:

  ```bash
  pip install git+https://github.com/huggingface/transformers.git
  ```
- Run one of the example configs:

  ```bash
  # text-only
  axolotl train examples/mistral4/qlora-text.yml  # no experts ~69 GiB, experts ~93 GiB
  axolotl train examples/mistral4/fft-text.yml

  # text + vision
  # run: wget https://huggingface.co/datasets/Nanobit/text-vision-2k-test/resolve/main/African_elephant.jpg
  axolotl train examples/mistral4/qlora-vision.yml  # no experts ~68 GiB
  axolotl train examples/mistral4/fft-vision.yml
  ```
Note: The FFT configs are provided as a reference; please adjust hyperparameters as needed.
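For orientation, the options these configs turn on look roughly like the sketch below. This is illustrative only: the model id, LoRA settings, and batch sizes are assumptions, so treat the shipped `examples/mistral4/*.yml` files as the source of truth.

```yaml
# Illustrative sketch only -- see examples/mistral4/qlora-text.yml for the
# authoritative values. Model id and LoRA settings here are assumptions.
base_model: mistralai/Mistral-Small-4-119B-2603

load_in_4bit: true
adapter: qlora
lora_r: 32
lora_alpha: 64
lora_target_linear: true  # also targeting expert blocks raises VRAM (~93 GiB vs ~69 GiB)

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
cut_cross_entropy: true  # reduces loss-computation VRAM

datasets:
  - path: your/dataset
    type: chat_template

sequence_len: 4096
micro_batch_size: 1
gradient_accumulation_steps: 4
gradient_checkpointing: true
```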
## Reasoning Effort

The chat template supports a `reasoning_effort` variable to control the model's reasoning depth:

- `"none"`: instruct mode (default)
- `"high"`: reasoning mode with explicit thinking steps
Pass it via `chat_template_kwargs` under your dataset config:

```yaml
datasets:
  - path: your/dataset
    type: chat_template
    chat_template_kwargs:
      reasoning_effort: high
```
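Outside of Axolotl, the same variable can be passed through `transformers`' `apply_chat_template`, which forwards extra keyword arguments into the template context. A quick sanity check (the repo id is assumed from the model card above):

```python
# Quick check that the chat template reacts to reasoning_effort.
# Extra kwargs to apply_chat_template are forwarded to the Jinja template.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Small-4-119B-2603")  # assumed repo id
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Why is the sky blue?"}],
    tokenize=False,
    add_generation_prompt=True,
    reasoning_effort="high",  # "none" (default) or "high"
)
print(prompt)
```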
## Thinking Support

The chat template supports a `thinking` content type in assistant messages for training on reasoning traces (rendered as `[THINK]...[/THINK]` blocks).

To use thinking datasets, add the `thinking` mapping via `message_property_mappings`:
```yaml
datasets:
  - path: your/thinking-dataset
    type: chat_template
    message_property_mappings:
      role: role
      content: content
      thinking: thinking
    chat_template_kwargs:
      reasoning_effort: high
```
See the Magistral thinking guide for dataset format details.
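For orientation, a single (hypothetical) row of such a dataset could look like the following; the exact schema is covered in the Magistral guide:

```json
{
  "messages": [
    {"role": "user", "content": "What is 17 * 24?"},
    {
      "role": "assistant",
      "thinking": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
      "content": "17 * 24 = 408."
    }
  ]
}
```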
## Tips

- Read more on how to load your own dataset in the docs.
- The text dataset format follows the OpenAI Messages format, as seen here.
- The vision model requires the multi-modal dataset format documented here.
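To make the two formats concrete, here are hypothetical JSONL rows for each (the `path` key for images is an assumption; check the linked multi-modal docs for the exact schema):

```jsonl
{"messages": [{"role": "user", "content": "Name a large land mammal."}, {"role": "assistant", "content": "The African elephant."}]}
{"messages": [{"role": "user", "content": [{"type": "image", "path": "African_elephant.jpg"}, {"type": "text", "text": "What animal is shown here?"}]}, {"role": "assistant", "content": "An African elephant."}]}
```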