# Finetune Mistral Medium 3.5 with Axolotl
Mistral Medium 3.5 is a 128B-parameter dense multimodal model from MistralAI that unifies instruct, reasoning, and agentic capabilities in a single model.

It shares the `mistral3` architecture (dense, YaRN RoPE, 256k context) with Ministral 3 and supports the same `reasoning_effort` toggle as Mistral Small 4.
Thanks to the team at MistralAI for giving us early access to prepare for this release.
## Getting started
- Install Axolotl following the installation guide.
- Install Cut Cross Entropy to reduce training VRAM usage.
- (Text config only) Install Flash Attention 4 on Hopper/Blackwell.
- Run one of the example configs:

  ```sh
  # text-only
  axolotl train examples/mistral-medium-3_5/qlora-text.yml    # ~83.1 GiB

  # text + vision
  # wget https://huggingface.co/datasets/Nanobit/text-vision-2k-test/resolve/main/African_elephant.jpg
  axolotl train examples/mistral-medium-3_5/qlora-vision.yml  # ~80.3 GiB
  ```
Note: vision training does not currently work with Flash Attention 4.
## Reasoning Effort
The chat template supports a reasoning_effort variable to control the model's reasoning depth:
"none"— instruct mode (default)"high"— reasoning mode with explicit thinking steps
Pass it via `chat_template_kwargs` under your dataset config:

```yaml
datasets:
  - path: your/dataset
    type: chat_template
    chat_template_kwargs:
      reasoning_effort: high
```
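Conceptually, the template uses this flag to decide whether the assistant turn opens an explicit thinking phase. A toy sketch of that branching, assuming a simplified turn layout (the real logic lives in the model's Jinja chat template; `build_generation_prompt` is a hypothetical stand-in, not an Axolotl or transformers API):

```python
# Toy sketch: how a reasoning_effort toggle can gate an explicit thinking
# phase when rendering a prompt. Hypothetical helper, illustrative only.
def build_generation_prompt(messages, reasoning_effort="none"):
    lines = [f"<{m['role']}> {m['content']}" for m in messages]
    if reasoning_effort == "high":
        # Reasoning mode: prime the model to emit its reasoning inside a
        # [THINK] block before the final answer.
        lines.append("<assistant> [THINK]")
    else:
        # Instruct mode: the model answers directly.
        lines.append("<assistant>")
    return "\n".join(lines)

msgs = [{"role": "user", "content": "Why is the sky blue?"}]
print(build_generation_prompt(msgs, reasoning_effort="high"))
```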
## Thinking Support
The chat template supports a `thinking` content type in assistant messages for training on reasoning traces (rendered as `[THINK]...[/THINK]` blocks).

To use thinking datasets, add the `thinking` mapping via `message_property_mappings`:
```yaml
datasets:
  - path: your/thinking-dataset
    type: chat_template
    message_property_mappings:
      role: role
      content: content
      thinking: thinking
    chat_template_kwargs:
      reasoning_effort: high
```
See the Magistral thinking guide for dataset format details.
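What the mapping does can be sketched in a few lines: each entry renames a field on the raw dataset message to the property name the chat template expects. Axolotl performs this internally; the helper below is only a conceptual illustration, not its actual implementation:

```python
# Conceptual sketch of message_property_mappings: keys are the message
# properties the template expects, values are the dataset field names.
# Hypothetical helper for illustration; Axolotl does this internally.
def apply_property_mappings(message, mappings):
    return {prop: message[field]
            for prop, field in mappings.items()
            if field in message}

raw = {
    "role": "assistant",
    "content": "The answer is 4.",
    "thinking": "2 + 2 = 4, so the answer is 4.",
}
mappings = {"role": "role", "content": "content", "thinking": "thinking"}
mapped = apply_property_mappings(raw, mappings)
# mapped["thinking"] is what the template renders inside [THINK]...[/THINK]
```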
## Tips
- For smaller experiments on the same architecture, see `examples/ministral3` (Ministral 3, 3B).
- Read more on how to load your own dataset in the docs.
- The text dataset format follows the OpenAI Messages format as seen here.
- The vision model requires the multi-modal dataset format as documented here.
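To make the two formats concrete, here is a hedged sketch of one text-only row and one vision row written as JSONL. The field names follow the OpenAI messages convention for the text row and the content-parts shape for the vision row; the exact keys accepted for images (e.g. `path`) are an assumption here, so check the multimodal docs linked above for the authoritative schema:

```python
import json

# Text-only row: plain string content per message (OpenAI messages format).
text_row = {
    "messages": [
        {"role": "user", "content": "What sound does a cow make?"},
        {"role": "assistant", "content": "Moo."},
    ]
}

# Vision row: content is a list of typed parts; the "path" key for the
# image part is an assumed field name for this illustration.
vision_row = {
    "messages": [
        {"role": "user", "content": [
            {"type": "image", "path": "African_elephant.jpg"},
            {"type": "text", "text": "What animal is this?"},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": "An African elephant."},
        ]},
    ]
}

# Write one example per line (JSONL), the usual on-disk layout.
with open("train.jsonl", "w") as f:
    for row in (text_row, vision_row):
        f.write(json.dumps(row) + "\n")
```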