Files

NanoCode012 9de5b76336 feat: move to uv first (#3545 )

* feat: move to uv first

* fix: update doc to uv first

* fix: merge dev/tests into uv pyproject

* fix: update docker docs to match current config

* fix: migrate examples to readme

* fix: add llmcompressor to conflict

* feat: rec uv sync with lockfile for dev/ci

* fix: update docker docs to clarify how to use uv images

* chore: docs

* fix: use system python, no venv

* fix: set backend cpu

* fix: only set for installing pytorch step

* fix: remove unsloth kernel and installs

* fix: remove U in tests

* fix: set backend in deps too

* chore: test

* chore: comments

* fix: attempt to lock torch

* fix: workaround torch cuda and not upgraded

* fix: forgot to push

* fix: missed source

* fix: nightly upstream loralinear config

* fix: nightly phi3 long rope not work

* fix: forgot commit

* fix: test phi3 template change

* fix: no more requirements

* fix: carry over changes from new requirements to pyproject

* chore: remove lockfile per discussion

* fix: set match-runtime

* fix: remove unneeded hf hub buildtime

* fix: duplicate cache delete on nightly

* fix: torchvision being overridden

* fix: migrate to uv images

* fix: leftover from merge

* fix: simplify base readme

* fix: update assertion message to be clearer

* chore: docs

* fix: change fallback for cicd script

* fix: match against main exactly

* fix: peft 0.19.1 change

* fix: e2e test

* fix: ci

* fix: e2e test

2026-04-21 10:16:03 -04:00

2.9 KiB

Raw Blame History

Finetune Mistral Small 4 with Axolotl

Mistral Small 4 is a 119B parameter (6.5B active) multimodal MoE model from MistralAI that unifies instruct, reasoning, and coding capabilities into a single model. It is available on HuggingFace at Mistral-Small-4-119B-2603.

Thanks to the team at MistralAI for giving us early access to prepare for this release.

Getting started

Install Axolotl following the installation guide.
Install Cut Cross Entropy to reduce training VRAM usage
Install transformers from main

uv pip install git+https://github.com/huggingface/transformers.git

Run one of the example configs:

# text-only
axolotl train examples/mistral4/qlora-text.yml  # no experts ~69 GiB, experts ~93 GiB
axolotl train examples/mistral4/fft-text.yml

# text + vision
# run: wget https://huggingface.co/datasets/Nanobit/text-vision-2k-test/resolve/main/African_elephant.jpg
axolotl train examples/mistral4/qlora-vision.yml  # no experts ~68 GiB
axolotl train examples/mistral4/fft-vision.yml

Note: FFT configs provided as reference. Please adjust hyperparameters as needed.

Reasoning Effort

The chat template supports a reasoning_effort variable to control the model's reasoning depth:

"none" — instruct mode (default)
"high" — reasoning mode with explicit thinking steps

Pass it via chat_template_kwargs under your dataset config:

datasets:
  - path: your/dataset
    type: chat_template
    chat_template_kwargs:
      reasoning_effort: high

Thinking Support

The chat template supports a thinking content type in assistant messages for training on reasoning traces (rendered as [THINK]...[/THINK] blocks).

To use thinking datasets, add the thinking mapping via message_property_mappings:

datasets:
  - path: your/thinking-dataset
    type: chat_template
    message_property_mappings:
      role: role
      content: content
      thinking: thinking
    chat_template_kwargs:
      reasoning_effort: high

See the Magistral thinking guide for dataset format details.

Tips

Read more on how to load your own dataset at docs.
The text dataset format follows the OpenAI Messages format as seen here.
The vision model requires multi-modal dataset format as documented here.

2.9 KiB Raw Blame History