Files

NanoCode012 4f5e8a328a Feat: add MiMo and Plano (#3332 ) [skip-ci]

* feat: add xiaomi's mimo 7b

* fix: pin revision

* fix: update trinity docs and pin revision

* fix: wrong config name

* feat: add vram usage

* feat: add plano

* feat: update plano vram usage

* chore: comments

2025-12-25 18:09:03 +07:00

mimo-7b-qlora.yaml

Feat: add MiMo and Plano (#3332 ) [skip-ci]

2025-12-25 18:09:03 +07:00

README.md

Feat: add MiMo and Plano (#3332 ) [skip-ci]

2025-12-25 18:09:03 +07:00

README.md

Finetune Xiaomi's MiMo with Axolotl

MiMo is a family of models trained from scratch for reasoning tasks, incorporating Multiple-Token Prediction (MTP) as an additional training objective for enhanced performance and faster inference. Pre-trained on ~25T tokens with a three-stage data mixture strategy and optimized reasoning pattern density.

This guide shows how to fine-tune it with Axolotl with multi-turn conversations and proper masking.

Getting started

Install Axolotl following the installation guide.

Run the finetuning example:

axolotl train examples/mimo/mimo-7b-qlora.yaml

This config uses about 17.2 GiB VRAM. Let us know how it goes. Happy finetuning! 🚀

Tips

You can run a full finetuning by removing the adapter: qlora and load_in_4bit: true from the config.
Read more on how to load your own dataset at docs.
The dataset format follows the OpenAI Messages format as seen here.

Optimization Guides

Please check the Optimizations doc.

Limitations

Cut Cross Entropy (CCE): Currently not supported. We plan to include CCE support for MiMo in the near future.

README.md

Finetune Xiaomi's MiMo with Axolotl

Getting started

Tips

Optimization Guides

Limitations

Related Resources