Files

NanoCode012 006f226270 Feat: add Olmo3 (BC with Olmo and Olmo2) (#3275 )

* feat: update cce to include olmo family

* chore: update docs following feedback

* feat: add olmo3 config

* fix: clarify 3 methods

* chore: add olmo to readme

2025-11-24 10:21:31 +07:00

README.md

Feat: add Olmo3 (BC with Olmo and Olmo2) (#3275 )

2025-11-24 10:21:31 +07:00

smolvlm2-2B-lora.yaml

Various fixes for VLMs (#3063 )

2025-08-15 10:52:57 -04:00

README.md

Finetune SmolVLM2 with Axolotl

SmolVLM2 are a family of lightweight, open-source multimodal models from HuggingFace designed to analyze and understand video, image, and text content.

These models are built for efficiency, making them well-suited for on-device applications where computational resources are limited. Models are available in multiple sizes, including 2.2B, 500M, and 256M.

This guide shows how to fine-tune SmolVLM2 models with Axolotl.

Getting Started

Install Axolotl following the installation guide.

Here is an example of how to install from pip:

# Ensure you have a compatible version of Pytorch installed
pip3 install packaging setuptools wheel ninja
pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'

Install an extra dependency:
```
pip3 install num2words==0.5.14
```

Run the finetuning example:

# LoRA SFT (1x48GB @ 6.8GiB)
axolotl train examples/smolvlm2/smolvlm2-2B-lora.yaml

TIPS

Dataset Format: For video finetuning, your dataset must be compatible with the multi-content Messages format. For more details, see our documentation on Multimodal Formats.
Dataset Loading: Read more on how to prepare and load your own datasets in our documentation.

Optimization Guides

Please check the Optimizations doc.

README.md

Finetune SmolVLM2 with Axolotl

Getting Started

TIPS

Optimization Guides

Related Resources