Files

Wing Lian 130ef7c51a Various fixes for VLMs (#3063 )

* fix to not use batch feature indexing

* more vlm fixes

* use AutoModelForImageTextToText

* add example yaml and need num2words for chat template

* improve handling of adding image tokens to conversation

* add lfm2-vl support

* update the lfm readme

* fix markdown and add rtol for loss checks

* feat: add smolvlm2 processing strat

* fix: check for causal-conv1d in lfm models

* feat: add docs for lfm2

* feat: add new models and tips to docs

* feat: add smolvlm2 docs and remove extra dep

* chore: update docs

* feat: add video instructions

* chore: cleanup

* chore: comments

* fix: typo

* feat: add usage stats

* chore: refactor

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>

2025-08-15 10:52:57 -04:00

2.0 KiB

Raw Blame History

Finetune SmolVLM2 with Axolotl

SmolVLM2 are a family of lightweight, open-source multimodal models from HuggingFace designed to analyze and understand video, image, and text content.

These models are built for efficiency, making them well-suited for on-device applications where computational resources are limited. Models are available in multiple sizes, including 2.2B, 500M, and 256M.

This guide shows how to fine-tune SmolVLM2 models with Axolotl.

Getting Started

Install Axolotl following the installation guide.

Here is an example of how to install from pip:

# Ensure you have a compatible version of Pytorch installed
pip3 install packaging setuptools wheel ninja
pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'

Install an extra dependency:
```
pip3 install num2words==0.5.14
```

Run the finetuning example:

# LoRA SFT (1x48GB @ 6.8GiB)
axolotl train examples/smolvlm2/smolvlm2-2B-lora.yaml

TIPS

Dataset Format: For video finetuning, your dataset must be compatible with the multi-content Messages format. For more details, see our documentation on Multimodal Formats.
Dataset Loading: Read more on how to prepare and load your own datasets in our documentation.

2.0 KiB Raw Blame History

Finetune SmolVLM2 with Axolotl

Getting Started

TIPS

Optimization Guides

Related Resources

2.0 KiB

Raw Blame History