* fix to not use batch feature indexing * more vlm fixes * use AutoModelForImageTextToText * add example yaml and need num2words for chat template * improve handling of adding image tokens to conversation * add lfm2-vl support * update the lfm readme * fix markdown and add rtol for loss checks * feat: add smolvlm2 processing strat * fix: check for causal-conv1d in lfm models * feat: add docs for lfm2 * feat: add new models and tips to docs * feat: add smolvlm2 docs and remove extra dep * chore: update docs * feat: add video instructions * chore: cleanup * chore: comments * fix: typo * feat: add usage stats * chore: refactor --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>
50 lines
2.0 KiB
Markdown
50 lines
2.0 KiB
Markdown
# Finetune SmolVLM2 with Axolotl
|
|
|
|
[SmolVLM2](https://huggingface.co/collections/HuggingFaceTB/smolvlm2-smallest-video-lm-ever-67ab6b5e84bf8aaa60cb17c7) are a family of lightweight, open-source multimodal models from HuggingFace designed to analyze and understand video, image, and text content.
|
|
|
|
These models are built for efficiency, making them well-suited for on-device applications where computational resources are limited. Models are available in multiple sizes, including 2.2B, 500M, and 256M.
|
|
|
|
This guide shows how to fine-tune SmolVLM2 models with Axolotl.
|
|
|
|
## Getting Started
|
|
|
|
1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html).
|
|
|
|
Here is an example of how to install from pip:
|
|
```bash
|
|
# Ensure you have a compatible version of Pytorch installed
|
|
pip3 install packaging setuptools wheel ninja
|
|
pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'
|
|
```
|
|
|
|
2. Install an extra dependency:
|
|
|
|
```bash
|
|
pip3 install num2words==0.5.14
|
|
```
|
|
|
|
3. Run the finetuning example:
|
|
|
|
```bash
|
|
# LoRA SFT (1x48GB @ 6.8GiB)
|
|
axolotl train examples/smolvlm2/smolvlm2-2B-lora.yaml
|
|
```
|
|
|
|
## TIPS
|
|
|
|
- **Dataset Format**: For video finetuning, your dataset must be compatible with the multi-content Messages format. For more details, see our documentation on [Multimodal Formats](https://docs.axolotl.ai/docs/multimodal.html#dataset-format).
|
|
- **Dataset Loading**: Read more on how to prepare and load your own datasets in our [documentation](https://docs.axolotl.ai/docs/dataset_loading.html).
|
|
|
|
## Optimization Guides
|
|
|
|
- [Multi-GPU Training](https://docs.axolotl.ai/docs/multi-gpu.html)
|
|
- [LoRA Optimizations](https://docs.axolotl.ai/docs/lora_optims.html)
|
|
- [Multi-Node Training](https://docs.axolotl.ai/docs/multi-node.html)
|
|
|
|
## Related Resources
|
|
|
|
- [SmolVLM2 Blog](https://huggingface.co/blog/smolvlm2)
|
|
- [Axolotl Docs](https://docs.axolotl.ai)
|
|
- [Axolotl GitHub](https://github.com/axolotl-ai-cloud/axolotl)
|
|
- [Axolotl Discord](https://discord.gg/7m9sfhzaf3)
|