# Finetune SmolVLM2 with Axolotl
[SmolVLM2](https://huggingface.co/collections/HuggingFaceTB/smolvlm2-smallest-video-lm-ever-67ab6b5e84bf8aaa60cb17c7) is a family of lightweight, open-source multimodal models from HuggingFace designed to analyze and understand video, image, and text content.

These models are built for efficiency, making them well-suited for on-device applications where computational resources are limited. Models are available in multiple sizes, including 2.2B, 500M, and 256M parameters.

This guide shows how to fine-tune SmolVLM2 models with Axolotl.
## Getting Started
1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html).

Here is an example of how to install it from PyPI with `uv`:

```bash
# Ensure you have a compatible version of PyTorch installed
uv pip install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'
```
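If PyTorch is not installed yet, you can add it first. The version-less command and CUDA 12.4 index URL below are only an assumed example; pick the command that matches your hardware from the official PyTorch installation selector.

```bash
# Illustrative example for a CUDA 12.4 environment -- adjust the index URL
# (or drop it to use the default PyPI wheels) to match your CUDA/ROCm setup.
uv pip install torch --index-url https://download.pytorch.org/whl/cu124
```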
2. Install an extra dependency:

```bash
uv pip install num2words==0.5.14
```
3. Run the finetuning example:

```bash
# LoRA SFT (1x48GB @ 6.8GiB)
axolotl train examples/smolvlm2/smolvlm2-2B-lora.yaml
```
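To adjust the run (batch size, LoRA rank, learning rate, and so on), edit the example config before launching. The keys below are common Axolotl options shown purely as an illustration; the values are not taken from the shipped `smolvlm2-2B-lora.yaml`, so consult that file for the real settings.

```yaml
# Illustrative overrides only -- not the contents of examples/smolvlm2/smolvlm2-2B-lora.yaml.
micro_batch_size: 1               # per-device batch size
gradient_accumulation_steps: 4    # raises the effective batch size
learning_rate: 0.0002
lora_r: 16                        # LoRA rank
lora_alpha: 32
```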
## Tips

- **Dataset Format**: For video finetuning, your dataset must be compatible with the multi-content Messages format (an illustrative sample is sketched at the end of this list). For more details, see our documentation on [Multimodal Formats](https://docs.axolotl.ai/docs/multimodal.html#dataset-format).
- **Dataset Loading**: Read more on how to prepare and load your own datasets in our [documentation](https://docs.axolotl.ai/docs/dataset_loading.html).
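- **Example sample**: as a rough, illustrative sketch (not copied from the Axolotl docs), a single multi-content record for video data could look like the JSON below; the exact field names and supported content types are defined on the [Multimodal Formats](https://docs.axolotl.ai/docs/multimodal.html#dataset-format) page, so treat this only as an assumption to adapt.

  ```json
  {
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "video", "path": "clips/sample_0001.mp4" },
          { "type": "text", "text": "What is happening in this clip?" }
        ]
      },
      {
        "role": "assistant",
        "content": [
          { "type": "text", "text": "A person assembles a small bookshelf." }
        ]
      }
    ]
  }
  ```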
## Optimization Guides
Please check the [Optimizations doc](https://docs.axolotl.ai/docs/optimizations.html).
## Related Resources
- [SmolVLM2 Blog](https://huggingface.co/blog/smolvlm2)
- [Axolotl Docs](https://docs.axolotl.ai)
- [Axolotl GitHub](https://github.com/axolotl-ai-cloud/axolotl)
- [Axolotl Discord](https://discord.gg/7m9sfhzaf3)