From 302e9406edaa2b86e10943e881c9d17ca3a3b1a8 Mon Sep 17 00:00:00 2001
From: Quarto GHA Workflow Runner

A: There seems to be a wheel issue with FA2 2.8.0 on CUDA 12.4. Try CUDA 12.6 instead, or downgrade to FA2 2.7.4. Please refer to the upstream issue: https://github.com/Dao-AILab/flash-attention/issues/1717.

Q: Can we mix text-only and text+image datasets for VLM training?

A: Yes, for newer VLM architectures. The ones that do not support this are the LLaVA / Pixtral architectures. If you notice one not working, please let us know!

Q: Why is memory/max_* different from nvidia-smi?

A: We use torch APIs to retrieve this information. See https://docs.pytorch.org/docs/stable/notes/cuda.html#cuda-memory-management for more information.

Chat templates
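As a rough sketch of what those torch APIs report (the metric key names here are hypothetical, chosen to mirror the memory/max_* naming above; they are not taken from the source):

```python
# Sketch: reading peak CUDA allocator stats via torch's memory APIs.
# Note: these track the caching allocator, so they undercount what
# nvidia-smi shows, which also includes CUDA context overhead.

def fmt_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB, rounded for logging."""
    return round(num_bytes / (1024 ** 3), 2)

try:
    import torch

    if torch.cuda.is_available():
        stats = {
            # Peak memory handed out to tensors by the caching allocator.
            "memory/max_mem_allocated_gib": fmt_gib(torch.cuda.max_memory_allocated()),
            # Peak memory the allocator reserved from the driver.
            "memory/max_mem_reserved_gib": fmt_gib(torch.cuda.max_memory_reserved()),
        }
        print(stats)
except ImportError:
    pass  # torch not installed; nothing to report
```

The gap between max_mem_reserved and the nvidia-smi figure is expected and normal.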
diff --git a/docs/lr_groups.html b/docs/lr_groups.html
index 9cc9c3bbd..baf02d267 100644
--- a/docs/lr_groups.html
+++ b/docs/lr_groups.html
@@ -563,6 +563,19 @@ modules in a model.
In this example, we have a default learning rate of 2e-5 across the entire model, but a separate learning rate
of 1e-6 for all the self attention o_proj modules across all layers, and a learning rate of 1e-5 for the 3rd layer’s
self attention q_proj module.
We currently only support varying the learning rate. If you’re interested in adding support for other optimizer arguments (e.g. weight_decay), we welcome PRs. See https://github.com/axolotl-ai-cloud/axolotl/blob/613bcf90e58f3ab81d3827e7fc572319908db9fb/src/axolotl/core/trainers/mixins/optimizer.py#L17
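The grouping described above could be expressed as a config along these lines (a sketch only; the group names are arbitrary, and the exact lr_groups schema and module paths should be checked against the linked optimizer mixin and the examples folder):

```yaml
learning_rate: 2e-5       # default lr for the rest of the model
lr_groups:
  - name: o_proj_all      # all self attention o_proj modules, every layer
    modules:
      - self_attn.o_proj
    lr: 1e-6
  - name: q_proj_layer3   # only the 3rd layer's q_proj (0-indexed layer 2)
    modules:
      - model.layers.2.self_attn.q_proj
    lr: 1e-5
```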
Please see examples folder for full configs.
Some of our chat_templates have been extended to support broader dataset types. This should not break any existing configs.
As of now, we do not truncate or drop samples based on sequence_len, as each architecture has its own way of processing non-text tokens. We are looking for help on this.
base_model: meta-llama/Llama-3.2-11B-Vision-Instruct
@@ -757,6 +771,12 @@ Tip
chat_template: qwen2_vl # same as qwen2-vl

base_model: Qwen/Qwen3-VL-4B-Instruct
chat_template: qwen2_vl # same as qwen2-vl

Please make sure to install num2words via pip3 install num2words==0.5.14

base_model: HuggingFaceTB/SmolVLM2-500M-Video-Instruct

Please uninstall causal-conv1d via pip3 uninstall -y causal-conv1d

base_model: LiquidAI/LFM2-VL-450M

Here is an example of a multi-modal dataset:
[
  {
    "messages": [
      {
        "role": "system",
        "content": [
          {"type": "text", "text": "You are a helpful assistant."}
        ]
      },
      {
        "role": "user",
        "content": [
          {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
          {"type": "text", "text": "Describe this image in detail."}
        ]
      },
      {
        "role": "assistant",
        "content": [
          {"type": "text", "text": "The image is a bee."}
        ]
      }
    ]
  }
]
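To train on a file shaped like the one above, the datasets section of the config might look like this (a sketch; the file path is hypothetical, and the ds_type/type values should be verified against the axolotl dataset documentation):

```yaml
# Hypothetical wiring for the JSON sample above.
datasets:
  - path: ./multimodal_sample.json  # local file containing the example messages
    ds_type: json
    type: chat_template
```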