Built site for gh-pages
@@ -346,7 +346,7 @@
 "href": "docs/multimodal.html",
 "title": "MultiModal / Vision Language Models (BETA)",
 "section": "",
-"text": "Mllama\nPixtral\nLlava-1.5\nMistral-Small-3.1\nGemma-3\nQwen2-VL\nQwen2.5-VL",
+"text": "Mllama\nLlama4\nPixtral\nLlava-1.5\nMistral-Small-3.1\nGemma-3\nQwen2-VL\nQwen2.5-VL",
 "crumbs": [
 "How To Guides",
 "MultiModal / Vision Language Models (BETA)"
@@ -357,7 +357,7 @@
 "href": "docs/multimodal.html#supported-models",
 "title": "MultiModal / Vision Language Models (BETA)",
 "section": "",
-"text": "Mllama\nPixtral\nLlava-1.5\nMistral-Small-3.1\nGemma-3\nQwen2-VL\nQwen2.5-VL",
+"text": "Mllama\nLlama4\nPixtral\nLlava-1.5\nMistral-Small-3.1\nGemma-3\nQwen2-VL\nQwen2.5-VL",
 "crumbs": [
 "How To Guides",
 "MultiModal / Vision Language Models (BETA)"
@@ -368,7 +368,7 @@
 "href": "docs/multimodal.html#usage",
 "title": "MultiModal / Vision Language Models (BETA)",
 "section": "Usage",
-"text": "Usage\nMultimodal support is limited and doesn’t have full feature parity.\nHere are the hyperparams you’ll need to use to finetune a multimodal model.\nprocessor_type: AutoProcessor\n\nskip_prepare_dataset: true\nremove_unused_columns: false # leave columns in place as they are needed to handle image embeddings during training\nsample_packing: false # not yet supported with multimodal\n\nchat_template: # see in next section\n\n# example dataset\ndatasets:\n - path: HuggingFaceH4/llava-instruct-mix-vsft\n type: chat_template\n split: train[:1%]\n field_messages: messages\n\n# (optional) if doing lora, only finetune the Language model,\n# leave the vision model and vision tower frozen\n# load_in_8bit: true\nadapter: lora\nlora_target_modules: 'language_model.model.layers.[\\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'\n\n# (optional) if you want to resize images to a set size\nimage_size: 512\nimage_resize_algorithm: bilinear\nPlease see examples folder for full configs.\n\n\n\n\n\n\nWarning\n\n\n\nSome of our chat_templates have been extended to support broader dataset types. This should not break any existing configs.\n\n\n\nMllama\nbase_model: meta-llama/Llama-3.2-11B-Vision-Instruct\n\nchat_template: llama3_2_vision\n\n\nPixtral\nbase_model: mistralai/Pixtral-12B-2409\n\nchat_template: pixtral\n\n\nLlava-1.5\nbase_model: llava-hf/llava-1.5-7b-hf\n\nchat_template: llava\n\n\nMistral-Small-3.1\nbase_model: mistralai/Mistral-Small-3.1-24B-Instruct-2503\n\nchat_template: mistral_v7_tekken\n\n\nGemma-3\n\n\n\n\n\n\nTip\n\n\n\nThe Gemma3-1B model is a text-only model, so please train as regular text model.\n\n\nFor multi-modal 4B/12B/27B models, use the following config:\nbase_model: google/gemma-3-4b-it\n\nchat_template: gemma3\n\n\nQwen2-VL\nbase_model: Qwen/Qwen2-VL-7B-Instruct\n\nchat_template: qwen2_vl\n\n\nQwen2.5-VL\nbase_model: Qwen/Qwen2.5-VL-7B-Instruct\n\nchat_template: qwen2_vl # same as qwen2-vl",
+"text": "Usage\nMultimodal support is limited and doesn’t have full feature parity.\nHere are the hyperparams you’ll need to use to finetune a multimodal model.\nprocessor_type: AutoProcessor\n\nskip_prepare_dataset: true\nremove_unused_columns: false # leave columns in place as they are needed to handle image embeddings during training\nsample_packing: false # not yet supported with multimodal\n\nchat_template: # see in next section\n\n# example dataset\ndatasets:\n - path: HuggingFaceH4/llava-instruct-mix-vsft\n type: chat_template\n split: train[:1%]\n field_messages: messages\n\n# (optional) if doing lora, only finetune the Language model,\n# leave the vision model and vision tower frozen\n# load_in_8bit: true\nadapter: lora\nlora_target_modules: 'language_model.model.layers.[\\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'\n\n# (optional) if you want to resize images to a set size\nimage_size: 512\nimage_resize_algorithm: bilinear\nPlease see examples folder for full configs.\n\n\n\n\n\n\nWarning\n\n\n\nSome of our chat_templates have been extended to support broader dataset types. This should not break any existing configs.\n\n\n\nMllama\nbase_model: meta-llama/Llama-3.2-11B-Vision-Instruct\n\nchat_template: llama3_2_vision\n\n\nLlama4\nbase_model: meta-llama/Llama-4-Scout-17B-16E-Instruct\n\nchat_template: llama4\n\n\nPixtral\nbase_model: mistralai/Pixtral-12B-2409\n\nchat_template: pixtral\n\n\nLlava-1.5\nbase_model: llava-hf/llava-1.5-7b-hf\n\nchat_template: llava\n\n\nMistral-Small-3.1\nbase_model: mistralai/Mistral-Small-3.1-24B-Instruct-2503\n\nchat_template: mistral_v7_tekken\n\n\nGemma-3\n\n\n\n\n\n\nTip\n\n\n\nThe Gemma3-1B model is a text-only model, so please train as regular text model.\n\n\nFor multi-modal 4B/12B/27B models, use the following config:\nbase_model: google/gemma-3-4b-it\n\nchat_template: gemma3\n\n\nQwen2-VL\nbase_model: Qwen/Qwen2-VL-7B-Instruct\n\nchat_template: qwen2_vl\n\n\nQwen2.5-VL\nbase_model: Qwen/Qwen2.5-VL-7B-Instruct\n\nchat_template: qwen2_vl # same as qwen2-vl",
 "crumbs": [
 "How To Guides",
 "MultiModal / Vision Language Models (BETA)"
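
Note on the change: the only substantive difference in these hunks is the new Llama4 entry in the indexed multimodal docs. As a rough illustration of what that entry implies for a user, the fragment below combines the new Llama4 lines with the shared hyperparams quoted in the "Usage" text. Every key and value is copied from the indexed text above; pairing the docs' example dataset with Llama4 is an assumption made here for illustration and is not confirmed by this commit. Treat it as a sketch, not a tested config.

```yaml
# Sketch only: values copied from the indexed docs text; not a tested config.
base_model: meta-llama/Llama-4-Scout-17B-16E-Instruct
processor_type: AutoProcessor
chat_template: llama4

skip_prepare_dataset: true
remove_unused_columns: false  # columns are needed to handle image embeddings during training
sample_packing: false         # not yet supported with multimodal

# Example dataset from the docs; assumed (not confirmed here) to suit Llama4.
datasets:
  - path: HuggingFaceH4/llava-instruct-mix-vsft
    type: chat_template
    split: train[:1%]
    field_messages: messages
```

The optional LoRA and image-resize settings from the docs text are left out on purpose, since the indexed text does not say whether the quoted lora_target_modules pattern matches Llama4's module names.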