Built site for gh-pages
This commit is contained in:
@@ -788,12 +788,19 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
|
|||||||
<ul>
|
<ul>
|
||||||
<li><a href="#model-architectures-agent-reference" id="toc-model-architectures-agent-reference" class="nav-link active" data-scroll-target="#model-architectures-agent-reference">Model Architectures — Agent Reference</a>
|
<li><a href="#model-architectures-agent-reference" id="toc-model-architectures-agent-reference" class="nav-link active" data-scroll-target="#model-architectures-agent-reference">Model Architectures — Agent Reference</a>
|
||||||
<ul class="collapse">
|
<ul class="collapse">
|
||||||
|
<li><a href="#vlm-vision-language-model-quick-start" id="toc-vlm-vision-language-model-quick-start" class="nav-link" data-scroll-target="#vlm-vision-language-model-quick-start">VLM (Vision Language Model) Quick Start</a></li>
|
||||||
|
<li><a href="#plugins-optimizations" id="toc-plugins-optimizations" class="nav-link" data-scroll-target="#plugins-optimizations">Plugins & Optimizations</a>
|
||||||
|
<ul class="collapse">
|
||||||
|
<li><a href="#cut-cross-entropy-cce" id="toc-cut-cross-entropy-cce" class="nav-link" data-scroll-target="#cut-cross-entropy-cce">Cut Cross Entropy (CCE)</a></li>
|
||||||
|
<li><a href="#scattermoe-kernels" id="toc-scattermoe-kernels" class="nav-link" data-scroll-target="#scattermoe-kernels">ScatterMoE Kernels</a></li>
|
||||||
|
</ul></li>
|
||||||
<li><a href="#gemma-4" id="toc-gemma-4" class="nav-link" data-scroll-target="#gemma-4">Gemma 4</a>
|
<li><a href="#gemma-4" id="toc-gemma-4" class="nav-link" data-scroll-target="#gemma-4">Gemma 4</a>
|
||||||
<ul class="collapse">
|
<ul class="collapse">
|
||||||
<li><a href="#required-settings" id="toc-required-settings" class="nav-link" data-scroll-target="#required-settings">Required settings</a></li>
|
<li><a href="#required-settings" id="toc-required-settings" class="nav-link" data-scroll-target="#required-settings">Required settings</a></li>
|
||||||
<li><a href="#auto-detection" id="toc-auto-detection" class="nav-link" data-scroll-target="#auto-detection">Auto-detection</a></li>
|
<li><a href="#auto-detection" id="toc-auto-detection" class="nav-link" data-scroll-target="#auto-detection">Auto-detection</a></li>
|
||||||
<li><a href="#multi-gpu" id="toc-multi-gpu" class="nav-link" data-scroll-target="#multi-gpu">Multi-GPU</a></li>
|
<li><a href="#multi-gpu" id="toc-multi-gpu" class="nav-link" data-scroll-target="#multi-gpu">Multi-GPU</a></li>
|
||||||
<li><a href="#moe-26b-a4b" id="toc-moe-26b-a4b" class="nav-link" data-scroll-target="#moe-26b-a4b">MoE (26B-A4B)</a></li>
|
<li><a href="#moe-26b-a4b" id="toc-moe-26b-a4b" class="nav-link" data-scroll-target="#moe-26b-a4b">MoE (26B-A4B)</a></li>
|
||||||
|
<li><a href="#vlm-vision-training" id="toc-vlm-vision-training" class="nav-link" data-scroll-target="#vlm-vision-training">VLM (Vision) Training</a></li>
|
||||||
<li><a href="#common-issues" id="toc-common-issues" class="nav-link" data-scroll-target="#common-issues">Common issues</a></li>
|
<li><a href="#common-issues" id="toc-common-issues" class="nav-link" data-scroll-target="#common-issues">Common issues</a></li>
|
||||||
<li><a href="#e2be4b-dense-models" id="toc-e2be4b-dense-models" class="nav-link" data-scroll-target="#e2be4b-dense-models">E2B/E4B dense models</a></li>
|
<li><a href="#e2be4b-dense-models" id="toc-e2be4b-dense-models" class="nav-link" data-scroll-target="#e2be4b-dense-models">E2B/E4B dense models</a></li>
|
||||||
</ul></li>
|
</ul></li>
|
||||||
@@ -813,19 +820,63 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
|
|||||||
<section id="model-architectures-agent-reference" class="level1">
|
<section id="model-architectures-agent-reference" class="level1">
|
||||||
<h1>Model Architectures — Agent Reference</h1>
|
<h1>Model Architectures — Agent Reference</h1>
|
||||||
<p>Model-specific quirks, required settings, and known issues. Check this before debugging training failures on specific model families.</p>
|
<p>Model-specific quirks, required settings, and known issues. Check this before debugging training failures on specific model families.</p>
|
||||||
|
<section id="vlm-vision-language-model-quick-start" class="level2">
|
||||||
|
<h2 class="anchored" data-anchor-id="vlm-vision-language-model-quick-start">VLM (Vision Language Model) Quick Start</h2>
|
||||||
|
<p>All VLM configs require these four lines:</p>
|
||||||
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="fu">processor_type</span><span class="kw">:</span><span class="at"> AutoProcessor</span></span>
|
||||||
|
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="fu">skip_prepare_dataset</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
|
||||||
|
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="fu">remove_unused_columns</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span></span>
|
||||||
|
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="fu">sample_packing</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
|
<p>Decision tree for VLM config:</p>
|
||||||
|
<pre class="text"><code>Is the model multimodal (has vision/audio encoder)?
|
||||||
|
├─ YES: Add `freeze_mm_modules: true` if training text only
|
||||||
|
│ Add `chat_template: <model_template>` (e.g. gemma4, qwen3_5, gemma3)
|
||||||
|
│ LoRA: use regex `lora_target_modules` to restrict to language model
|
||||||
|
└─ NO: Train as a regular text model
|
||||||
|
|
||||||
|
Is the model MoE (e.g. Gemma4 26B-A4B, Qwen3.5 35B-A3B)?
|
||||||
|
├─ YES: Add `lora_target_parameters` for expert LoRA
|
||||||
|
│ Consider ScatterMoE kernels (see Plugins section)
|
||||||
|
└─ NO: Standard LoRA config</code></pre>
|
||||||
|
</section>
|
||||||
|
<section id="plugins-optimizations" class="level2">
|
||||||
|
<h2 class="anchored" data-anchor-id="plugins-optimizations">Plugins & Optimizations</h2>
|
||||||
|
<section id="cut-cross-entropy-cce" class="level3">
|
||||||
|
<h3 class="anchored" data-anchor-id="cut-cross-entropy-cce">Cut Cross Entropy (CCE)</h3>
|
||||||
|
<p>Computes loss from hidden states + lm_head weight without materializing the full logits tensor, saving significant VRAM. Install if not already present:</p>
|
||||||
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="ex">uv</span> pip install <span class="st">"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@main"</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="fu">plugins</span><span class="kw">:</span></span>
|
||||||
|
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
|
</section>
|
||||||
|
<section id="scattermoe-kernels" class="level3">
|
||||||
|
<h3 class="anchored" data-anchor-id="scattermoe-kernels">ScatterMoE Kernels</h3>
|
||||||
|
<p>Fuses expert + LoRA computation into a single kernel for MoE models. Significant speedup for models with many experts.</p>
|
||||||
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="fu">plugins</span><span class="kw">:</span></span>
|
||||||
|
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> axolotl.integrations.kernels.KernelsPlugin</span></span>
|
||||||
|
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="fu">use_kernels</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
|
||||||
|
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a><span class="fu">use_scattermoe</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
|
||||||
|
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a><span class="fu">experts_implementation</span><span class="kw">:</span><span class="at"> scattermoe</span></span>
|
||||||
|
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
|
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Expert LoRA targets (3D parameter tensors, not nn.Linear):</span></span>
|
||||||
|
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_target_parameters</span><span class="kw">:</span></span>
|
||||||
|
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> experts.gate_up_proj</span></span>
|
||||||
|
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> experts.down_proj</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
|
<p>Supported: Gemma4 (<code>gemma4_text</code>), Mixtral, Qwen MoE variants. The plugin auto-detects model type and routing function. Without ScatterMoE, expert LoRA still works but runs base expert matmul and LoRA as separate operations.</p>
|
||||||
|
</section>
|
||||||
|
</section>
|
||||||
<section id="gemma-4" class="level2">
|
<section id="gemma-4" class="level2">
|
||||||
<h2 class="anchored" data-anchor-id="gemma-4">Gemma 4</h2>
|
<h2 class="anchored" data-anchor-id="gemma-4">Gemma 4</h2>
|
||||||
<p><strong>Models</strong>: <code>google/gemma-4-26B-A4B</code> (MoE), <code>google/gemma-4-31B</code> (dense), <code>google/gemma-4-E2B</code>, <code>google/gemma-4-E4B</code></p>
|
<p><strong>Models</strong>: <code>google/gemma-4-26B-A4B</code> (MoE), <code>google/gemma-4-31B</code> (dense), <code>google/gemma-4-E2B</code>, <code>google/gemma-4-E4B</code></p>
|
||||||
<p><strong>Architecture</strong>: Multimodal wrapper (<code>Gemma4ForConditionalGeneration</code>) over a text backbone (<code>Gemma4TextModel</code>), with optional vision/audio encoders. All Gemma4 HF repos have <code>model_type: "gemma4"</code> — even text-only variants load as multimodal with a vision tower.</p>
|
<p><strong>Architecture</strong>: Multimodal wrapper (<code>Gemma4ForConditionalGeneration</code>) over a text backbone (<code>Gemma4TextModel</code>), with optional vision/audio encoders. All Gemma4 HF repos have <code>model_type: "gemma4"</code> — even text-only variants load as multimodal with a vision tower.</p>
|
||||||
<section id="required-settings" class="level3">
|
<section id="required-settings" class="level3">
|
||||||
<h3 class="anchored" data-anchor-id="required-settings">Required settings</h3>
|
<h3 class="anchored" data-anchor-id="required-settings">Required settings</h3>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Always needed for Gemma4:</span></span>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Always needed for Gemma4:</span></span>
|
||||||
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="fu">freeze_mm_modules</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span><span class="co"> # Freeze vision/audio encoders for text-only training</span></span>
|
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="fu">freeze_mm_modules</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span><span class="co"> # Freeze vision/audio encoders for text-only training</span></span>
|
||||||
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="fu">gradient_checkpointing_kwargs</span><span class="kw">:</span></span>
|
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="fu">gradient_checkpointing_kwargs</span><span class="kw">:</span></span>
|
||||||
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_reentrant</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span><span class="co"> # Shared per-layer norms cause "marked ready twice" with reentrant</span></span>
|
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">use_reentrant</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span><span class="co"> # Shared per-layer norms cause "marked ready twice" with reentrant</span></span>
|
||||||
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a></span>
|
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="co"># LoRA target — restrict to language model only (DO NOT use lora_target_linear: true):</span></span>
|
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a><span class="co"># LoRA target — restrict to language model only (DO NOT use lora_target_linear: true):</span></span>
|
||||||
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_target_modules</span><span class="kw">:</span><span class="at"> </span><span class="st">'model.language_model.layers.[\d]+.(_checkpoint_wrapped_module.)?(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_target_modules</span><span class="kw">:</span><span class="at"> </span><span class="st">'model.language_model.layers.[\d]+.(_checkpoint_wrapped_module.)?(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
</section>
|
</section>
|
||||||
<section id="auto-detection" class="level3">
|
<section id="auto-detection" class="level3">
|
||||||
<h3 class="anchored" data-anchor-id="auto-detection">Auto-detection</h3>
|
<h3 class="anchored" data-anchor-id="auto-detection">Auto-detection</h3>
|
||||||
@@ -877,13 +928,13 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
|
|||||||
</tbody>
|
</tbody>
|
||||||
</table>
|
</table>
|
||||||
<p>FSDP2 config:</p>
|
<p>FSDP2 config:</p>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp</span><span class="kw">:</span></span>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp</span><span class="kw">:</span></span>
|
||||||
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> full_shard</span></span>
|
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> full_shard</span></span>
|
||||||
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> auto_wrap</span></span>
|
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> auto_wrap</span></span>
|
||||||
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp_config</span><span class="kw">:</span></span>
|
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp_config</span><span class="kw">:</span></span>
|
||||||
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_version</span><span class="kw">:</span><span class="at"> </span><span class="dv">2</span></span>
|
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_version</span><span class="kw">:</span><span class="at"> </span><span class="dv">2</span></span>
|
||||||
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_auto_wrap_policy</span><span class="kw">:</span><span class="at"> TRANSFORMER_BASED_WRAP</span></span>
|
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_auto_wrap_policy</span><span class="kw">:</span><span class="at"> TRANSFORMER_BASED_WRAP</span></span>
|
||||||
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_transformer_layer_cls_to_wrap</span><span class="kw">:</span><span class="at"> Gemma4TextDecoderLayer</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_transformer_layer_cls_to_wrap</span><span class="kw">:</span><span class="at"> Gemma4TextDecoderLayer</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
</section>
|
</section>
|
||||||
<section id="moe-26b-a4b" class="level3">
|
<section id="moe-26b-a4b" class="level3">
|
||||||
<h3 class="anchored" data-anchor-id="moe-26b-a4b">MoE (26B-A4B)</h3>
|
<h3 class="anchored" data-anchor-id="moe-26b-a4b">MoE (26B-A4B)</h3>
|
||||||
@@ -891,17 +942,40 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
|
|||||||
<li><p><code>enable_moe_block: true</code>, 256 experts, top-k routing</p></li>
|
<li><p><code>enable_moe_block: true</code>, 256 experts, top-k routing</p></li>
|
||||||
<li><p>No separate <code>SparseMoeBlock</code> — MoE is embedded in each decoder layer</p></li>
|
<li><p>No separate <code>SparseMoeBlock</code> — MoE is embedded in each decoder layer</p></li>
|
||||||
<li><p>Expert LoRA targets 3D parameter tensors:</p>
|
<li><p>Expert LoRA targets 3D parameter tensors:</p>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_target_parameters</span><span class="kw">:</span></span>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_target_parameters</span><span class="kw">:</span></span>
|
||||||
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> experts.gate_up_proj</span></span>
|
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> experts.gate_up_proj</span></span>
|
||||||
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> experts.down_proj</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div></li>
|
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> experts.down_proj</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div></li>
|
||||||
<li><p>ScatterMoE kernel acceleration:</p>
|
<li><p>ScatterMoE kernel acceleration:</p>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="fu">plugins</span><span class="kw">:</span></span>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="fu">plugins</span><span class="kw">:</span></span>
|
||||||
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> axolotl.integrations.kernels.KernelsPlugin</span></span>
|
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> axolotl.integrations.kernels.KernelsPlugin</span></span>
|
||||||
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="fu">use_kernels</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
|
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="fu">use_kernels</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
|
||||||
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="fu">use_scattermoe</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
|
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a><span class="fu">use_scattermoe</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
|
||||||
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="fu">experts_implementation</span><span class="kw">:</span><span class="at"> scattermoe</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div></li>
|
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a><span class="fu">experts_implementation</span><span class="kw">:</span><span class="at"> scattermoe</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div></li>
|
||||||
</ul>
|
</ul>
|
||||||
</section>
|
</section>
|
||||||
|
<section id="vlm-vision-training" class="level3">
|
||||||
|
<h3 class="anchored" data-anchor-id="vlm-vision-training">VLM (Vision) Training</h3>
|
||||||
|
<p>All Gemma4 models load as <code>Gemma4ForConditionalGeneration</code> with a vision tower. No custom <code>ProcessingStrategy</code> needed — the base class auto-detects the image token.</p>
|
||||||
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> google/gemma-4-E2B-it</span><span class="co"> # or E4B-it, 26B-A4B</span></span>
|
||||||
|
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="fu">processor_type</span><span class="kw">:</span><span class="at"> AutoProcessor</span></span>
|
||||||
|
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a><span class="fu">freeze_mm_modules</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
|
||||||
|
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> gemma4</span></span>
|
||||||
|
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
|
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a><span class="fu">skip_prepare_dataset</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
|
||||||
|
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a><span class="fu">remove_unused_columns</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span></span>
|
||||||
|
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a><span class="fu">sample_packing</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
|
<p>A starting VLM loss of ~8-15 is typical. In most runs, loss converges below 1.0 within ~30-50 steps, though results may vary across configurations.</p>
|
||||||
|
<p>For the 26B-A4B MoE variant with ScatterMoE + expert LoRA + CCE, add:</p>
|
||||||
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="fu">plugins</span><span class="kw">:</span></span>
|
||||||
|
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin</span></span>
|
||||||
|
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> axolotl.integrations.kernels.KernelsPlugin</span></span>
|
||||||
|
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a><span class="fu">use_kernels</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
|
||||||
|
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a><span class="fu">use_scattermoe</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
|
||||||
|
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a><span class="fu">experts_implementation</span><span class="kw">:</span><span class="at"> scattermoe</span></span>
|
||||||
|
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_target_parameters</span><span class="kw">:</span></span>
|
||||||
|
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> experts.gate_up_proj</span></span>
|
||||||
|
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> experts.down_proj</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
|
</section>
|
||||||
<section id="common-issues" class="level3">
|
<section id="common-issues" class="level3">
|
||||||
<h3 class="anchored" data-anchor-id="common-issues">Common issues</h3>
|
<h3 class="anchored" data-anchor-id="common-issues">Common issues</h3>
|
||||||
<table class="caption-top table">
|
<table class="caption-top table">
|
||||||
@@ -969,9 +1043,9 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
|
|||||||
<li><p>256 experts, 8 active per token</p></li>
|
<li><p>256 experts, 8 active per token</p></li>
|
||||||
<li><p>Known weight scale drift in late DeltaNet layers (36-38) due to AdamW + rare expert interaction</p></li>
|
<li><p>Known weight scale drift in late DeltaNet layers (36-38) due to AdamW + rare expert interaction</p></li>
|
||||||
<li><p>Fix: <code>normalize_weight_scales</code> config to detect and rescale outliers:</p>
|
<li><p>Fix: <code>normalize_weight_scales</code> config to detect and rescale outliers:</p>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="fu">normalize_weight_scales</span><span class="kw">:</span></span>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="fu">normalize_weight_scales</span><span class="kw">:</span></span>
|
||||||
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">name_pattern</span><span class="kw">:</span><span class="at"> </span><span class="st">'linear_attn\.conv1d\.weight'</span></span>
|
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="fu">name_pattern</span><span class="kw">:</span><span class="at"> </span><span class="st">'linear_attn\.conv1d\.weight'</span></span>
|
||||||
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">threshold</span><span class="kw">:</span><span class="at"> </span><span class="fl">1.3</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div></li>
|
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">threshold</span><span class="kw">:</span><span class="at"> </span><span class="fl">1.3</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div></li>
|
||||||
</ul>
|
</ul>
|
||||||
</section>
|
</section>
|
||||||
<section id="general-moe-notes" class="level2">
|
<section id="general-moe-notes" class="level2">
|
||||||
|
|||||||
@@ -23,6 +23,41 @@ ul.task-list li input[type="checkbox"] {
|
|||||||
margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */
|
margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */
|
||||||
vertical-align: middle;
|
vertical-align: middle;
|
||||||
}
|
}
|
||||||
|
/* CSS for syntax highlighting */
|
||||||
|
html { -webkit-text-size-adjust: 100%; }
|
||||||
|
pre > code.sourceCode { white-space: pre; position: relative; }
|
||||||
|
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
|
||||||
|
pre > code.sourceCode > span:empty { height: 1.2em; }
|
||||||
|
.sourceCode { overflow: visible; }
|
||||||
|
code.sourceCode > span { color: inherit; text-decoration: inherit; }
|
||||||
|
div.sourceCode { margin: 1em 0; }
|
||||||
|
pre.sourceCode { margin: 0; }
|
||||||
|
@media screen {
|
||||||
|
div.sourceCode { overflow: auto; }
|
||||||
|
}
|
||||||
|
@media print {
|
||||||
|
pre > code.sourceCode { white-space: pre-wrap; }
|
||||||
|
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
|
||||||
|
}
|
||||||
|
pre.numberSource code
|
||||||
|
{ counter-reset: source-line 0; }
|
||||||
|
pre.numberSource code > span
|
||||||
|
{ position: relative; left: -4em; counter-increment: source-line; }
|
||||||
|
pre.numberSource code > span > a:first-child::before
|
||||||
|
{ content: counter(source-line);
|
||||||
|
position: relative; left: -1em; text-align: right; vertical-align: baseline;
|
||||||
|
border: none; display: inline-block;
|
||||||
|
-webkit-touch-callout: none; -webkit-user-select: none;
|
||||||
|
-khtml-user-select: none; -moz-user-select: none;
|
||||||
|
-ms-user-select: none; user-select: none;
|
||||||
|
padding: 0 4px; width: 4em;
|
||||||
|
}
|
||||||
|
pre.numberSource { margin-left: 3em; padding-left: 4px; }
|
||||||
|
div.sourceCode
|
||||||
|
{ }
|
||||||
|
@media screen {
|
||||||
|
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
|
||||||
|
}
|
||||||
</style>
|
</style>
|
||||||
|
|
||||||
|
|
||||||
@@ -760,6 +795,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
|
|||||||
<li><a href="#hyperparameter-ranges" id="toc-hyperparameter-ranges" class="nav-link" data-scroll-target="#hyperparameter-ranges">Hyperparameter Ranges</a></li>
|
<li><a href="#hyperparameter-ranges" id="toc-hyperparameter-ranges" class="nav-link" data-scroll-target="#hyperparameter-ranges">Hyperparameter Ranges</a></li>
|
||||||
<li><a href="#healthy-training-indicators" id="toc-healthy-training-indicators" class="nav-link" data-scroll-target="#healthy-training-indicators">Healthy Training Indicators</a></li>
|
<li><a href="#healthy-training-indicators" id="toc-healthy-training-indicators" class="nav-link" data-scroll-target="#healthy-training-indicators">Healthy Training Indicators</a></li>
|
||||||
<li><a href="#known-issues" id="toc-known-issues" class="nav-link" data-scroll-target="#known-issues">Known Issues</a></li>
|
<li><a href="#known-issues" id="toc-known-issues" class="nav-link" data-scroll-target="#known-issues">Known Issues</a></li>
|
||||||
|
<li><a href="#profiling" id="toc-profiling" class="nav-link" data-scroll-target="#profiling">Profiling</a></li>
|
||||||
<li><a href="#file-map" id="toc-file-map" class="nav-link" data-scroll-target="#file-map">File Map</a></li>
|
<li><a href="#file-map" id="toc-file-map" class="nav-link" data-scroll-target="#file-map">File Map</a></li>
|
||||||
</ul></li>
|
</ul></li>
|
||||||
</ul>
|
</ul>
|
||||||
@@ -1009,6 +1045,22 @@ Multi-GPU: FSDP or DeepSpeed shards model across GPUs automatically.</code></pre
|
|||||||
</tr>
|
</tr>
|
||||||
</tbody>
|
</tbody>
|
||||||
</table>
|
</table>
|
||||||
|
</section>
|
||||||
|
<section id="profiling" class="level2">
|
||||||
|
<h2 class="anchored" data-anchor-id="profiling">Profiling</h2>
|
||||||
|
<p>To profile training and identify optimization opportunities:</p>
|
||||||
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Profile steps 3-7 (after warmup/autotuning settles)</span></span>
|
||||||
|
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="fu">profiler_steps_start</span><span class="kw">:</span><span class="at"> </span><span class="dv">3</span></span>
|
||||||
|
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="fu">profiler_steps</span><span class="kw">:</span><span class="at"> </span><span class="dv">5</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
|
<p>This produces <code>profiler_trace.json</code> (Chrome trace) and <code>snapshot.pickle</code> (memory snapshot) in <code>output_dir</code>.
|
||||||
|
View the Chrome trace at <code>chrome://tracing</code>.</p>
|
||||||
|
<p>To programmatically inspect the trace:</p>
|
||||||
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ex">python</span> scripts/analyze_profile.py output_dir/</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
|
<p>The trace shows per-kernel CUDA times, memory allocations, and operator-level breakdown. Look for:
|
||||||
|
- <strong>Large matmul kernels</strong>: candidates for fusion or quantization
|
||||||
|
- <strong>Memory copies (H2D/D2H)</strong>: unnecessary data movement
|
||||||
|
- <strong>Small frequent kernels</strong>: candidates for kernel fusion
|
||||||
|
- <strong>Gaps between kernels</strong>: pipeline bubbles from CPU overhead</p>
|
||||||
<p>Full troubleshooting: <a href="../../docs/training_stability.html">training_stability.qmd</a>, <a href="../../docs/debugging.html">debugging.qmd</a></p>
|
<p>Full troubleshooting: <a href="../../docs/training_stability.html">training_stability.qmd</a>, <a href="../../docs/debugging.html">debugging.qmd</a></p>
|
||||||
</section>
|
</section>
|
||||||
<section id="file-map" class="level2">
|
<section id="file-map" class="level2">
|
||||||
|
|||||||
@@ -797,6 +797,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
|
|||||||
<li><a href="#sec-mistral-small-4" id="toc-sec-mistral-small-4" class="nav-link" data-scroll-target="#sec-mistral-small-4">Mistral-Small-4</a></li>
|
<li><a href="#sec-mistral-small-4" id="toc-sec-mistral-small-4" class="nav-link" data-scroll-target="#sec-mistral-small-4">Mistral-Small-4</a></li>
|
||||||
<li><a href="#sec-magistral-small-2509" id="toc-sec-magistral-small-2509" class="nav-link" data-scroll-target="#sec-magistral-small-2509">Magistral-Small-2509</a></li>
|
<li><a href="#sec-magistral-small-2509" id="toc-sec-magistral-small-2509" class="nav-link" data-scroll-target="#sec-magistral-small-2509">Magistral-Small-2509</a></li>
|
||||||
<li><a href="#sec-voxtral" id="toc-sec-voxtral" class="nav-link" data-scroll-target="#sec-voxtral">Voxtral</a></li>
|
<li><a href="#sec-voxtral" id="toc-sec-voxtral" class="nav-link" data-scroll-target="#sec-voxtral">Voxtral</a></li>
|
||||||
|
<li><a href="#sec-gemma-4" id="toc-sec-gemma-4" class="nav-link" data-scroll-target="#sec-gemma-4">Gemma-4</a></li>
|
||||||
<li><a href="#sec-gemma-3" id="toc-sec-gemma-3" class="nav-link" data-scroll-target="#sec-gemma-3">Gemma-3</a></li>
|
<li><a href="#sec-gemma-3" id="toc-sec-gemma-3" class="nav-link" data-scroll-target="#sec-gemma-3">Gemma-3</a></li>
|
||||||
<li><a href="#sec-gemma-3n" id="toc-sec-gemma-3n" class="nav-link" data-scroll-target="#sec-gemma-3n">Gemma-3n</a></li>
|
<li><a href="#sec-gemma-3n" id="toc-sec-gemma-3n" class="nav-link" data-scroll-target="#sec-gemma-3n">Gemma-3n</a></li>
|
||||||
<li><a href="#sec-qwen2-vl" id="toc-sec-qwen2-vl" class="nav-link" data-scroll-target="#sec-qwen2-vl">Qwen2-VL</a></li>
|
<li><a href="#sec-qwen2-vl" id="toc-sec-qwen2-vl" class="nav-link" data-scroll-target="#sec-qwen2-vl">Qwen2-VL</a></li>
|
||||||
@@ -844,6 +845,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
|
|||||||
<section id="supported-models" class="level2">
|
<section id="supported-models" class="level2">
|
||||||
<h2 class="anchored" data-anchor-id="supported-models">Supported Models</h2>
|
<h2 class="anchored" data-anchor-id="supported-models">Supported Models</h2>
|
||||||
<ul>
|
<ul>
|
||||||
|
<li><a href="#sec-gemma-4">Gemma-4</a> <em>(NEW)</em></li>
|
||||||
<li><a href="#sec-mllama">Mllama</a></li>
|
<li><a href="#sec-mllama">Mllama</a></li>
|
||||||
<li><a href="#sec-llama4">Llama4</a></li>
|
<li><a href="#sec-llama4">Llama4</a></li>
|
||||||
<li><a href="#sec-pixtral">Pixtral</a></li>
|
<li><a href="#sec-pixtral">Pixtral</a></li>
|
||||||
@@ -998,6 +1000,55 @@ Tip
|
|||||||
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span>
|
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="fu">processor_type</span><span class="kw">:</span><span class="at"> VoxtralProcessor</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="fu">processor_type</span><span class="kw">:</span><span class="at"> VoxtralProcessor</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
</section>
|
</section>
|
||||||
|
<section id="sec-gemma-4" class="level3">
|
||||||
|
<h3 class="anchored" data-anchor-id="sec-gemma-4">Gemma-4</h3>
|
||||||
|
<p>All Gemma 4 variants (E2B, E4B, 26B-A4B, 31B) load as multimodal models even for text-only training.</p>
|
||||||
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> google/gemma-4-E2B-it</span><span class="co"> # or E4B-it, 26B-A4B, 31B</span></span>
|
||||||
|
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
|
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> gemma4</span></span>
|
||||||
|
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a><span class="fu">freeze_mm_modules</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span><span class="co"> # freeze vision/audio encoders for text-only or vision LoRA</span></span>
|
||||||
|
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
|
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a><span class="co"># For the 26B-A4B MoE model, enable ScatterMoE and expert LoRA:</span></span>
|
||||||
|
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a><span class="fu">plugins</span><span class="kw">:</span></span>
|
||||||
|
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin</span></span>
|
||||||
|
<span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> axolotl.integrations.kernels.KernelsPlugin</span></span>
|
||||||
|
<span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a><span class="fu">use_kernels</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
|
||||||
|
<span id="cb10-11"><a href="#cb10-11" aria-hidden="true" tabindex="-1"></a><span class="fu">use_scattermoe</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
|
||||||
|
<span id="cb10-12"><a href="#cb10-12" aria-hidden="true" tabindex="-1"></a><span class="fu">experts_implementation</span><span class="kw">:</span><span class="at"> scattermoe</span></span>
|
||||||
|
<span id="cb10-13"><a href="#cb10-13" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
|
<span id="cb10-14"><a href="#cb10-14" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_target_modules</span><span class="kw">:</span><span class="at"> </span><span class="st">'model.language_model.layers.[\d]+.(_checkpoint_wrapped_module.)?(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'</span></span>
|
||||||
|
<span id="cb10-15"><a href="#cb10-15" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
|
<span id="cb10-16"><a href="#cb10-16" aria-hidden="true" tabindex="-1"></a><span class="co"># MoE expert LoRA (3D tensors, not nn.Linear) — only for 26B-A4B:</span></span>
|
||||||
|
<span id="cb10-17"><a href="#cb10-17" aria-hidden="true" tabindex="-1"></a><span class="fu">lora_target_parameters</span><span class="kw">:</span></span>
|
||||||
|
<span id="cb10-18"><a href="#cb10-18" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> experts.gate_up_proj</span></span>
|
||||||
|
<span id="cb10-19"><a href="#cb10-19" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> experts.down_proj</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
|
<div class="callout callout-style-default callout-warning callout-titled">
|
||||||
|
<div class="callout-header d-flex align-content-center">
|
||||||
|
<div class="callout-icon-container">
|
||||||
|
<i class="callout-icon"></i>
|
||||||
|
</div>
|
||||||
|
<div class="callout-title-container flex-fill">
|
||||||
|
Warning
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="callout-body-container callout-body">
|
||||||
|
<p>Gemma 4 VLM training starts with high loss (~8-15). This is expected — see the <a href="../docs/training_stability.html">training stability guide</a> for details.</p>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="callout callout-style-default callout-tip callout-titled">
|
||||||
|
<div class="callout-header d-flex align-content-center">
|
||||||
|
<div class="callout-icon-container">
|
||||||
|
<i class="callout-icon"></i>
|
||||||
|
</div>
|
||||||
|
<div class="callout-title-container flex-fill">
|
||||||
|
Tip
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="callout-body-container callout-body">
|
||||||
|
<p>For DDP training, axolotl auto-detects Gemma4 and sets <code>use_reentrant=False</code> and <code>ddp_find_unused_parameters=True</code>. However, when <code>activation_offloading: true</code>, <code>ddp_find_unused_parameters</code> is skipped (checkpoint wrappers conflict with it); use <code>freeze_mm_modules: true</code> instead to handle unused vision/audio params. For FSDP2, use <code>fsdp_transformer_layer_cls_to_wrap: Gemma4TextDecoderLayer</code>.</p>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</section>
|
||||||
<section id="sec-gemma-3" class="level3">
|
<section id="sec-gemma-3" class="level3">
|
||||||
<h3 class="anchored" data-anchor-id="sec-gemma-3">Gemma-3</h3>
|
<h3 class="anchored" data-anchor-id="sec-gemma-3">Gemma-3</h3>
|
||||||
<div class="callout callout-style-default callout-tip callout-titled">
|
<div class="callout callout-style-default callout-tip callout-titled">
|
||||||
@@ -1014,9 +1065,9 @@ Tip
|
|||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
<p>For multi-modal 4B/12B/27B models, use the following config:</p>
|
<p>For multi-modal 4B/12B/27B models, use the following config:</p>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> google/gemma-3-4b-it</span></span>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> google/gemma-3-4b-it</span></span>
|
||||||
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a></span>
|
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> gemma3</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> gemma3</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
</section>
|
</section>
|
||||||
<section id="sec-gemma-3n" class="level3">
|
<section id="sec-gemma-3n" class="level3">
|
||||||
<h3 class="anchored" data-anchor-id="sec-gemma-3n">Gemma-3n</h3>
|
<h3 class="anchored" data-anchor-id="sec-gemma-3n">Gemma-3n</h3>
|
||||||
@@ -1046,42 +1097,42 @@ Tip
|
|||||||
<p>Please make sure to install <code>timm</code> via <code>pip3 install timm==1.0.17</code></p>
|
<p>Please make sure to install <code>timm</code> via <code>pip3 install timm==1.0.17</code></p>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> google/gemma-3n-E2B-it</span></span>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> google/gemma-3n-E2B-it</span></span>
|
||||||
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a></span>
|
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> gemma3n</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> gemma3n</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
</section>
|
</section>
|
||||||
<section id="sec-qwen2-vl" class="level3">
|
<section id="sec-qwen2-vl" class="level3">
|
||||||
<h3 class="anchored" data-anchor-id="sec-qwen2-vl">Qwen2-VL</h3>
|
<h3 class="anchored" data-anchor-id="sec-qwen2-vl">Qwen2-VL</h3>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> Qwen/Qwen2-VL-7B-Instruct</span></span>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> Qwen/Qwen2-VL-7B-Instruct</span></span>
|
||||||
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a></span>
|
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> qwen2_vl</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> qwen2_vl</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
</section>
|
</section>
|
||||||
<section id="sec-qwen25-vl" class="level3">
|
<section id="sec-qwen25-vl" class="level3">
|
||||||
<h3 class="anchored" data-anchor-id="sec-qwen25-vl">Qwen2.5-VL</h3>
|
<h3 class="anchored" data-anchor-id="sec-qwen25-vl">Qwen2.5-VL</h3>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb13"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> Qwen/Qwen2.5-VL-7B-Instruct</span></span>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb14"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> Qwen/Qwen2.5-VL-7B-Instruct</span></span>
|
||||||
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a></span>
|
|
||||||
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> qwen2_vl</span><span class="co"> # same as qwen2-vl</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
|
||||||
</section>
|
|
||||||
<section id="sec-qwen3-vl" class="level3">
|
|
||||||
<h3 class="anchored" data-anchor-id="sec-qwen3-vl">Qwen3-VL</h3>
|
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb14"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> Qwen/Qwen3-VL-4B-Instruct</span></span>
|
|
||||||
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a></span>
|
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> qwen2_vl</span><span class="co"> # same as qwen2-vl</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> qwen2_vl</span><span class="co"> # same as qwen2-vl</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
</section>
|
</section>
|
||||||
|
<section id="sec-qwen3-vl" class="level3">
|
||||||
|
<h3 class="anchored" data-anchor-id="sec-qwen3-vl">Qwen3-VL</h3>
|
||||||
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb15"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> Qwen/Qwen3-VL-4B-Instruct</span></span>
|
||||||
|
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
|
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> qwen2_vl</span><span class="co"> # same as qwen2-vl</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
|
</section>
|
||||||
<section id="sec-qwen3-5" class="level3">
|
<section id="sec-qwen3-5" class="level3">
|
||||||
<h3 class="anchored" data-anchor-id="sec-qwen3-5">Qwen3.5</h3>
|
<h3 class="anchored" data-anchor-id="sec-qwen3-5">Qwen3.5</h3>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb15"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> Qwen/Qwen3.5-9B</span></span>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb16"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> Qwen/Qwen3.5-9B</span></span>
|
||||||
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a></span>
|
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> qwen3_5</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a><span class="fu">chat_template</span><span class="kw">:</span><span class="at"> qwen3_5</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
</section>
|
</section>
|
||||||
<section id="sec-glm-4-6v" class="level3">
|
<section id="sec-glm-4-6v" class="level3">
|
||||||
<h3 class="anchored" data-anchor-id="sec-glm-4-6v">GLM-4.6V</h3>
|
<h3 class="anchored" data-anchor-id="sec-glm-4-6v">GLM-4.6V</h3>
|
||||||
<p>Both GLM-4.6V (106B MoE) and GLM-4.6V-Flash (9B) are supported.</p>
|
<p>Both GLM-4.6V (106B MoE) and GLM-4.6V-Flash (9B) are supported.</p>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb16"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="co"># GLM-4.6V (106B MoE version)</span></span>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb17"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="co"># GLM-4.6V (106B MoE version)</span></span>
|
||||||
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> zai-org/GLM-4.6V</span></span>
|
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> zai-org/GLM-4.6V</span></span>
|
||||||
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a></span>
|
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a><span class="co"># OR GLM-4.6V-Flash (9B version)</span></span>
|
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a><span class="co"># OR GLM-4.6V-Flash (9B version)</span></span>
|
||||||
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> zai-org/GLM-4.6V-Flash</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> zai-org/GLM-4.6V-Flash</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
</section>
|
</section>
|
||||||
<section id="sec-smolvlm2" class="level3">
|
<section id="sec-smolvlm2" class="level3">
|
||||||
<h3 class="anchored" data-anchor-id="sec-smolvlm2">SmolVLM2</h3>
|
<h3 class="anchored" data-anchor-id="sec-smolvlm2">SmolVLM2</h3>
|
||||||
@@ -1098,7 +1149,7 @@ Tip
|
|||||||
<p>Please make sure to install <code>num2words</code> via <code>pip3 install num2words==0.5.14</code></p>
|
<p>Please make sure to install <code>num2words</code> via <code>pip3 install num2words==0.5.14</code></p>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb17"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> HuggingFaceTB/SmolVLM2-500M-Video-Instruct</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb18"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> HuggingFaceTB/SmolVLM2-500M-Video-Instruct</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
</section>
|
</section>
|
||||||
<section id="sec-lfm2-vl" class="level3">
|
<section id="sec-lfm2-vl" class="level3">
|
||||||
<h3 class="anchored" data-anchor-id="sec-lfm2-vl">LFM2-VL</h3>
|
<h3 class="anchored" data-anchor-id="sec-lfm2-vl">LFM2-VL</h3>
|
||||||
@@ -1115,7 +1166,7 @@ Warning
|
|||||||
<p>Please uninstall <code>causal-conv1d</code> via <code>pip3 uninstall -y causal-conv1d</code></p>
|
<p>Please uninstall <code>causal-conv1d</code> via <code>pip3 uninstall -y causal-conv1d</code></p>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb18"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> LiquidAI/LFM2-VL-450M</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb19"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> LiquidAI/LFM2-VL-450M</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
</section>
|
</section>
|
||||||
<section id="sec-intern-vl" class="level3">
|
<section id="sec-intern-vl" class="level3">
|
||||||
<h3 class="anchored" data-anchor-id="sec-intern-vl">Intern-VL</h3>
|
<h3 class="anchored" data-anchor-id="sec-intern-vl">Intern-VL</h3>
|
||||||
@@ -1132,7 +1183,7 @@ Tip
|
|||||||
<p>Please make sure to install <code>timm</code> via <code>pip3 install timm==1.0.19</code></p>
|
<p>Please make sure to install <code>timm</code> via <code>pip3 install timm==1.0.19</code></p>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb19"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> OpenGVLab/InternVL3_5-8B</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb20"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="fu">base_model</span><span class="kw">:</span><span class="at"> OpenGVLab/InternVL3_5-8B</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
</section>
|
</section>
|
||||||
</section>
|
</section>
|
||||||
<section id="dataset-format" class="level2">
|
<section id="dataset-format" class="level2">
|
||||||
@@ -1217,31 +1268,31 @@ Warning
|
|||||||
<section id="example" class="level3">
|
<section id="example" class="level3">
|
||||||
<h3 class="anchored" data-anchor-id="example">Example</h3>
|
<h3 class="anchored" data-anchor-id="example">Example</h3>
|
||||||
<p>Here is an example of a multi-modal dataset:</p>
|
<p>Here is an example of a multi-modal dataset:</p>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb20"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="ot">[</span></span>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb21"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="ot">[</span></span>
|
||||||
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span></span>
|
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span></span>
|
||||||
<span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">"messages"</span><span class="fu">:</span> <span class="ot">[</span></span>
|
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">"messages"</span><span class="fu">:</span> <span class="ot">[</span></span>
|
||||||
<span id="cb20-4"><a href="#cb20-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span></span>
|
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span></span>
|
||||||
<span id="cb20-5"><a href="#cb20-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">"role"</span><span class="fu">:</span> <span class="st">"system"</span><span class="fu">,</span></span>
|
<span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">"role"</span><span class="fu">:</span> <span class="st">"system"</span><span class="fu">,</span></span>
|
||||||
<span id="cb20-6"><a href="#cb20-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">"content"</span><span class="fu">:</span> <span class="ot">[</span></span>
|
<span id="cb21-6"><a href="#cb21-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">"content"</span><span class="fu">:</span> <span class="ot">[</span></span>
|
||||||
<span id="cb20-7"><a href="#cb20-7" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span><span class="dt">"type"</span><span class="fu">:</span> <span class="st">"text"</span><span class="fu">,</span> <span class="dt">"text"</span><span class="fu">:</span> <span class="st">"You are a helpful assistant."</span><span class="fu">}</span></span>
|
<span id="cb21-7"><a href="#cb21-7" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span><span class="dt">"type"</span><span class="fu">:</span> <span class="st">"text"</span><span class="fu">,</span> <span class="dt">"text"</span><span class="fu">:</span> <span class="st">"You are a helpful assistant."</span><span class="fu">}</span></span>
|
||||||
<span id="cb20-8"><a href="#cb20-8" aria-hidden="true" tabindex="-1"></a> <span class="ot">]</span></span>
|
<span id="cb21-8"><a href="#cb21-8" aria-hidden="true" tabindex="-1"></a> <span class="ot">]</span></span>
|
||||||
<span id="cb20-9"><a href="#cb20-9" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span><span class="ot">,</span></span>
|
<span id="cb21-9"><a href="#cb21-9" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span><span class="ot">,</span></span>
|
||||||
<span id="cb20-10"><a href="#cb20-10" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span></span>
|
<span id="cb21-10"><a href="#cb21-10" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span></span>
|
||||||
<span id="cb20-11"><a href="#cb20-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">"role"</span><span class="fu">:</span> <span class="st">"user"</span><span class="fu">,</span></span>
|
<span id="cb21-11"><a href="#cb21-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">"role"</span><span class="fu">:</span> <span class="st">"user"</span><span class="fu">,</span></span>
|
||||||
<span id="cb20-12"><a href="#cb20-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">"content"</span><span class="fu">:</span> <span class="ot">[</span></span>
|
<span id="cb21-12"><a href="#cb21-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">"content"</span><span class="fu">:</span> <span class="ot">[</span></span>
|
||||||
<span id="cb20-13"><a href="#cb20-13" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span><span class="dt">"type"</span><span class="fu">:</span> <span class="st">"image"</span><span class="fu">,</span> <span class="dt">"url"</span><span class="fu">:</span> <span class="st">"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"</span><span class="fu">}</span><span class="ot">,</span></span>
|
<span id="cb21-13"><a href="#cb21-13" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span><span class="dt">"type"</span><span class="fu">:</span> <span class="st">"image"</span><span class="fu">,</span> <span class="dt">"url"</span><span class="fu">:</span> <span class="st">"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"</span><span class="fu">}</span><span class="ot">,</span></span>
|
||||||
<span id="cb20-14"><a href="#cb20-14" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span><span class="dt">"type"</span><span class="fu">:</span> <span class="st">"text"</span><span class="fu">,</span> <span class="dt">"text"</span><span class="fu">:</span> <span class="st">"Describe this image in detail."</span><span class="fu">}</span></span>
|
<span id="cb21-14"><a href="#cb21-14" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span><span class="dt">"type"</span><span class="fu">:</span> <span class="st">"text"</span><span class="fu">,</span> <span class="dt">"text"</span><span class="fu">:</span> <span class="st">"Describe this image in detail."</span><span class="fu">}</span></span>
|
||||||
<span id="cb20-15"><a href="#cb20-15" aria-hidden="true" tabindex="-1"></a> <span class="ot">]</span></span>
|
<span id="cb21-15"><a href="#cb21-15" aria-hidden="true" tabindex="-1"></a> <span class="ot">]</span></span>
|
||||||
<span id="cb20-16"><a href="#cb20-16" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span><span class="ot">,</span></span>
|
<span id="cb21-16"><a href="#cb21-16" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span><span class="ot">,</span></span>
|
||||||
<span id="cb20-17"><a href="#cb20-17" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span></span>
|
<span id="cb21-17"><a href="#cb21-17" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span></span>
|
||||||
<span id="cb20-18"><a href="#cb20-18" aria-hidden="true" tabindex="-1"></a> <span class="dt">"role"</span><span class="fu">:</span> <span class="st">"assistant"</span><span class="fu">,</span></span>
|
<span id="cb21-18"><a href="#cb21-18" aria-hidden="true" tabindex="-1"></a> <span class="dt">"role"</span><span class="fu">:</span> <span class="st">"assistant"</span><span class="fu">,</span></span>
|
||||||
<span id="cb20-19"><a href="#cb20-19" aria-hidden="true" tabindex="-1"></a> <span class="dt">"content"</span><span class="fu">:</span> <span class="ot">[</span></span>
|
<span id="cb21-19"><a href="#cb21-19" aria-hidden="true" tabindex="-1"></a> <span class="dt">"content"</span><span class="fu">:</span> <span class="ot">[</span></span>
|
||||||
<span id="cb20-20"><a href="#cb20-20" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span><span class="dt">"type"</span><span class="fu">:</span> <span class="st">"text"</span><span class="fu">,</span> <span class="dt">"text"</span><span class="fu">:</span> <span class="st">"The image is a bee."</span><span class="fu">}</span></span>
|
<span id="cb21-20"><a href="#cb21-20" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span><span class="dt">"type"</span><span class="fu">:</span> <span class="st">"text"</span><span class="fu">,</span> <span class="dt">"text"</span><span class="fu">:</span> <span class="st">"The image is a bee."</span><span class="fu">}</span></span>
|
||||||
<span id="cb20-21"><a href="#cb20-21" aria-hidden="true" tabindex="-1"></a> <span class="ot">]</span></span>
|
<span id="cb21-21"><a href="#cb21-21" aria-hidden="true" tabindex="-1"></a> <span class="ot">]</span></span>
|
||||||
<span id="cb20-22"><a href="#cb20-22" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span></span>
|
<span id="cb21-22"><a href="#cb21-22" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span></span>
|
||||||
<span id="cb20-23"><a href="#cb20-23" aria-hidden="true" tabindex="-1"></a> <span class="ot">]</span></span>
|
<span id="cb21-23"><a href="#cb21-23" aria-hidden="true" tabindex="-1"></a> <span class="ot">]</span></span>
|
||||||
<span id="cb20-24"><a href="#cb20-24" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span></span>
|
<span id="cb21-24"><a href="#cb21-24" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span></span>
|
||||||
<span id="cb20-25"><a href="#cb20-25" aria-hidden="true" tabindex="-1"></a><span class="ot">]</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<span id="cb21-25"><a href="#cb21-25" aria-hidden="true" tabindex="-1"></a><span class="ot">]</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
</section>
|
</section>
|
||||||
</section>
|
</section>
|
||||||
<section id="faq" class="level2">
|
<section id="faq" class="level2">
|
||||||
|
|||||||
100
index.html
100
index.html
@@ -906,7 +906,7 @@ Expand older updates
|
|||||||
<p><strong>Requirements</strong>:</p>
|
<p><strong>Requirements</strong>:</p>
|
||||||
<ul>
|
<ul>
|
||||||
<li>NVIDIA GPU (Ampere or newer for <code>bf16</code> and Flash Attention) or AMD GPU</li>
|
<li>NVIDIA GPU (Ampere or newer for <code>bf16</code> and Flash Attention) or AMD GPU</li>
|
||||||
<li>Python 3.11</li>
|
<li>Python >=3.11 (3.12 recommended)</li>
|
||||||
<li>PyTorch ≥2.9.1</li>
|
<li>PyTorch ≥2.9.1</li>
|
||||||
</ul>
|
</ul>
|
||||||
<section id="google-colab" class="level3">
|
<section id="google-colab" class="level3">
|
||||||
@@ -920,19 +920,45 @@ Expand older updates
|
|||||||
</section>
|
</section>
|
||||||
<section id="installation" class="level3">
|
<section id="installation" class="level3">
|
||||||
<h3 class="anchored" data-anchor-id="installation">Installation</h3>
|
<h3 class="anchored" data-anchor-id="installation">Installation</h3>
|
||||||
|
<section id="using-uv-recommended" class="level4">
|
||||||
|
<h4 class="anchored" data-anchor-id="using-uv-recommended">Using uv (recommended)</h4>
|
||||||
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co"># install uv if you don't already have it installed</span></span>
|
||||||
|
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="ex">curl</span> <span class="at">-LsSf</span> https://astral.sh/uv/install.sh <span class="kw">|</span> <span class="fu">sh</span></span>
|
||||||
|
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="bu">source</span> <span class="va">$HOME</span>/.local/bin/env</span>
|
||||||
|
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
|
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="co"># CUDA 12.8.1 tends to have better package compatibility</span></span>
|
||||||
|
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="bu">export</span> <span class="va">UV_TORCH_BACKEND</span><span class="op">=</span>cu128</span>
|
||||||
|
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
|
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="co"># create a new virtual environment</span></span>
|
||||||
|
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="ex">uv</span> venv <span class="at">--python</span> 3.12</span>
|
||||||
|
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="bu">source</span> .venv/bin/activate</span>
|
||||||
|
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
|
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a><span class="ex">uv</span> pip install torch==2.10.0 torchvision</span>
|
||||||
|
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a><span class="ex">uv</span> pip install <span class="at">--no-build-isolation</span> axolotl<span class="pp">[</span><span class="ss">deepspeed</span><span class="pp">]</span></span>
|
||||||
|
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
|
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a><span class="co"># recommended - install cut-cross-entropy</span></span>
|
||||||
|
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a><span class="ex">uv</span> pip install <span class="st">"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@main"</span></span>
|
||||||
|
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
|
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a><span class="co"># (optional) - prefetch flash-attn2 and causal-conv1d kernels</span></span>
|
||||||
|
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a><span class="ex">uv</span> run <span class="at">--python</span> 3.12 python <span class="at">-c</span> <span class="st">"from kernels import get_kernel; get_kernel('kernels-community/flash-attn2'); get_kernel('kernels-community/causal-conv1d')"</span></span>
|
||||||
|
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
|
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a><span class="co"># Download example axolotl configs, deepspeed configs</span></span>
|
||||||
|
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> fetch examples</span>
|
||||||
|
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> fetch deepspeed_configs <span class="co"># OPTIONAL</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
|
</section>
|
||||||
<section id="using-pip" class="level4">
|
<section id="using-pip" class="level4">
|
||||||
<h4 class="anchored" data-anchor-id="using-pip">Using pip</h4>
|
<h4 class="anchored" data-anchor-id="using-pip">Using pip</h4>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ex">pip3</span> install <span class="at">-U</span> packaging==26.0 setuptools==75.8.0 wheel ninja</span>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ex">pip3</span> install <span class="at">-U</span> packaging==26.0 setuptools==75.8.0 wheel ninja</span>
|
||||||
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="ex">pip3</span> install <span class="at">--no-build-isolation</span> axolotl<span class="pp">[</span><span class="ss">flash</span><span class="pp">-</span><span class="ss">attn,deepspeed</span><span class="pp">]</span></span>
|
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="ex">pip3</span> install <span class="at">--no-build-isolation</span> axolotl<span class="pp">[</span><span class="ss">flash</span><span class="pp">-</span><span class="ss">attn,deepspeed</span><span class="pp">]</span></span>
|
||||||
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span>
|
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Download example axolotl configs, deepspeed configs</span></span>
|
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Download example axolotl configs, deepspeed configs</span></span>
|
||||||
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> fetch examples</span>
|
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> fetch examples</span>
|
||||||
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> fetch deepspeed_configs <span class="co"># OPTIONAL</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> fetch deepspeed_configs <span class="co"># OPTIONAL</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
</section>
|
</section>
|
||||||
<section id="using-docker" class="level4">
|
<section id="using-docker" class="level4">
|
||||||
<h4 class="anchored" data-anchor-id="using-docker">Using Docker</h4>
|
<h4 class="anchored" data-anchor-id="using-docker">Using Docker</h4>
|
||||||
<p>Installing with Docker can be less error prone than installing in your own environment.</p>
|
<p>Installing with Docker can be less error prone than installing in your own environment.</p>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ex">docker</span> run <span class="at">--gpus</span> <span class="st">'"all"'</span> <span class="at">--rm</span> <span class="at">-it</span> axolotlai/axolotl:main-latest</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="ex">docker</span> run <span class="at">--gpus</span> <span class="st">'"all"'</span> <span class="at">--rm</span> <span class="at">-it</span> axolotlai/axolotl:main-latest</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
<p>Other installation approaches are described <a href="https://docs.axolotl.ai/docs/installation.html">here</a>.</p>
|
<p>Other installation approaches are described <a href="https://docs.axolotl.ai/docs/installation.html">here</a>.</p>
|
||||||
</section>
|
</section>
|
||||||
<section id="cloud-providers" class="level4">
|
<section id="cloud-providers" class="level4">
|
||||||
@@ -952,14 +978,14 @@ Expand older updates
|
|||||||
</section>
|
</section>
|
||||||
<section id="your-first-fine-tune" class="level3">
|
<section id="your-first-fine-tune" class="level3">
|
||||||
<h3 class="anchored" data-anchor-id="your-first-fine-tune">Your First Fine-tune</h3>
|
<h3 class="anchored" data-anchor-id="your-first-fine-tune">Your First Fine-tune</h3>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Fetch axolotl examples</span></span>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Fetch axolotl examples</span></span>
|
||||||
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> fetch examples</span>
|
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> fetch examples</span>
|
||||||
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a></span>
|
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Or, specify a custom path</span></span>
|
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Or, specify a custom path</span></span>
|
||||||
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> fetch examples <span class="at">--dest</span> path/to/folder</span>
|
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> fetch examples <span class="at">--dest</span> path/to/folder</span>
|
||||||
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a></span>
|
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Train a model using LoRA</span></span>
|
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Train a model using LoRA</span></span>
|
||||||
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> train examples/llama-3/lora-1b.yml</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> train examples/llama-3/lora-1b.yml</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
<p>That’s it! Check out our <a href="https://docs.axolotl.ai/docs/getting-started.html">Getting Started Guide</a> for a more detailed walkthrough.</p>
|
<p>That’s it! Check out our <a href="https://docs.axolotl.ai/docs/getting-started.html">Getting Started Guide</a> for a more detailed walkthrough.</p>
|
||||||
</section>
|
</section>
|
||||||
</section>
|
</section>
|
||||||
@@ -980,20 +1006,20 @@ Expand older updates
|
|||||||
<section id="ai-agent-support" class="level2">
|
<section id="ai-agent-support" class="level2">
|
||||||
<h2 class="anchored" data-anchor-id="ai-agent-support">AI Agent Support</h2>
|
<h2 class="anchored" data-anchor-id="ai-agent-support">AI Agent Support</h2>
|
||||||
<p>Axolotl ships with built-in documentation optimized for AI coding agents (Claude Code, Cursor, Copilot, etc.). These docs are bundled with the pip package — no repo clone needed.</p>
|
<p>Axolotl ships with built-in documentation optimized for AI coding agents (Claude Code, Cursor, Copilot, etc.). These docs are bundled with the pip package — no repo clone needed.</p>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Show overview and available training methods</span></span>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Show overview and available training methods</span></span>
|
||||||
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> agent-docs</span>
|
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> agent-docs</span>
|
||||||
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a></span>
|
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Topic-specific references</span></span>
|
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Topic-specific references</span></span>
|
||||||
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> agent-docs sft <span class="co"># supervised fine-tuning</span></span>
|
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> agent-docs sft <span class="co"># supervised fine-tuning</span></span>
|
||||||
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> agent-docs grpo <span class="co"># GRPO online RL</span></span>
|
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> agent-docs grpo <span class="co"># GRPO online RL</span></span>
|
||||||
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> agent-docs preference_tuning <span class="co"># DPO, KTO, ORPO, SimPO</span></span>
|
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> agent-docs preference_tuning <span class="co"># DPO, KTO, ORPO, SimPO</span></span>
|
||||||
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> agent-docs reward_modelling <span class="co"># outcome and process reward models</span></span>
|
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> agent-docs reward_modelling <span class="co"># outcome and process reward models</span></span>
|
||||||
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> agent-docs pretraining <span class="co"># continual pretraining</span></span>
|
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> agent-docs pretraining <span class="co"># continual pretraining</span></span>
|
||||||
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> agent-docs <span class="at">--list</span> <span class="co"># list all topics</span></span>
|
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> agent-docs <span class="at">--list</span> <span class="co"># list all topics</span></span>
|
||||||
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a></span>
|
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a></span>
|
||||||
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a><span class="co"># Dump config schema for programmatic use</span></span>
|
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a><span class="co"># Dump config schema for programmatic use</span></span>
|
||||||
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> config-schema</span>
|
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> config-schema</span>
|
||||||
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> config-schema <span class="at">--field</span> adapter</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> config-schema <span class="at">--field</span> adapter</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
<p>If you’re working with the source repo, agent docs are also available at <code>docs/agents/</code> and the project overview is in <code>AGENTS.md</code>.</p>
|
<p>If you’re working with the source repo, agent docs are also available at <code>docs/agents/</code> and the project overview is in <code>AGENTS.md</code>.</p>
|
||||||
</section>
|
</section>
|
||||||
<section id="getting-help" class="level2">
|
<section id="getting-help" class="level2">
|
||||||
@@ -1023,13 +1049,13 @@ disable it, set AXOLOTL_DO_NOT_TRACK=1. For more details, see our <a href="https
|
|||||||
<section id="citing-axolotl" class="level2">
|
<section id="citing-axolotl" class="level2">
|
||||||
<h2 class="anchored" data-anchor-id="citing-axolotl">📝 Citing Axolotl</h2>
|
<h2 class="anchored" data-anchor-id="citing-axolotl">📝 Citing Axolotl</h2>
|
||||||
<p>If you use Axolotl in your research or projects, please cite it as follows:</p>
|
<p>If you use Axolotl in your research or projects, please cite it as follows:</p>
|
||||||
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5"><pre class="sourceCode bibtex code-with-copy"><code class="sourceCode bibtex"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co">@software{axolotl,</span></span>
|
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6"><pre class="sourceCode bibtex code-with-copy"><code class="sourceCode bibtex"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="co">@software{axolotl,</span></span>
|
||||||
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="co"> title = {Axolotl: Open Source LLM Post-Training},</span></span>
|
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="co"> title = {Axolotl: Open Source LLM Post-Training},</span></span>
|
||||||
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="co"> author = {{Axolotl maintainers and contributors}},</span></span>
|
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="co"> author = {{Axolotl maintainers and contributors}},</span></span>
|
||||||
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a><span class="co"> url = {https://github.com/axolotl-ai-cloud/axolotl},</span></span>
|
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a><span class="co"> url = {https://github.com/axolotl-ai-cloud/axolotl},</span></span>
|
||||||
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a><span class="co"> license = {Apache-2.0},</span></span>
|
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a><span class="co"> license = {Apache-2.0},</span></span>
|
||||||
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a><span class="co"> year = {2023}</span></span>
|
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a><span class="co"> year = {2023}</span></span>
|
||||||
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a><span class="co">}</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a><span class="co">}</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
|
||||||
</section>
|
</section>
|
||||||
<section id="license" class="level2">
|
<section id="license" class="level2">
|
||||||
<h2 class="anchored" data-anchor-id="license">📜 License</h2>
|
<h2 class="anchored" data-anchor-id="license">📜 License</h2>
|
||||||
|
|||||||
37
search.json
37
search.json
File diff suppressed because one or more lines are too long
494
sitemap.xml
494
sitemap.xml
File diff suppressed because it is too large
Load Diff
Reference in New Issue
Block a user