diff --git a/.nojekyll b/.nojekyll
index f3c74bd25..7cd04cfc1 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-605b84d9
\ No newline at end of file
+77cdfda6
\ No newline at end of file
diff --git a/docs/api/cli.main.html b/docs/api/cli.main.html
index edd9603a4..d49172e74 100644
--- a/docs/api/cli.main.html
+++ b/docs/api/cli.main.html
@@ -790,7 +790,9 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
  • 📚 Documentation
  • +
  • AI Agent Support
  • 🤝 Getting Help
  • 🌟 Contributing
  • 📈 Telemetry
@@ -976,6 +977,25 @@ Expand older updates
  • FAQ - Frequently asked questions
  • +
    +

    AI Agent Support

    +

    Axolotl ships with built-in documentation optimized for AI coding agents (Claude Code, Cursor, Copilot, etc.). These docs are bundled with the pip package — no repo clone needed.

    +
    # Show overview and available training methods
    +axolotl agent-docs
    +
    +# Topic-specific references
    +axolotl agent-docs sft                 # supervised fine-tuning
    +axolotl agent-docs grpo                # GRPO online RL
    +axolotl agent-docs preference_tuning   # DPO, KTO, ORPO, SimPO
    +axolotl agent-docs reward_modelling    # outcome and process reward models
    +axolotl agent-docs pretraining         # continual pretraining
    +axolotl agent-docs --list              # list all topics
    +
    +# Dump config schema for programmatic use
    +axolotl config-schema
    +axolotl config-schema --field adapter
    +

    If you’re working with the source repo, agent docs are also available at docs/agents/ and the project overview is in AGENTS.md.
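
    For tooling that wants to consume the schema rather than read it, the config-schema output can be parsed directly. A minimal sketch, assuming the command prints a standard JSON Schema document to stdout as shown above; the subprocess wrapper and the top-level "properties" layout are assumptions for illustration, not a documented part of the Axolotl API:

    import json
    import subprocess

    # Capture the full config schema emitted by the CLI.
    result = subprocess.run(
        ["axolotl", "config-schema"],
        capture_output=True,
        text=True,
        check=True,
    )
    schema = json.loads(result.stdout)

    # List every top-level config option the schema exposes.
    # Assumes the usual JSON Schema "properties" mapping (hypothetical here).
    for name in sorted(schema.get("properties", {})):
        print(name)

    The same approach works with axolotl config-schema --field adapter to inspect a single option.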

    +

    🤝 Getting Help

    📜 License

    diff --git a/search.json b/search.json index eda44af55..50682912a 100644 --- a/search.json +++ b/search.json @@ -629,21 +629,21 @@ "href": "docs/api/kernels.lora.html", "title": "kernels.lora", "section": "", - "text": "kernels.lora\nModule for definition of Low-Rank Adaptation (LoRA) Triton kernels.\nSee “LoRA: Low-Rank Adaptation of Large Language Models”\n(https://arxiv.org/abs/2106.09685).\nAlso supports DoRA (Weight-Decomposed Low-Rank Adaptation):\nSee “DoRA: Weight-Decomposed Low-Rank Adaptation” (https://arxiv.org/abs/2402.09353).\nCredit to unsloth (https://unsloth.ai/) for inspiration for this implementation.\n\n\n\n\n\nName\nDescription\n\n\n\n\nLoRA_Embedding\nFused LoRA embedding: F.embedding(x, W) + s * F.embedding(x, A^T) @ B^T.\n\n\nLoRA_MLP\nOptimized LoRA MLP implementation.\n\n\nLoRA_O\nOptimized LoRA implementation for output projection.\n\n\nLoRA_QKV\nOptimized LoRA QKV implementation with quantization support.\n\n\n\n\n\nkernels.lora.LoRA_Embedding()\nFused LoRA embedding: F.embedding(x, W) + s * F.embedding(x, A^T) @ B^T.\nSupports dropout and DoRA.\n\n\n\nkernels.lora.LoRA_MLP()\nOptimized LoRA MLP implementation.\nSupports bias, dropout, and DoRA. Dropout is applied to the input for\ngate/up projections. The down projection uses hidden states (post-activation)\nas input, so dropout is not applied there.\n\n\n\nkernels.lora.LoRA_O()\nOptimized LoRA implementation for output projection.\nSupports bias, dropout, and DoRA.\n\n\n\nkernels.lora.LoRA_QKV()\nOptimized LoRA QKV implementation with quantization support.\nSupports bias, dropout, and DoRA (Weight-Decomposed Low-Rank Adaptation).\nDropout is applied outside this Function so autograd handles its backward.\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\napply_lora_embedding\nApplies LoRA to embedding layer.\n\n\napply_lora_mlp_geglu\nApplies LoRA to MLP layer with GEGLU activation.\n\n\napply_lora_mlp_swiglu\nApplies LoRA to MLP layer with SwiGLU activation.\n\n\napply_lora_o\nApplies LoRA to output projection layer.\n\n\napply_lora_qkv\nApplies LoRA to compute Query, Key, Value projections.\n\n\nget_embedding_lora_parameters\nExtract LoRA parameters from a PEFT Embedding module.\n\n\nget_lora_parameters\nGets LoRA parameters from a projection module.\n\n\nmatmul_lora\nEfficient fused matmul + LoRA computation.\n\n\n\n\n\nkernels.lora.apply_lora_embedding(self, x)\nApplies LoRA to embedding layer.\n\n\n\nkernels.lora.apply_lora_mlp_geglu(self, X, inplace=True)\nApplies LoRA to MLP layer with GEGLU activation.\nSupports bias, dropout, and DoRA.\n\n\n\nkernels.lora.apply_lora_mlp_swiglu(self, X, inplace=True)\nApplies LoRA to MLP layer with SwiGLU activation.\nSupports bias, dropout, and DoRA.\n\n\n\nkernels.lora.apply_lora_o(self, X)\nApplies LoRA to output projection layer.\nSupports bias, dropout, and DoRA.\n\n\n\nkernels.lora.apply_lora_qkv(self, X, inplace=True)\nApplies LoRA to compute Query, Key, Value projections.\nSupports bias, dropout, and DoRA. Dropout is applied outside the autograd\nFunction so PyTorch handles its backward automatically. 
A single shared\ndropout mask is used across Q, K, V projections for memory efficiency.\n\n\n\nkernels.lora.get_embedding_lora_parameters(embed)\nExtract LoRA parameters from a PEFT Embedding module.\n\n\n\nkernels.lora.get_lora_parameters(proj)\nGets LoRA parameters from a projection module.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nproj\nnn.Module\nThe projection module to extract parameters from.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nA tuple containing:\n\n\n\ntorch.Tensor | None\n- W: base weight tensor\n\n\n\nQuantState | torch.Tensor | None\n- b: base layer bias (or None)\n\n\n\ntorch.Tensor | None\n- quant_state: quantization state (or None)\n\n\n\ntorch.Tensor | None\n- A: LoRA A weight (or None)\n\n\n\nfloat | None\n- B: LoRA B weight (or None)\n\n\n\ntorch.Tensor | None\n- s: LoRA scaling factor (or None)\n\n\n\nnn.Module | None\n- lora_bias: LoRA B bias (or None)\n\n\n\ntorch.Tensor | None\n- dropout: dropout module (or None)\n\n\n\ntuple[torch.Tensor, torch.Tensor | None, QuantState | torch.Tensor | None, torch.Tensor | None, torch.Tensor | None, float | None, torch.Tensor | None, nn.Module | None, torch.Tensor | None]\n- magnitude: DoRA magnitude vector (or None)\n\n\n\n\n\n\n\nkernels.lora.matmul_lora(\n X,\n W,\n b,\n W_quant,\n A,\n B,\n s,\n out=None,\n X_drop=None,\n lora_bias=None,\n)\nEfficient fused matmul + LoRA computation.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nX\ntorch.Tensor\nInput tensor [*, in_features]\nrequired\n\n\nW\ntorch.Tensor\nBase weight matrix [out_features, in_features]\nrequired\n\n\nW_quant\nQuantState | torch.Tensor | None\nQuantization state for W\nrequired\n\n\nA\ntorch.Tensor | None\nLoRA A matrix [rank, in_features]\nrequired\n\n\nB\ntorch.Tensor | None\nLoRA B matrix [out_features, rank]\nrequired\n\n\ns\nfloat | None\nLoRA scaling factor\nrequired\n\n\nout\ntorch.Tensor | None\nOptional output tensor for inplace operations\nNone\n\n\nX_drop\ntorch.Tensor | None\nOptional dropout-applied input for LoRA path (if None, uses X)\nNone\n\n\nlora_bias\ntorch.Tensor | None\nOptional LoRA B layer bias [out_features]\nNone\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nResult of X @ W + s * X_drop @ A @ B + b + s * lora_bias" + "text": "kernels.lora\nModule for definition of Low-Rank Adaptation (LoRA) Triton kernels.\nSee “LoRA: Low-Rank Adaptation of Large Language Models”\n(https://arxiv.org/abs/2106.09685).\nAlso supports DoRA (Weight-Decomposed Low-Rank Adaptation):\nSee “DoRA: Weight-Decomposed Low-Rank Adaptation” (https://arxiv.org/abs/2402.09353).\nCredit to unsloth (https://unsloth.ai/) for inspiration for this implementation.\n\n\n\n\n\nName\nDescription\n\n\n\n\nLoRA_Embedding\nFused LoRA embedding: F.embedding(x, W) + s * F.embedding(x, A^T) @ B^T.\n\n\nLoRA_MLP\nOptimized LoRA MLP implementation.\n\n\nLoRA_O\nOptimized LoRA implementation for output projection.\n\n\nLoRA_QK\nOptimized LoRA QK implementation for models where v_proj is None.\n\n\nLoRA_QKV\nOptimized LoRA QKV implementation with quantization support.\n\n\n\n\n\nkernels.lora.LoRA_Embedding()\nFused LoRA embedding: F.embedding(x, W) + s * F.embedding(x, A^T) @ B^T.\nSupports dropout and DoRA.\n\n\n\nkernels.lora.LoRA_MLP()\nOptimized LoRA MLP implementation.\nSupports bias, dropout, and DoRA. Dropout is applied to the input for\ngate/up projections. 
The down projection uses hidden states (post-activation)\nas input, so dropout is not applied there.\n\n\n\nkernels.lora.LoRA_O()\nOptimized LoRA implementation for output projection.\nSupports bias, dropout, and DoRA.\n\n\n\nkernels.lora.LoRA_QK()\nOptimized LoRA QK implementation for models where v_proj is None.\nUsed by models like Gemma4 with attention_k_eq_v=True, where key states are\nreused as value states. Only Q and K projections are fused; the caller\nreturns K a second time as V so that autograd accumulates key+value gradients\ninto a single dK.\nSupports bias, dropout, and DoRA (Weight-Decomposed Low-Rank Adaptation).\n\n\n\nkernels.lora.LoRA_QKV()\nOptimized LoRA QKV implementation with quantization support.\nSupports bias, dropout, and DoRA (Weight-Decomposed Low-Rank Adaptation).\nDropout is applied outside this Function so autograd handles its backward.\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\napply_lora_embedding\nApplies LoRA to embedding layer.\n\n\napply_lora_mlp_geglu\nApplies LoRA to MLP layer with GEGLU activation.\n\n\napply_lora_mlp_swiglu\nApplies LoRA to MLP layer with SwiGLU activation.\n\n\napply_lora_o\nApplies LoRA to output projection layer.\n\n\napply_lora_qk\nApplies LoRA to compute Query and Key projections for models where v_proj is None.\n\n\napply_lora_qkv\nApplies LoRA to compute Query, Key, Value projections.\n\n\nget_embedding_lora_parameters\nExtract LoRA parameters from a PEFT Embedding module.\n\n\nget_lora_parameters\nGets LoRA parameters from a projection module.\n\n\nmatmul_lora\nEfficient fused matmul + LoRA computation.\n\n\n\n\n\nkernels.lora.apply_lora_embedding(self, x)\nApplies LoRA to embedding layer.\n\n\n\nkernels.lora.apply_lora_mlp_geglu(self, X, inplace=True)\nApplies LoRA to MLP layer with GEGLU activation.\nSupports bias, dropout, and DoRA.\n\n\n\nkernels.lora.apply_lora_mlp_swiglu(self, X, inplace=True)\nApplies LoRA to MLP layer with SwiGLU activation.\nSupports bias, dropout, and DoRA.\n\n\n\nkernels.lora.apply_lora_o(self, X)\nApplies LoRA to output projection layer.\nSupports bias, dropout, and DoRA.\n\n\n\nkernels.lora.apply_lora_qk(self, X, inplace=True)\nApplies LoRA to compute Query and Key projections for models where v_proj is None.\nWhen v_proj is None (e.g. Gemma4 attention_k_eq_v), key states are reused as\nvalue states. Returns (Q, K, K) — the caller’s patched forward will use K as V.\nBecause K is returned twice, autograd accumulates gradients from both the key and\nvalue paths into dK before calling LoRA_QK.backward.\nSupports bias, dropout, and DoRA.\n\n\n\nkernels.lora.apply_lora_qkv(self, X, inplace=True)\nApplies LoRA to compute Query, Key, Value projections.\nSupports bias, dropout, and DoRA. Dropout is applied outside the autograd\nFunction so PyTorch handles its backward automatically. 
A single shared\ndropout mask is used across Q, K, V projections for memory efficiency.\n\n\n\nkernels.lora.get_embedding_lora_parameters(embed)\nExtract LoRA parameters from a PEFT Embedding module.\n\n\n\nkernels.lora.get_lora_parameters(proj)\nGets LoRA parameters from a projection module.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nproj\nnn.Module\nThe projection module to extract parameters from.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nA tuple containing:\n\n\n\ntorch.Tensor | None\n- W: base weight tensor\n\n\n\nQuantState | torch.Tensor | None\n- b: base layer bias (or None)\n\n\n\ntorch.Tensor | None\n- quant_state: quantization state (or None)\n\n\n\ntorch.Tensor | None\n- A: LoRA A weight (or None)\n\n\n\nfloat | None\n- B: LoRA B weight (or None)\n\n\n\ntorch.Tensor | None\n- s: LoRA scaling factor (or None)\n\n\n\nnn.Module | None\n- lora_bias: LoRA B bias (or None)\n\n\n\ntorch.Tensor | None\n- dropout: dropout module (or None)\n\n\n\ntuple[torch.Tensor, torch.Tensor | None, QuantState | torch.Tensor | None, torch.Tensor | None, torch.Tensor | None, float | None, torch.Tensor | None, nn.Module | None, torch.Tensor | None]\n- magnitude: DoRA magnitude vector (or None)\n\n\n\n\n\n\n\nkernels.lora.matmul_lora(\n X,\n W,\n b,\n W_quant,\n A,\n B,\n s,\n out=None,\n X_drop=None,\n lora_bias=None,\n)\nEfficient fused matmul + LoRA computation.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nX\ntorch.Tensor\nInput tensor [*, in_features]\nrequired\n\n\nW\ntorch.Tensor\nBase weight matrix [out_features, in_features]\nrequired\n\n\nW_quant\nQuantState | torch.Tensor | None\nQuantization state for W\nrequired\n\n\nA\ntorch.Tensor | None\nLoRA A matrix [rank, in_features]\nrequired\n\n\nB\ntorch.Tensor | None\nLoRA B matrix [out_features, rank]\nrequired\n\n\ns\nfloat | None\nLoRA scaling factor\nrequired\n\n\nout\ntorch.Tensor | None\nOptional output tensor for inplace operations\nNone\n\n\nX_drop\ntorch.Tensor | None\nOptional dropout-applied input for LoRA path (if None, uses X)\nNone\n\n\nlora_bias\ntorch.Tensor | None\nOptional LoRA B layer bias [out_features]\nNone\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nResult of X @ W + s * X_drop @ A @ B + b + s * lora_bias" }, { "objectID": "docs/api/kernels.lora.html#classes", "href": "docs/api/kernels.lora.html#classes", "title": "kernels.lora", "section": "", - "text": "Name\nDescription\n\n\n\n\nLoRA_Embedding\nFused LoRA embedding: F.embedding(x, W) + s * F.embedding(x, A^T) @ B^T.\n\n\nLoRA_MLP\nOptimized LoRA MLP implementation.\n\n\nLoRA_O\nOptimized LoRA implementation for output projection.\n\n\nLoRA_QKV\nOptimized LoRA QKV implementation with quantization support.\n\n\n\n\n\nkernels.lora.LoRA_Embedding()\nFused LoRA embedding: F.embedding(x, W) + s * F.embedding(x, A^T) @ B^T.\nSupports dropout and DoRA.\n\n\n\nkernels.lora.LoRA_MLP()\nOptimized LoRA MLP implementation.\nSupports bias, dropout, and DoRA. Dropout is applied to the input for\ngate/up projections. 
The down projection uses hidden states (post-activation)\nas input, so dropout is not applied there.\n\n\n\nkernels.lora.LoRA_O()\nOptimized LoRA implementation for output projection.\nSupports bias, dropout, and DoRA.\n\n\n\nkernels.lora.LoRA_QKV()\nOptimized LoRA QKV implementation with quantization support.\nSupports bias, dropout, and DoRA (Weight-Decomposed Low-Rank Adaptation).\nDropout is applied outside this Function so autograd handles its backward." + "text": "Name\nDescription\n\n\n\n\nLoRA_Embedding\nFused LoRA embedding: F.embedding(x, W) + s * F.embedding(x, A^T) @ B^T.\n\n\nLoRA_MLP\nOptimized LoRA MLP implementation.\n\n\nLoRA_O\nOptimized LoRA implementation for output projection.\n\n\nLoRA_QK\nOptimized LoRA QK implementation for models where v_proj is None.\n\n\nLoRA_QKV\nOptimized LoRA QKV implementation with quantization support.\n\n\n\n\n\nkernels.lora.LoRA_Embedding()\nFused LoRA embedding: F.embedding(x, W) + s * F.embedding(x, A^T) @ B^T.\nSupports dropout and DoRA.\n\n\n\nkernels.lora.LoRA_MLP()\nOptimized LoRA MLP implementation.\nSupports bias, dropout, and DoRA. Dropout is applied to the input for\ngate/up projections. The down projection uses hidden states (post-activation)\nas input, so dropout is not applied there.\n\n\n\nkernels.lora.LoRA_O()\nOptimized LoRA implementation for output projection.\nSupports bias, dropout, and DoRA.\n\n\n\nkernels.lora.LoRA_QK()\nOptimized LoRA QK implementation for models where v_proj is None.\nUsed by models like Gemma4 with attention_k_eq_v=True, where key states are\nreused as value states. Only Q and K projections are fused; the caller\nreturns K a second time as V so that autograd accumulates key+value gradients\ninto a single dK.\nSupports bias, dropout, and DoRA (Weight-Decomposed Low-Rank Adaptation).\n\n\n\nkernels.lora.LoRA_QKV()\nOptimized LoRA QKV implementation with quantization support.\nSupports bias, dropout, and DoRA (Weight-Decomposed Low-Rank Adaptation).\nDropout is applied outside this Function so autograd handles its backward." }, { "objectID": "docs/api/kernels.lora.html#functions", "href": "docs/api/kernels.lora.html#functions", "title": "kernels.lora", "section": "", - "text": "Name\nDescription\n\n\n\n\napply_lora_embedding\nApplies LoRA to embedding layer.\n\n\napply_lora_mlp_geglu\nApplies LoRA to MLP layer with GEGLU activation.\n\n\napply_lora_mlp_swiglu\nApplies LoRA to MLP layer with SwiGLU activation.\n\n\napply_lora_o\nApplies LoRA to output projection layer.\n\n\napply_lora_qkv\nApplies LoRA to compute Query, Key, Value projections.\n\n\nget_embedding_lora_parameters\nExtract LoRA parameters from a PEFT Embedding module.\n\n\nget_lora_parameters\nGets LoRA parameters from a projection module.\n\n\nmatmul_lora\nEfficient fused matmul + LoRA computation.\n\n\n\n\n\nkernels.lora.apply_lora_embedding(self, x)\nApplies LoRA to embedding layer.\n\n\n\nkernels.lora.apply_lora_mlp_geglu(self, X, inplace=True)\nApplies LoRA to MLP layer with GEGLU activation.\nSupports bias, dropout, and DoRA.\n\n\n\nkernels.lora.apply_lora_mlp_swiglu(self, X, inplace=True)\nApplies LoRA to MLP layer with SwiGLU activation.\nSupports bias, dropout, and DoRA.\n\n\n\nkernels.lora.apply_lora_o(self, X)\nApplies LoRA to output projection layer.\nSupports bias, dropout, and DoRA.\n\n\n\nkernels.lora.apply_lora_qkv(self, X, inplace=True)\nApplies LoRA to compute Query, Key, Value projections.\nSupports bias, dropout, and DoRA. Dropout is applied outside the autograd\nFunction so PyTorch handles its backward automatically. 
A single shared\ndropout mask is used across Q, K, V projections for memory efficiency.\n\n\n\nkernels.lora.get_embedding_lora_parameters(embed)\nExtract LoRA parameters from a PEFT Embedding module.\n\n\n\nkernels.lora.get_lora_parameters(proj)\nGets LoRA parameters from a projection module.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nproj\nnn.Module\nThe projection module to extract parameters from.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nA tuple containing:\n\n\n\ntorch.Tensor | None\n- W: base weight tensor\n\n\n\nQuantState | torch.Tensor | None\n- b: base layer bias (or None)\n\n\n\ntorch.Tensor | None\n- quant_state: quantization state (or None)\n\n\n\ntorch.Tensor | None\n- A: LoRA A weight (or None)\n\n\n\nfloat | None\n- B: LoRA B weight (or None)\n\n\n\ntorch.Tensor | None\n- s: LoRA scaling factor (or None)\n\n\n\nnn.Module | None\n- lora_bias: LoRA B bias (or None)\n\n\n\ntorch.Tensor | None\n- dropout: dropout module (or None)\n\n\n\ntuple[torch.Tensor, torch.Tensor | None, QuantState | torch.Tensor | None, torch.Tensor | None, torch.Tensor | None, float | None, torch.Tensor | None, nn.Module | None, torch.Tensor | None]\n- magnitude: DoRA magnitude vector (or None)\n\n\n\n\n\n\n\nkernels.lora.matmul_lora(\n X,\n W,\n b,\n W_quant,\n A,\n B,\n s,\n out=None,\n X_drop=None,\n lora_bias=None,\n)\nEfficient fused matmul + LoRA computation.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nX\ntorch.Tensor\nInput tensor [*, in_features]\nrequired\n\n\nW\ntorch.Tensor\nBase weight matrix [out_features, in_features]\nrequired\n\n\nW_quant\nQuantState | torch.Tensor | None\nQuantization state for W\nrequired\n\n\nA\ntorch.Tensor | None\nLoRA A matrix [rank, in_features]\nrequired\n\n\nB\ntorch.Tensor | None\nLoRA B matrix [out_features, rank]\nrequired\n\n\ns\nfloat | None\nLoRA scaling factor\nrequired\n\n\nout\ntorch.Tensor | None\nOptional output tensor for inplace operations\nNone\n\n\nX_drop\ntorch.Tensor | None\nOptional dropout-applied input for LoRA path (if None, uses X)\nNone\n\n\nlora_bias\ntorch.Tensor | None\nOptional LoRA B layer bias [out_features]\nNone\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nResult of X @ W + s * X_drop @ A @ B + b + s * lora_bias" + "text": "Name\nDescription\n\n\n\n\napply_lora_embedding\nApplies LoRA to embedding layer.\n\n\napply_lora_mlp_geglu\nApplies LoRA to MLP layer with GEGLU activation.\n\n\napply_lora_mlp_swiglu\nApplies LoRA to MLP layer with SwiGLU activation.\n\n\napply_lora_o\nApplies LoRA to output projection layer.\n\n\napply_lora_qk\nApplies LoRA to compute Query and Key projections for models where v_proj is None.\n\n\napply_lora_qkv\nApplies LoRA to compute Query, Key, Value projections.\n\n\nget_embedding_lora_parameters\nExtract LoRA parameters from a PEFT Embedding module.\n\n\nget_lora_parameters\nGets LoRA parameters from a projection module.\n\n\nmatmul_lora\nEfficient fused matmul + LoRA computation.\n\n\n\n\n\nkernels.lora.apply_lora_embedding(self, x)\nApplies LoRA to embedding layer.\n\n\n\nkernels.lora.apply_lora_mlp_geglu(self, X, inplace=True)\nApplies LoRA to MLP layer with GEGLU activation.\nSupports bias, dropout, and DoRA.\n\n\n\nkernels.lora.apply_lora_mlp_swiglu(self, X, inplace=True)\nApplies LoRA to MLP layer with SwiGLU activation.\nSupports bias, dropout, and DoRA.\n\n\n\nkernels.lora.apply_lora_o(self, X)\nApplies LoRA to output projection layer.\nSupports bias, dropout, and 
DoRA.\n\n\n\nkernels.lora.apply_lora_qk(self, X, inplace=True)\nApplies LoRA to compute Query and Key projections for models where v_proj is None.\nWhen v_proj is None (e.g. Gemma4 attention_k_eq_v), key states are reused as\nvalue states. Returns (Q, K, K) — the caller’s patched forward will use K as V.\nBecause K is returned twice, autograd accumulates gradients from both the key and\nvalue paths into dK before calling LoRA_QK.backward.\nSupports bias, dropout, and DoRA.\n\n\n\nkernels.lora.apply_lora_qkv(self, X, inplace=True)\nApplies LoRA to compute Query, Key, Value projections.\nSupports bias, dropout, and DoRA. Dropout is applied outside the autograd\nFunction so PyTorch handles its backward automatically. A single shared\ndropout mask is used across Q, K, V projections for memory efficiency.\n\n\n\nkernels.lora.get_embedding_lora_parameters(embed)\nExtract LoRA parameters from a PEFT Embedding module.\n\n\n\nkernels.lora.get_lora_parameters(proj)\nGets LoRA parameters from a projection module.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nproj\nnn.Module\nThe projection module to extract parameters from.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nA tuple containing:\n\n\n\ntorch.Tensor | None\n- W: base weight tensor\n\n\n\nQuantState | torch.Tensor | None\n- b: base layer bias (or None)\n\n\n\ntorch.Tensor | None\n- quant_state: quantization state (or None)\n\n\n\ntorch.Tensor | None\n- A: LoRA A weight (or None)\n\n\n\nfloat | None\n- B: LoRA B weight (or None)\n\n\n\ntorch.Tensor | None\n- s: LoRA scaling factor (or None)\n\n\n\nnn.Module | None\n- lora_bias: LoRA B bias (or None)\n\n\n\ntorch.Tensor | None\n- dropout: dropout module (or None)\n\n\n\ntuple[torch.Tensor, torch.Tensor | None, QuantState | torch.Tensor | None, torch.Tensor | None, torch.Tensor | None, float | None, torch.Tensor | None, nn.Module | None, torch.Tensor | None]\n- magnitude: DoRA magnitude vector (or None)\n\n\n\n\n\n\n\nkernels.lora.matmul_lora(\n X,\n W,\n b,\n W_quant,\n A,\n B,\n s,\n out=None,\n X_drop=None,\n lora_bias=None,\n)\nEfficient fused matmul + LoRA computation.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nX\ntorch.Tensor\nInput tensor [*, in_features]\nrequired\n\n\nW\ntorch.Tensor\nBase weight matrix [out_features, in_features]\nrequired\n\n\nW_quant\nQuantState | torch.Tensor | None\nQuantization state for W\nrequired\n\n\nA\ntorch.Tensor | None\nLoRA A matrix [rank, in_features]\nrequired\n\n\nB\ntorch.Tensor | None\nLoRA B matrix [out_features, rank]\nrequired\n\n\ns\nfloat | None\nLoRA scaling factor\nrequired\n\n\nout\ntorch.Tensor | None\nOptional output tensor for inplace operations\nNone\n\n\nX_drop\ntorch.Tensor | None\nOptional dropout-applied input for LoRA path (if None, uses X)\nNone\n\n\nlora_bias\ntorch.Tensor | None\nOptional LoRA B layer bias [out_features]\nNone\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nResult of X @ W + s * X_drop @ A @ B + b + s * lora_bias" }, { "objectID": "docs/api/monkeypatch.utils.html", @@ -3269,6 +3269,16 @@ "Home" ] }, + { + "objectID": "index.html#ai-agent-support", + "href": "index.html#ai-agent-support", + "title": "Axolotl", + "section": "AI Agent Support", + "text": "AI Agent Support\nAxolotl ships with built-in documentation optimized for AI coding agents (Claude Code, Cursor, Copilot, etc.). 
These docs are bundled with the pip package — no repo clone needed.\n# Show overview and available training methods\naxolotl agent-docs\n\n# Topic-specific references\naxolotl agent-docs sft # supervised fine-tuning\naxolotl agent-docs grpo # GRPO online RL\naxolotl agent-docs preference_tuning # DPO, KTO, ORPO, SimPO\naxolotl agent-docs reward_modelling # outcome and process reward models\naxolotl agent-docs pretraining # continual pretraining\naxolotl agent-docs --list # list all topics\n\n# Dump config schema for programmatic use\naxolotl config-schema\naxolotl config-schema --field adapter\nIf you’re working with the source repo, agent docs are also available at docs/agents/ and the project overview is in AGENTS.md.", + "crumbs": [ + "Home" + ] + }, { "objectID": "index.html#getting-help", "href": "index.html#getting-help", @@ -3718,7 +3728,7 @@ "href": "docs/custom_integrations.html#cut-cross-entropy", "title": "Custom Integrations", "section": "Cut Cross Entropy", - "text": "Cut Cross Entropy\nCut Cross Entropy (CCE) reduces VRAM usage through optimization on the cross-entropy operation during loss calculation.\nSee https://github.com/apple/ml-cross-entropy\n\nRequirements\n\nPyTorch 2.4.0 or higher\n\n\n\nInstallation\nRun the following command to install cut_cross_entropy[transformers] if you don’t have it already.\n\nIf you are in dev environment\n\npython scripts/cutcrossentropy_install.py | sh\n\nIf you are installing from pip\n\npip3 uninstall -y cut-cross-entropy && pip3 install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@63b15e6\"\n\n\nUsage\nplugins:\n - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin\n\n\nSupported Models\n\nafmoe\napertus\narcee\ncohere\ncohere2\ndeepseek_v3\nexaone4\ngemma\ngemma2\ngemma3\ngemma3_text\ngemma3n\ngemma3n_text\nglm\nglm4\nglm4_moe\nglm4_moe_lite\nglm46v\nglm4v\nglm4v_moe\nglm_image\nglm_moe_dsa\ngpt_oss\ngranite\ngranitemoe\ngranitemoehybrid\ngranitemoeshared\nhunyuan_v1_dense\nhunyuan_v1_moe\ninternvl\nkimi_linear\nlfm2\nlfm2_moe\nlfm2_vl\nllama\nllama4\nllama4_text\nllava\nministral\nministral3\nmistral\nmistral3\nmistral4\nmixtral\nmllama\nnemotron_h\nolmo\nolmo2\nolmo3\nolmoe\nphi\nphi3\nphi4_multimodal\nqwen2\nqwen2_5_vl\nqwen2_moe\nqwen2_vl\nqwen3\nqwen3_5\nqwen3_5_text\nqwen3_5_moe\nqwen3_5_moe_text\nqwen3_moe\nqwen3_next\nqwen3_vl\nqwen3_vl_moe\nseed_oss\nsmollm3\nstep3p5\nvoxtral\n\n\n\nCitation\n@article{wijmans2024cut,\n author = {Erik Wijmans and\n Brody Huval and\n Alexander Hertzberg and\n Vladlen Koltun and\n Philipp Kr\\\"ahenb\\\"uhl},\n title = {Cut Your Losses in Large-Vocabulary Language Models},\n journal = {arXiv},\n year = {2024},\n url = {https://arxiv.org/abs/2411.09009},\n}\nPlease see reference here", + "text": "Cut Cross Entropy\nCut Cross Entropy (CCE) reduces VRAM usage through optimization on the cross-entropy operation during loss calculation.\nSee https://github.com/apple/ml-cross-entropy\n\nRequirements\n\nPyTorch 2.4.0 or higher\n\n\n\nInstallation\nRun the following command to install cut_cross_entropy[transformers] if you don’t have it already.\n\nIf you are in dev environment\n\npython scripts/cutcrossentropy_install.py | sh\n\nIf you are installing from pip\n\npip3 uninstall -y cut-cross-entropy && pip3 install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@fec1a88\"\n\n\nUsage\nplugins:\n - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin\n\n\nSupported 
Models\n\nafmoe\napertus\narcee\ncohere\ncohere2\ndeepseek_v3\nexaone4\ngemma\ngemma2\ngemma3\ngemma3_text\ngemma3n\ngemma3n_text\ngemma4\nglm\nglm4\nglm4_moe\nglm4_moe_lite\nglm46v\nglm4v\nglm4v_moe\nglm_image\nglm_moe_dsa\ngpt_oss\ngranite\ngranitemoe\ngranitemoehybrid\ngranitemoeshared\nhunyuan_v1_dense\nhunyuan_v1_moe\ninternvl\nkimi_linear\nlfm2\nlfm2_moe\nlfm2_vl\nllama\nllama4\nllama4_text\nllava\nministral\nministral3\nmistral\nmistral3\nmistral4\nmixtral\nmllama\nnemotron_h\nolmo\nolmo2\nolmo3\nolmoe\nphi\nphi3\nphi4_multimodal\nqwen2\nqwen2_5_vl\nqwen2_moe\nqwen2_vl\nqwen3\nqwen3_5\nqwen3_5_text\nqwen3_5_moe\nqwen3_5_moe_text\nqwen3_moe\nqwen3_next\nqwen3_vl\nqwen3_vl_moe\nseed_oss\nsmollm3\nstep3p5\nvoxtral\n\n\n\nCitation\n@article{wijmans2024cut,\n author = {Erik Wijmans and\n Brody Huval and\n Alexander Hertzberg and\n Vladlen Koltun and\n Philipp Kr\\\"ahenb\\\"uhl},\n title = {Cut Your Losses in Large-Vocabulary Language Models},\n journal = {arXiv},\n year = {2024},\n url = {https://arxiv.org/abs/2411.09009},\n}\nPlease see reference here", "crumbs": [ "Advanced Features", "Custom Integrations" @@ -3762,7 +3772,7 @@ "href": "docs/custom_integrations.html#kernels-integration", "title": "Custom Integrations", "section": "Kernels Integration", - "text": "Kernels Integration\nMoE (Mixture of Experts) kernels speed up training for MoE layers and reduce VRAM costs. In transformers v5, batched_mm and grouped_mm were integrated as built-in options via the experts_implementation config kwarg:\nclass ExpertsInterface(GeneralInterface):\n _global_mapping = {\n \"batched_mm\": batched_mm_experts_forward,\n \"grouped_mm\": grouped_mm_experts_forward,\n }\nIn our custom integration, we add support for ScatterMoE and SonicMoE, which are more efficient and faster than grouped_mm.\n\nUsage\nAdd the following to your axolotl YAML config:\nplugins:\n - axolotl.integrations.kernels.KernelsPlugin\n\nuse_kernels: true\n\nuse_scattermoe: true\nuse_sonicmoe: true\nImportant: Setting experts_implementation to batched_mm or grouped_mm is incompatible with custom kernel options. The exception is experts_implementation: scattermoe, which is used for models like Gemma 4 that embed MoE directly in the decoder layer (no SparseMoeBlock) and dispatch through the transformers ExpertsInterface.\n\n\nSonicMoE installation\nPrerequisites:\n- NVIDIA Hopper (H100, H200) or Blackwell (B200, GB200) GPU\n- CUDA 12.9+ (13.0+ for B300)\n- PyTorch 2.7+ (2.9.1 recommended)\n- For B300: Triton 3.6.0\npip install --ignore-requires-python --no-deps \"sonic-moe @ git+https://github.com/Dao-AILab/sonic-moe.git@116e2df0a41874f77fa0ad269ce7df3f0cfcb956\" && pip install nvidia-cutlass-dsl==4.4.0 quack-kernels==0.2.5\nSee the SonicMoE installation guide for the latest prerequisite details.\nNote: Blackwell support is in upstream beta. 
On Blackwell GPUs, Axolotl automatically sets USE_QUACK_GEMM=1 to enable the Blackwell kernels.\n\n\nHow It Works\nThe KernelsPlugin runs before model loading and:\n\n\nScatterMoE\n\nRegisters the ScatterMoE kernel from the local libs/scattermoe_lora package (includes fused LoRA support via Triton kernels).\nPatches the model’s SparseMoeBlock forward method with the optimized ScatterMoE implementation via the HF kernels library.\n\n\n\nSonicMoE\n\nResolves the model’s MoE block class(es) from constants.py.\nPatches the forward method with SonicMoE’s optimized CUTLASS kernels and registers a weight converter for the interleaved gate/up projection format.\nSupports pluggable routing strategies (see routing table below).\n\nBoth paths use the shared resolve_moe_block_classes utility in constants.py for model-type-to-class resolution.\n\n\nModel Support Matrix\nMost models use the SwiGLU activation (silu(gate) * up). Gemma 4 uses GEGLU (gelu(gate) * up). ScatterMoE supports any gated activation (activation is applied in Python between kernel calls). SonicMoE supports SwiGLU, GEGLU, and REGLU via its ActivationType enum.\n\n\nRouting strategies\n\n\n\n\n\n\n\n\n\nRouting Strategy\nDescription\nScatterMoE\nSonicMoE\n\n\n\n\nsoftmax → topk\nSoftmax over experts, select top-K, optional renormalization\nYes\nYes\n\n\nsoftmax → group selection → topk\nSoftmax, select top groups (sum of top-2 per group), topk from selected groups, renorm + scaling\nNo\nYes\n\n\nsigmoid → topk (with groups)\nSigmoid + bias correction, group-based masking, topk from masked scores, weights from original sigmoid\nYes\nYes\n\n\nsigmoid → topk (no groups)\nSigmoid + bias correction, straight topk (n_group=1)\nYes\nYes\n\n\nsoftmax → bias correction → topk\nSoftmax, bias via gate.moe_statics, topk, gather from original probs, clamp-based renorm\nNo\nYes\n\n\nsoftmax → group_limited_greedy\nSoftmax, group selection (max per group), topk, scale only (no renorm)\nNo\nYes\n\n\nsoftmax → topk via gate.wg\nSoftmax, gate weight at gate.wg.weight (not gate.weight), always renormalize\nNo\nYes\n\n\nsoftmax → topk + per_expert_scale\nRMSNorm → scale → proj → softmax → topk → renorm → per-expert learned scales\nYes\nYes\n\n\nfused topk → softmax\nRouting + expert computation fused in a single kernel\nNo\nPlanned\n\n\n\n\n\nPer-model support\n\n\n\n\n\n\n\n\n\n\nModel Type\nArchitecture\nRouting\nScatterMoE\nSonicMoE\n\n\n\n\nqwen2_moe\nQwen2-MoE\nsoftmax → topk\nYes\nYes\n\n\nqwen3_moe\nQwen3-MoE\nsoftmax → topk\nYes\nYes\n\n\nqwen3_5_moe\nQwen3.5-MoE\nsoftmax → topk\nYes\nYes\n\n\nqwen3_5_moe_text\nQwen3.5-MoE (VLM text)\nsoftmax → topk\nYes\nYes\n\n\nqwen3_next\nQwen3-Next\nsoftmax → topk\nYes\nYes\n\n\nqwen3_vl_moe\nQwen3-VL-MoE\nsoftmax → topk\nYes\nYes\n\n\nqwen3_omni_moe\nQwen3-Omni (Thinker + Talker)\nsoftmax → topk\nYes\nYes\n\n\nolmoe\nOLMoE\nsoftmax → topk\nYes\nYes\n\n\nmixtral\nMixtral\nsoftmax → topk\nYes\nYes\n\n\nminimax\nMiniMax\nsoftmax → topk\nYes\nYes\n\n\nmistral4\nMistral 4\nsoftmax → group → topk\nNo\nYes\n\n\nglm_moe_dsa\nGLM-MoE DSA (GLM 5)\nsigmoid → topk (groups)\nYes\nYes\n\n\ndeepseek_v3\nDeepSeek-V3\nsigmoid → topk (groups)\nYes\nYes\n\n\nglm4_moe\nGLM4-MoE\nsigmoid → topk (groups)\nYes\nYes\n\n\nglm4_moe_lite\nGLM4-MoE Lite (GLM 4.7 Flash)\nsigmoid → topk (groups)\nYes*\nYes\n\n\nglm4v_moe\nGLM4v-MoE\nsigmoid → topk (groups)\nYes\nYes\n\n\nminimax_m2\nMiniMax M2\nsigmoid → topk (no groups)\nYes\nYes\n\n\nernie4_5_moe\nERNIE 4.5 MoE\nsoftmax → bias → topk\nNo\nYes\n\n\ndeepseek_v2\nDeepSeek-V2\nsoftmax → 
group_limited_greedy\nNo\nYes\n\n\nhunyuan_v1_moe\nHunYuan V1 MoE\nsoftmax → topk (gate.wg)\nNo\nYes\n\n\ngemma4_text\nGemma 4 (26B-A4B)\nsoftmax → topk + per_expert_scale\nYes**\nYes**\n\n\ngpt_oss\nGPT-OSS\nfused topk → softmax\nNo\nPlanned\n\n\n\n* glm4_moe_lite with ScatterMoE may have issues — see Limitations.\n** Gemma 4 uses experts_implementation: scattermoe path (registered via ExpertsInterface) instead of SparseMoeBlock patching, since Gemma 4 embeds MoE directly in its decoder layer (no separate SparseMoeBlock). See the Gemma 4 section below.\n\n\nFeature comparison\n\n\n\n\n\n\n\n\nFeature\nScatterMoE\nSonicMoE\n\n\n\n\nKernel backend\nTriton\nCUTLASS\n\n\nGPU requirement\nAny CUDA\nHopper (H100/H200) or Blackwell (B200+)\n\n\nLoRA approach\nFused in Triton kernel\nRuntime materialization + custom autograd\n\n\nLoRA overhead\nLower (fused computation)\nHigher (per-forward materialization)\n\n\nGate/router LoRA\nYes\nYes\n\n\nExpert LoRA\nYes (fused)\nYes (materialized)\n\n\nShared expert LoRA\nYes (standard PEFT)\nYes (standard PEFT)\n\n\nSelective expert dequantization\nYes (~97% memory savings)\nNo\n\n\nWeight format\nTransposed [E, hidden, 2*inter]\nInterleaved gate/up [2*I, H, E]\n\n\ntorch.compile routing\nNo\nYes (optional)\n\n\n\n\n\nShared Expert Handling\nBoth kernels handle shared experts identically. Shared expert attribute names are detected in order of priority:\n\nshared_expert (Qwen2-MoE)\nshared_experts (GLM-MoE, DeepSeek-V3)\nshared_mlp (HunYuan V1 MoE)\n\nIf shared_expert_gate exists, sigmoid gating is applied to the shared expert contribution before adding it to the routed output. PEFT wraps shared expert linear layers with standard LoRA — no special handling is needed.\n\n\nGemma 4\nGemma 4 (e.g. google/gemma-4-26B-A4B) has a unique hybrid MoE architecture:\n\nNo SparseMoeBlock: MoE is embedded directly in the decoder layer alongside a dense MLP. Both run in parallel and their outputs are summed.\nCustom router (Gemma4TextRouter): RMSNorm → learned scale → linear projection → softmax → top-k → renormalization → per-expert learned scales.\nGEGLU activation: Uses gelu_pytorch_tanh (not SiLU/SwiGLU like most other MoE models).\n128 experts, top-k=8 for the 26B-A4B variant.\n\nBecause there is no SparseMoeBlock class to patch, Gemma 4 uses a different integration path: we register \"scattermoe\" as a custom implementation in the transformers ExpertsInterface, and set experts_implementation: scattermoe in the config. The @use_experts_implementation decorator on Gemma4TextExperts then dispatches to our ScatterMoE kernel automatically. The router is untouched — it runs as-is.\nImportant limitations:\n- Flash Attention 2 is not supported — Gemma 4 uses global_head_dim: 512 for full attention layers, which exceeds FA2’s maximum head dimension of 256. Use sdp_attention: true instead.\n- Multimodal model: Gemma 4 includes vision and audio encoders. For text-only SFT, use lora_target_linear_modules with a regex to restrict LoRA to the text backbone (e.g. language_model\\.model\\.layers\\.\\d+\\.self_attn\\.(q|k|v|o)_proj).\n\n\nLimitations\n\nScatterMoE + GLM4-MoE Lite: ScatterMoE does not work reliably for GLM 4.7 Flash (glm4_moe_lite).\nNon-SwiGLU activations: Neither kernel supports MoE architectures with non-SwiGLU expert activations (e.g., GPT-OSS uses a custom GLU variant).\nGPT-OSS: Deferred — requires transposed weight layout [E, H, 2*I], expert biases, and custom GLU activation. 
A dedicated forward path is needed.\nFSDP + fused gate LoRA (SonicMoE): The fused topk→softmax path materializes a local tensor when LoRA delta is present to avoid DTensor + Tensor mixing under FSDP.\n\n\n\nNote on MegaBlocks\nWe tested MegaBlocks but were unable to ensure numerical accuracy, so we did not integrate it. It was also incompatible with many newer model architectures in transformers.\nPlease see reference here", + "text": "Kernels Integration\nMoE (Mixture of Experts) kernels speed up training for MoE layers and reduce VRAM costs. In transformers v5, batched_mm and grouped_mm were integrated as built-in options via the experts_implementation config kwarg:\nclass ExpertsInterface(GeneralInterface):\n _global_mapping = {\n \"batched_mm\": batched_mm_experts_forward,\n \"grouped_mm\": grouped_mm_experts_forward,\n }\nIn our custom integration, we add support for ScatterMoE and SonicMoE, which are more efficient and faster than grouped_mm.\n\nUsage\nAdd the following to your axolotl YAML config:\nplugins:\n - axolotl.integrations.kernels.KernelsPlugin\n\nuse_kernels: true\n\nuse_scattermoe: true\nuse_sonicmoe: true\nImportant: Setting experts_implementation to batched_mm or grouped_mm is incompatible with custom kernel options. The exception is experts_implementation: scattermoe, which is used for models like Gemma 4 that embed MoE directly in the decoder layer (no SparseMoeBlock) and dispatch through the transformers ExpertsInterface.\n\n\nSonicMoE installation\nPrerequisites:\n- NVIDIA Hopper (H100, H200) or Blackwell (B200, GB200) GPU\n- CUDA 12.9+ (13.0+ for B300)\n- PyTorch 2.7+ (2.9.1 recommended)\n- For B300: Triton 3.6.0\npip install --ignore-requires-python --no-deps \"sonic-moe @ git+https://github.com/Dao-AILab/sonic-moe.git@116e2df0a41874f77fa0ad269ce7df3f0cfcb956\" && pip install nvidia-cutlass-dsl==4.4.0 quack-kernels==0.2.5\nSee the SonicMoE installation guide for the latest prerequisite details.\nNote: Blackwell support is in upstream beta. On Blackwell GPUs, Axolotl automatically sets USE_QUACK_GEMM=1 to enable the Blackwell kernels.\n\n\nHow It Works\nThe KernelsPlugin runs before model loading and:\n\n\nScatterMoE\n\nRegisters the ScatterMoE kernel from the local libs/scattermoe_lora package (includes fused LoRA support via Triton kernels).\nPatches the model’s SparseMoeBlock forward method with the optimized ScatterMoE implementation via the HF kernels library.\n\n\n\nSonicMoE\n\nResolves the model’s MoE block class(es) from constants.py.\nPatches the forward method with SonicMoE’s optimized CUTLASS kernels and registers a weight converter for the interleaved gate/up projection format.\nSupports pluggable routing strategies (see routing table below).\n\nBoth paths use the shared resolve_moe_block_classes utility in constants.py for model-type-to-class resolution.\n\n\nModel Support Matrix\nMost models use the SwiGLU activation (silu(gate) * up). Gemma 4 uses GEGLU (gelu(gate) * up). ScatterMoE supports any gated activation (activation is applied in Python between kernel calls). 
SonicMoE supports SwiGLU, GEGLU, and REGLU via its ActivationType enum.\n\n\nRouting strategies\n\n\n\n\n\n\n\n\n\nRouting Strategy\nDescription\nScatterMoE\nSonicMoE\n\n\n\n\nsoftmax → topk\nSoftmax over experts, select top-K, optional renormalization\nYes\nYes\n\n\nsoftmax → group selection → topk\nSoftmax, select top groups (sum of top-2 per group), topk from selected groups, renorm + scaling\nNo\nYes\n\n\nsigmoid → topk (with groups)\nSigmoid + bias correction, group-based masking, topk from masked scores, weights from original sigmoid\nYes\nYes\n\n\nsigmoid → topk (no groups)\nSigmoid + bias correction, straight topk (n_group=1)\nYes\nYes\n\n\nsoftmax → bias correction → topk\nSoftmax, bias via gate.moe_statics, topk, gather from original probs, clamp-based renorm\nNo\nYes\n\n\nsoftmax → group_limited_greedy\nSoftmax, group selection (max per group), topk, scale only (no renorm)\nNo\nYes\n\n\nsoftmax → topk via gate.wg\nSoftmax, gate weight at gate.wg.weight (not gate.weight), always renormalize\nNo\nYes\n\n\nsoftmax → topk + per_expert_scale\nRMSNorm → scale → proj → softmax → topk → renorm → per-expert learned scales\nYes\nYes\n\n\nfused topk → softmax\nRouting + expert computation fused in a single kernel\nNo\nPlanned\n\n\n\n\n\nPer-model support\n\n\n\n\n\n\n\n\n\n\nModel Type\nArchitecture\nRouting\nScatterMoE\nSonicMoE\n\n\n\n\nqwen2_moe\nQwen2-MoE\nsoftmax → topk\nYes\nYes\n\n\nqwen3_moe\nQwen3-MoE\nsoftmax → topk\nYes\nYes\n\n\nqwen3_5_moe\nQwen3.5-MoE\nsoftmax → topk\nYes\nYes\n\n\nqwen3_5_moe_text\nQwen3.5-MoE (VLM text)\nsoftmax → topk\nYes\nYes\n\n\nqwen3_next\nQwen3-Next\nsoftmax → topk\nYes\nYes\n\n\nqwen3_vl_moe\nQwen3-VL-MoE\nsoftmax → topk\nYes\nYes\n\n\nqwen3_omni_moe\nQwen3-Omni (Thinker + Talker)\nsoftmax → topk\nYes\nYes\n\n\nolmoe\nOLMoE\nsoftmax → topk\nYes\nYes\n\n\nmixtral\nMixtral\nsoftmax → topk\nYes\nYes\n\n\nminimax\nMiniMax\nsoftmax → topk\nYes\nYes\n\n\nmistral4\nMistral 4\nsoftmax → group → topk\nNo\nYes\n\n\nglm_moe_dsa\nGLM-MoE DSA (GLM 5)\nsigmoid → topk (groups)\nYes\nYes\n\n\ndeepseek_v3\nDeepSeek-V3\nsigmoid → topk (groups)\nYes\nYes\n\n\nglm4_moe\nGLM4-MoE\nsigmoid → topk (groups)\nYes\nYes\n\n\nglm4_moe_lite\nGLM4-MoE Lite (GLM 4.7 Flash)\nsigmoid → topk (groups)\nYes*\nYes\n\n\nglm4v_moe\nGLM4v-MoE\nsigmoid → topk (groups)\nYes\nYes\n\n\nminimax_m2\nMiniMax M2\nsigmoid → topk (no groups)\nYes\nYes\n\n\nernie4_5_moe\nERNIE 4.5 MoE\nsoftmax → bias → topk\nNo\nYes\n\n\ndeepseek_v2\nDeepSeek-V2\nsoftmax → group_limited_greedy\nNo\nYes\n\n\nhunyuan_v1_moe\nHunYuan V1 MoE\nsoftmax → topk (gate.wg)\nNo\nYes\n\n\ngemma4_text\nGemma 4 (26B-A4B)\nsoftmax → topk + per_expert_scale\nYes**\nYes**\n\n\ngpt_oss\nGPT-OSS\nfused topk → softmax\nNo\nPlanned\n\n\n\n* glm4_moe_lite with ScatterMoE may have issues — see Limitations.\n** Gemma 4 uses experts_implementation: scattermoe path (registered via ExpertsInterface) instead of SparseMoeBlock patching, since Gemma 4 embeds MoE directly in its decoder layer (no separate SparseMoeBlock). 
See the Gemma 4 section below.\n\n\nFeature comparison\n\n\n\n\n\n\n\n\nFeature\nScatterMoE\nSonicMoE\n\n\n\n\nKernel backend\nTriton\nCUTLASS\n\n\nGPU requirement\nAny CUDA\nHopper (H100/H200) or Blackwell (B200+)\n\n\nLoRA approach\nFused in Triton kernel\nRuntime materialization + custom autograd\n\n\nLoRA overhead\nLower (fused computation)\nHigher (per-forward materialization)\n\n\nGate/router LoRA\nYes\nYes\n\n\nExpert LoRA\nYes (fused)\nYes (materialized)\n\n\nShared expert LoRA\nYes (standard PEFT)\nYes (standard PEFT)\n\n\nSelective expert dequantization\nYes (~97% memory savings)\nNo\n\n\nWeight format\nTransposed [E, hidden, 2*inter]\nInterleaved gate/up [2*I, H, E]\n\n\ntorch.compile routing\nNo\nYes (optional)\n\n\n\n\n\nShared Expert Handling\nBoth kernels handle shared experts identically. Shared expert attribute names are detected in order of priority:\n\nshared_expert (Qwen2-MoE)\nshared_experts (GLM-MoE, DeepSeek-V3)\nshared_mlp (HunYuan V1 MoE)\n\nIf shared_expert_gate exists, sigmoid gating is applied to the shared expert contribution before adding it to the routed output. PEFT wraps shared expert linear layers with standard LoRA — no special handling is needed.\n\n\nGemma 4\nGemma 4 (e.g. google/gemma-4-26B-A4B) has a unique hybrid MoE architecture:\n\nNo SparseMoeBlock: MoE is embedded directly in the decoder layer alongside a dense MLP. Both run in parallel and their outputs are summed.\nCustom router (Gemma4TextRouter): RMSNorm → learned scale → linear projection → softmax → top-k → renormalization → per-expert learned scales.\nGEGLU activation: Uses gelu_pytorch_tanh (not SiLU/SwiGLU like most other MoE models).\n128 experts, top-k=8 for the 26B-A4B variant.\n\nBecause there is no SparseMoeBlock class to patch, Gemma 4 uses a different integration path: we register \"scattermoe\" as a custom implementation in the transformers ExpertsInterface, and set experts_implementation: scattermoe in the config. The @use_experts_implementation decorator on Gemma4TextExperts then dispatches to our ScatterMoE kernel automatically. The router is untouched — it runs as-is.\n\n\nLimitations\n\nScatterMoE + GLM4-MoE Lite: ScatterMoE does not work reliably for GLM 4.7 Flash (glm4_moe_lite).\nNon-SwiGLU activations: Neither kernel supports MoE architectures with non-SwiGLU expert activations (e.g., GPT-OSS uses a custom GLU variant).\nGPT-OSS: Deferred — requires transposed weight layout [E, H, 2*I], expert biases, and custom GLU activation. A dedicated forward path is needed.\nFSDP + fused gate LoRA (SonicMoE): The fused topk→softmax path materializes a local tensor when LoRA delta is present to avoid DTensor + Tensor mixing under FSDP.\n\n\n\nNote on MegaBlocks\nWe tested MegaBlocks but were unable to ensure numerical accuracy, so we did not integrate it. 
It was also incompatible with many newer model architectures in transformers.\nPlease see reference here", "crumbs": [ "Advanced Features", "Custom Integrations" @@ -6357,14 +6367,14 @@ "href": "docs/api/cli.main.html", "title": "cli.main", "section": "", - "text": "cli.main\nClick CLI definitions for various axolotl commands.\n\n\n\n\n\nName\nDescription\n\n\n\n\ncli\nAxolotl CLI - Train and fine-tune large language models\n\n\nevaluate\nEvaluate a model.\n\n\nfetch\nFetch example configs or other resources.\n\n\ninference\nRun inference with a trained model.\n\n\nmerge_lora\nMerge trained LoRA adapters into a base model.\n\n\nmerge_sharded_fsdp_weights\nMerge sharded FSDP model weights.\n\n\npreprocess\nPreprocess datasets before training.\n\n\ntrain\nTrain or fine-tune a model.\n\n\n\n\n\ncli.main.cli()\nAxolotl CLI - Train and fine-tune large language models\n\n\n\ncli.main.evaluate(ctx, config, launcher, **kwargs)\nEvaluate a model.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\nclick.Context\nClick context for extra args.\nrequired\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nlauncher\nstr\nLauncher to use for multi-GPU evaluation (“accelerate”, “torchrun”, or “python”).\nrequired\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.fetch(directory, dest)\nFetch example configs or other resources.\nAvailable directories:\n- examples: Example configuration files\n- deepspeed_configs: DeepSpeed configuration files\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndirectory\nstr\nOne of examples, deepspeed_configs.\nrequired\n\n\ndest\nOptional[str]\nOptional destination directory.\nrequired\n\n\n\n\n\n\n\ncli.main.inference(ctx, config, launcher, gradio, **kwargs)\nRun inference with a trained model.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\nclick.Context\nClick context for extra args.\nrequired\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nlauncher\nstr\nLauncher to use for multi-GPU inference (“accelerate”, “torchrun”, or “python”).\nrequired\n\n\ngradio\nbool\nWhether to use Gradio browser interface or command line for inference.\nrequired\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.merge_lora(config, **kwargs)\nMerge trained LoRA adapters into a base model.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.merge_sharded_fsdp_weights(ctx, config, launcher, **kwargs)\nMerge sharded FSDP model weights.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\nclick.Context\nClick context for extra args.\nrequired\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nlauncher\nstr\nLauncher to use for weight merging (“accelerate”, “torchrun”, or “python”).\nrequired\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.preprocess(config, cloud=None, **kwargs)\nPreprocess datasets before training.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\ncloud\nOptional[str]\nPath to a cloud accelerator configuration file.\nNone\n\n\nkwargs\n\nAdditional keyword 
arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.train(\n ctx,\n config,\n launcher='accelerate',\n cloud=None,\n sweep=None,\n **kwargs,\n)\nTrain or fine-tune a model.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\nclick.Context\nClick context for extra args.\nrequired\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nlauncher\nLiteral['accelerate', 'torchrun', 'python']\nLauncher to use for multi-GPU training (“accelerate”, “torchrun”, or “python”).\n'accelerate'\n\n\ncloud\nstr | None\nPath to a cloud accelerator configuration file\nNone\n\n\nsweep\nstr | None\nPath to YAML config for sweeping hyperparameters.\nNone\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}" + "text": "cli.main\nClick CLI definitions for various axolotl commands.\n\n\n\n\n\nName\nDescription\n\n\n\n\nagent_docs\nShow agent-optimized documentation.\n\n\ncli\nAxolotl CLI - Train and fine-tune large language models\n\n\nconfig_schema\nDump the full config JSON schema.\n\n\nevaluate\nEvaluate a model.\n\n\nfetch\nFetch example configs or other resources.\n\n\ninference\nRun inference with a trained model.\n\n\nmerge_lora\nMerge trained LoRA adapters into a base model.\n\n\nmerge_sharded_fsdp_weights\nMerge sharded FSDP model weights.\n\n\npreprocess\nPreprocess datasets before training.\n\n\ntrain\nTrain or fine-tune a model.\n\n\n\n\n\ncli.main.agent_docs(topic, list_topics)\nShow agent-optimized documentation.\nPrints reference docs designed for AI coding agents.\nThese docs are bundled with the package — no network access needed.\n\b\nExamples:\naxolotl agent-docs # overview (start here)\naxolotl agent-docs grpo # GRPO reference\naxolotl agent-docs sft # SFT reference\naxolotl agent-docs –list # list all topics\n\n\n\ncli.main.cli()\nAxolotl CLI - Train and fine-tune large language models\n\n\n\ncli.main.config_schema(output_format, field)\nDump the full config JSON schema.\nUseful for AI agents and tooling to discover all available config options,\ntheir types, defaults, and descriptions.\n\b\nExamples:\naxolotl config-schema # full JSON schema\naxolotl config-schema –format yaml # YAML format\naxolotl config-schema –field adapter # single field\n\n\n\ncli.main.evaluate(ctx, config, launcher, **kwargs)\nEvaluate a model.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\nclick.Context\nClick context for extra args.\nrequired\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nlauncher\nstr\nLauncher to use for multi-GPU evaluation (“accelerate”, “torchrun”, or “python”).\nrequired\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.fetch(directory, dest)\nFetch example configs or other resources.\nAvailable directories:\n- examples: Example configuration files\n- deepspeed_configs: DeepSpeed configuration files\n- docs: Full documentation (Quarto markdown files)\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndirectory\nstr\nOne of examples, deepspeed_configs, docs.\nrequired\n\n\ndest\nOptional[str]\nOptional destination directory.\nrequired\n\n\n\n\n\n\n\ncli.main.inference(ctx, config, launcher, gradio, **kwargs)\nRun inference with a trained model.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\nclick.Context\nClick context for extra args.\nrequired\n\n\nconfig\nstr\nPath to axolotl config YAML 
file.\nrequired\n\n\nlauncher\nstr\nLauncher to use for multi-GPU inference (“accelerate”, “torchrun”, or “python”).\nrequired\n\n\ngradio\nbool\nWhether to use Gradio browser interface or command line for inference.\nrequired\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.merge_lora(config, **kwargs)\nMerge trained LoRA adapters into a base model.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.merge_sharded_fsdp_weights(ctx, config, launcher, **kwargs)\nMerge sharded FSDP model weights.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\nclick.Context\nClick context for extra args.\nrequired\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nlauncher\nstr\nLauncher to use for weight merging (“accelerate”, “torchrun”, or “python”).\nrequired\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.preprocess(config, cloud=None, **kwargs)\nPreprocess datasets before training.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\ncloud\nOptional[str]\nPath to a cloud accelerator configuration file.\nNone\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.train(\n ctx,\n config,\n launcher='accelerate',\n cloud=None,\n sweep=None,\n **kwargs,\n)\nTrain or fine-tune a model.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\nclick.Context\nClick context for extra args.\nrequired\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nlauncher\nLiteral['accelerate', 'torchrun', 'python']\nLauncher to use for multi-GPU training (“accelerate”, “torchrun”, or “python”).\n'accelerate'\n\n\ncloud\nstr | None\nPath to a cloud accelerator configuration file\nNone\n\n\nsweep\nstr | None\nPath to YAML config for sweeping hyperparameters.\nNone\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}" }, { "objectID": "docs/api/cli.main.html#functions", "href": "docs/api/cli.main.html#functions", "title": "cli.main", "section": "", - "text": "Name\nDescription\n\n\n\n\ncli\nAxolotl CLI - Train and fine-tune large language models\n\n\nevaluate\nEvaluate a model.\n\n\nfetch\nFetch example configs or other resources.\n\n\ninference\nRun inference with a trained model.\n\n\nmerge_lora\nMerge trained LoRA adapters into a base model.\n\n\nmerge_sharded_fsdp_weights\nMerge sharded FSDP model weights.\n\n\npreprocess\nPreprocess datasets before training.\n\n\ntrain\nTrain or fine-tune a model.\n\n\n\n\n\ncli.main.cli()\nAxolotl CLI - Train and fine-tune large language models\n\n\n\ncli.main.evaluate(ctx, config, launcher, **kwargs)\nEvaluate a model.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\nclick.Context\nClick context for extra args.\nrequired\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nlauncher\nstr\nLauncher to use for multi-GPU evaluation (“accelerate”, “torchrun”, or “python”).\nrequired\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.fetch(directory, dest)\nFetch example 
configs or other resources.\nAvailable directories:\n- examples: Example configuration files\n- deepspeed_configs: DeepSpeed configuration files\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndirectory\nstr\nOne of examples, deepspeed_configs.\nrequired\n\n\ndest\nOptional[str]\nOptional destination directory.\nrequired\n\n\n\n\n\n\n\ncli.main.inference(ctx, config, launcher, gradio, **kwargs)\nRun inference with a trained model.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\nclick.Context\nClick context for extra args.\nrequired\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nlauncher\nstr\nLauncher to use for multi-GPU inference (“accelerate”, “torchrun”, or “python”).\nrequired\n\n\ngradio\nbool\nWhether to use Gradio browser interface or command line for inference.\nrequired\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.merge_lora(config, **kwargs)\nMerge trained LoRA adapters into a base model.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.merge_sharded_fsdp_weights(ctx, config, launcher, **kwargs)\nMerge sharded FSDP model weights.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\nclick.Context\nClick context for extra args.\nrequired\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nlauncher\nstr\nLauncher to use for weight merging (“accelerate”, “torchrun”, or “python”).\nrequired\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.preprocess(config, cloud=None, **kwargs)\nPreprocess datasets before training.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\ncloud\nOptional[str]\nPath to a cloud accelerator configuration file.\nNone\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.train(\n ctx,\n config,\n launcher='accelerate',\n cloud=None,\n sweep=None,\n **kwargs,\n)\nTrain or fine-tune a model.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\nclick.Context\nClick context for extra args.\nrequired\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nlauncher\nLiteral['accelerate', 'torchrun', 'python']\nLauncher to use for multi-GPU training (“accelerate”, “torchrun”, or “python”).\n'accelerate'\n\n\ncloud\nstr | None\nPath to a cloud accelerator configuration file\nNone\n\n\nsweep\nstr | None\nPath to YAML config for sweeping hyperparameters.\nNone\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}" + "text": "Name\nDescription\n\n\n\n\nagent_docs\nShow agent-optimized documentation.\n\n\ncli\nAxolotl CLI - Train and fine-tune large language models\n\n\nconfig_schema\nDump the full config JSON schema.\n\n\nevaluate\nEvaluate a model.\n\n\nfetch\nFetch example configs or other resources.\n\n\ninference\nRun inference with a trained model.\n\n\nmerge_lora\nMerge trained LoRA adapters into a base model.\n\n\nmerge_sharded_fsdp_weights\nMerge sharded FSDP model weights.\n\n\npreprocess\nPreprocess datasets before training.\n\n\ntrain\nTrain or fine-tune a 
model.\n\n\n\n\n\ncli.main.agent_docs(topic, list_topics)\nShow agent-optimized documentation.\nPrints reference docs designed for AI coding agents.\nThese docs are bundled with the package — no network access needed.\n\b\nExamples:\naxolotl agent-docs # overview (start here)\naxolotl agent-docs grpo # GRPO reference\naxolotl agent-docs sft # SFT reference\naxolotl agent-docs --list # list all topics\n\n\n\ncli.main.cli()\nAxolotl CLI - Train and fine-tune large language models\n\n\n\ncli.main.config_schema(output_format, field)\nDump the full config JSON schema.\nUseful for AI agents and tooling to discover all available config options,\ntheir types, defaults, and descriptions.\n\b\nExamples:\naxolotl config-schema # full JSON schema\naxolotl config-schema --format yaml # YAML format\naxolotl config-schema --field adapter # single field\n\n\n\ncli.main.evaluate(ctx, config, launcher, **kwargs)\nEvaluate a model.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\nclick.Context\nClick context for extra args.\nrequired\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nlauncher\nstr\nLauncher to use for multi-GPU evaluation (“accelerate”, “torchrun”, or “python”).\nrequired\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.fetch(directory, dest)\nFetch example configs or other resources.\nAvailable directories:\n- examples: Example configuration files\n- deepspeed_configs: DeepSpeed configuration files\n- docs: Full documentation (Quarto markdown files)\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndirectory\nstr\nOne of examples, deepspeed_configs, docs.\nrequired\n\n\ndest\nOptional[str]\nOptional destination directory.\nrequired\n\n\n\n\n\n\n\ncli.main.inference(ctx, config, launcher, gradio, **kwargs)\nRun inference with a trained model.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\nclick.Context\nClick context for extra args.\nrequired\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nlauncher\nstr\nLauncher to use for multi-GPU inference (“accelerate”, “torchrun”, or “python”).\nrequired\n\n\ngradio\nbool\nWhether to use Gradio browser interface or command line for inference.\nrequired\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.merge_lora(config, **kwargs)\nMerge trained LoRA adapters into a base model.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.merge_sharded_fsdp_weights(ctx, config, launcher, **kwargs)\nMerge sharded FSDP model weights.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\nclick.Context\nClick context for extra args.\nrequired\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nlauncher\nstr\nLauncher to use for weight merging (“accelerate”, “torchrun”, or “python”).\nrequired\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.preprocess(config, cloud=None, **kwargs)\nPreprocess datasets before training.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\ncloud\nOptional[str]\nPath to a cloud accelerator configuration 
file.\nNone\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}\n\n\n\n\n\n\n\ncli.main.train(\n ctx,\n config,\n launcher='accelerate',\n cloud=None,\n sweep=None,\n **kwargs,\n)\nTrain or fine-tune a model.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\nclick.Context\nClick context for extra args.\nrequired\n\n\nconfig\nstr\nPath to axolotl config YAML file.\nrequired\n\n\nlauncher\nLiteral['accelerate', 'torchrun', 'python']\nLauncher to use for multi-GPU training (“accelerate”, “torchrun”, or “python”).\n'accelerate'\n\n\ncloud\nstr | None\nPath to a cloud accelerator configuration file\nNone\n\n\nsweep\nstr | None\nPath to YAML config for sweeping hyperparameters.\nNone\n\n\nkwargs\n\nAdditional keyword arguments which correspond to CLI args or axolotl config options.\n{}"
  },
  {
    "objectID": "docs/api/monkeypatch.trainer_fsdp_optim.html",
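The config_schema docstring above notes the output is meant for programmatic use. A minimal sketch of that workflow, assuming axolotl is on PATH, jq is installed, and the command emits a standard JSON Schema object with a top-level "properties" map (typical of Pydantic-generated schemas):

# Enumerate top-level config option names from the dumped schema
axolotl config-schema | jq -r '.properties | keys[]' | sort | head -n 20

Any name it prints can then be passed back via the documented --field option to inspect that option's type, default, and description.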
diff --git a/sitemap.xml b/sitemap.xml
index 4adca6915..d9a8c7c20 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,982 +2,982 @@
[sitemap.xml hunk elided: the <lastmod> timestamp of every page is bumped from the 2026-04-04 build to the 2026-04-06 build; the URL set itself is unchanged]
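Read together, the docstrings indexed above trace the standard fine-tuning loop. A hedged end-to-end sketch follows; the option spellings --launcher and --gradio are inferred from the parameter names documented above, so verify them against axolotl --help for your installed version:

# preprocess datasets, train across GPUs, merge adapters, then smoke-test
axolotl preprocess config.yml
axolotl train config.yml --launcher torchrun
axolotl merge-lora config.yml
axolotl inference config.yml --gradio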