- Uses the `libs/scattermoe_lora` package (includes fused LoRA support via Triton kernels).
- Replaces the `SparseMoeBlock` forward method with the optimized ScatterMoE implementation.
- Replaces the `SparseMoeBlock` forward method with the optimized ScatterMoE implementation via the HF kernels library.
- Both paths use the shared `resolve_moe_block_classes` utility in `constants.py` for model-type-to-class resolution.
See constants.py for the full list of supported model types (Qwen2-MoE, Qwen3-MoE, OLMoE, Mixtral, DeepSeek-V3, GLM-MoE, MiniMax, etc.).
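As a rough illustration of that resolution step, the flow is: look up the model type, get the corresponding `SparseMoeBlock` class, and swap in the optimized forward. The mapping and `patch_moe_forward` helper below are hypothetical sketches, not the actual contents of `constants.py`; the imports assume a recent `transformers` release.

```python
# Illustrative sketch only; the real table and the resolve_moe_block_classes
# helper live in constants.py. The entries below are assumptions.
from transformers.models.mixtral.modeling_mixtral import MixtralSparseMoeBlock
from transformers.models.qwen2_moe.modeling_qwen2_moe import Qwen2MoeSparseMoeBlock

MOE_BLOCK_CLASSES = {  # hypothetical model_type -> MoE block class mapping
    "mixtral": MixtralSparseMoeBlock,
    "qwen2_moe": Qwen2MoeSparseMoeBlock,
}

def patch_moe_forward(model_type: str, optimized_forward):
    """Swap the stock SparseMoeBlock.forward for an optimized kernel forward."""
    block_cls = MOE_BLOCK_CLASSES.get(model_type)
    if block_cls is None:
        raise ValueError(f"No MoE kernel integration for model_type={model_type!r}")
    block_cls.forward = optimized_forward
```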
All models use the SwiGLU activation (act_fn(gate) * up). Neither kernel currently supports non-SwiGLU MoE architectures.
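For reference, a minimal sketch of the per-expert SwiGLU computation both kernels assume (the projection names are illustrative):

```python
import torch
import torch.nn.functional as F

def swiglu_expert(x, w_gate, w_up, w_down):
    # x: [tokens, hidden]; w_gate/w_up: [hidden, inter]; w_down: [inter, hidden]
    gate = x @ w_gate
    up = x @ w_up
    return (F.silu(gate) * up) @ w_down  # act_fn(gate) * up, then down projection
```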
| Routing Strategy | Description | ScatterMoE | SonicMoE |
|---|---|---|---|
| softmax → topk | Softmax over experts, select top-K, optional renormalization | Yes | Yes |
| softmax → group selection → topk | Softmax, select top groups (sum of top-2 per group), topk from selected groups, renorm + scaling | No | Yes |
| sigmoid → topk (with groups) | Sigmoid + bias correction, group-based masking, topk from masked scores, weights from original sigmoid | Yes | Yes |
| sigmoid → topk (no groups) | Sigmoid + bias correction, straight topk (`n_group=1`) | Yes | Yes |
| softmax → bias correction → topk | Softmax, bias via `gate.moe_statics`, topk, gather from original probs, clamp-based renorm | No | Yes |
| softmax → group_limited_greedy | Softmax, group selection (max per group), topk, scale only (no renorm) | No | Yes |
| softmax → topk via gate.wg | Softmax, gate weight at `gate.wg.weight` (not `gate.weight`), always renormalize | No | Yes |
| fused topk → softmax | Routing + expert computation fused in a single kernel | No | Planned |
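To make the first (and most common) row concrete, here is a plain-PyTorch sketch of softmax → topk routing with the optional renormalization step; the kernels implement this logic inside Triton/CUTLASS rather than in Python:

```python
import torch

def softmax_topk_routing(router_logits, top_k, renormalize=True):
    # router_logits: [tokens, num_experts]
    probs = torch.softmax(router_logits, dim=-1, dtype=torch.float32)
    weights, expert_ids = torch.topk(probs, top_k, dim=-1)
    if renormalize:
        weights = weights / weights.sum(dim=-1, keepdim=True)
    return weights, expert_ids  # both [tokens, top_k]
```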
| Model Type | Architecture | Routing | ScatterMoE | SonicMoE |
|---|---|---|---|---|
| `qwen2_moe` | Qwen2-MoE | softmax → topk | Yes | Yes |
| `qwen3_moe` | Qwen3-MoE | softmax → topk | Yes | Yes |
| `qwen3_5_moe` | Qwen3.5-MoE | softmax → topk | Yes | Yes |
| `qwen3_5_moe_text` | Qwen3.5-MoE (VLM text) | softmax → topk | Yes | Yes |
| `qwen3_next` | Qwen3-Next | softmax → topk | Yes | Yes |
| `qwen3_vl_moe` | Qwen3-VL-MoE | softmax → topk | Yes | Yes |
| `qwen3_omni_moe` | Qwen3-Omni (Thinker + Talker) | softmax → topk | Yes | Yes |
| `olmoe` | OLMoE | softmax → topk | Yes | Yes |
| `mixtral` | Mixtral | softmax → topk | Yes | Yes |
| `minimax` | MiniMax | softmax → topk | Yes | Yes |
| `mistral4` | Mistral 4 | softmax → group → topk | No | Yes |
| `glm_moe_dsa` | GLM-MoE DSA (GLM 5) | sigmoid → topk (groups) | Yes | Yes |
| `deepseek_v3` | DeepSeek-V3 | sigmoid → topk (groups) | Yes | Yes |
| `glm4_moe` | GLM4-MoE | sigmoid → topk (groups) | Yes | Yes |
| `glm4_moe_lite` | GLM4-MoE Lite (GLM 4.7 Flash) | sigmoid → topk (groups) | Yes* | Yes |
| `glm4v_moe` | GLM4v-MoE | sigmoid → topk (groups) | Yes | Yes |
| `minimax_m2` | MiniMax M2 | sigmoid → topk (no groups) | Yes | Yes |
| `ernie4_5_moe` | ERNIE 4.5 MoE | softmax → bias → topk | No | Yes |
| `deepseek_v2` | DeepSeek-V2 | softmax → group_limited_greedy | No | Yes |
| `hunyuan_v1_moe` | HunYuan V1 MoE | softmax → topk (gate.wg) | No | Yes |
| `gpt_oss` | GPT-OSS | fused topk → softmax | No | Planned |
* glm4_moe_lite with ScatterMoE may have issues — see Limitations.
| Feature | ScatterMoE | SonicMoE |
|---|---|---|
| Kernel backend | Triton | CUTLASS |
| GPU requirement | Any CUDA | Hopper (H100/H200) or Blackwell (B200+) |
| LoRA approach | Fused in Triton kernel | Runtime materialization + custom autograd |
| LoRA overhead | Lower (fused computation) | Higher (per-forward materialization) |
| Gate/router LoRA | Yes | Yes |
| Expert LoRA | Yes (fused) | Yes (materialized) |
| Shared expert LoRA | Yes (standard PEFT) | Yes (standard PEFT) |
| Selective expert dequantization | Yes (~97% memory savings) | No |
| Weight format | Transposed `[E, hidden, 2*inter]` | Interleaved gate/up `[2*I, H, E]` |
| torch.compile routing | No | Yes (optional) |
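To illustrate the "runtime materialization" row, here is a hedged sketch of the idea: before each forward, the expert LoRA delta is folded into a temporary copy of the expert weights, and the MoE kernel then consumes the materialized tensor. Shapes and names are assumptions for illustration, not SonicMoE's actual API.

```python
import torch

def materialize_expert_weights(w, lora_a, lora_b, scaling):
    # w:      [E, out, in]  frozen base expert weights
    # lora_a: [E, r, in]    per-expert LoRA A
    # lora_b: [E, out, r]   per-expert LoRA B
    delta = torch.einsum("eor,eri->eoi", lora_b, lora_a)  # [E, out, in]
    return w + scaling * delta  # effective weights seen by the MoE kernel
```

In the fused ScatterMoE path the LoRA matmuls happen inside the Triton kernel, so no per-forward weight copy is needed; in the materialized path a custom autograd function wraps this step, presumably so that gradients flow to the LoRA factors rather than the frozen base weights.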
ScatterMoE uses softmax → topk routing, so results may differ from the baseline implementation for some model architectures (GPT-OSS, etc.). It is currently incompatible with `glm_moe_dsa` (GLM 5) and `glm4_moe_lite` (GLM 4.7 Flash).

SonicMoE supports both softmax → topk and sigmoid → topk routing, covering a wider range of architectures.
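For the sigmoid → topk (with groups) family (the DeepSeek-V3-style gate used by several of the models above), a rough sketch of the scoring path follows; variable names and defaults are illustrative only.

```python
import torch

def sigmoid_group_topk_routing(logits, bias, n_group, topk_group, top_k, scale=1.0):
    # logits: [tokens, E]; bias: [E] score-correction term (affects selection only)
    scores = torch.sigmoid(logits)
    scores_for_choice = scores + bias
    tokens, E = scores.shape
    # Group score = sum of the top-2 expert scores inside each group.
    group_scores = (scores_for_choice.view(tokens, n_group, E // n_group)
                    .topk(2, dim=-1).values.sum(dim=-1))                 # [tokens, n_group]
    group_ids = group_scores.topk(topk_group, dim=-1).indices
    group_mask = torch.zeros_like(group_scores).scatter_(1, group_ids, 1.0)
    expert_mask = group_mask.unsqueeze(-1).expand(tokens, n_group, E // n_group).reshape(tokens, E)
    masked = scores_for_choice.masked_fill(expert_mask == 0, float("-inf"))
    expert_ids = masked.topk(top_k, dim=-1).indices                      # topk from masked scores
    weights = scores.gather(1, expert_ids)                               # weights from original sigmoid
    weights = scale * weights / weights.sum(dim=-1, keepdim=True)
    return weights, expert_ids
```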
ScatterMoE does not currently work for GLM 4.7 Flash (`glm4_moe_lite`): its MoE block uses an `[E, H, 2*I]` weight layout, expert biases, and a custom GLU activation, so a dedicated forward path is needed.