Built site for gh-pages
@@ -837,6 +837,7 @@ Note
 <section id="limitations" class="level2">
 <h2 class="anchored" data-anchor-id="limitations">Limitations</h2>
 <ul>
+<li><code>lora_target_linear</code> is not compatible with <code>quantize_moe_experts</code>. See <a href="#expert-lora-targeting">Expert LoRA targeting</a> instead.</li>
 <li><code>cpu_ram_efficient_loading</code> hangs or takes a long time with FSDP2 + QLoRA.</li>
 <li>Total model parameter count may display incorrectly (trainable param count is correct).</li>
 <li>FSDP LoRA (8-bit) may have a large initial VRAM spike at the first 1-2 steps, which then drops. QLoRA does not exhibit this.</li>
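For the first limitation in the hunk above, a minimal config sketch of naming LoRA target modules explicitly instead of setting lora_target_linear. The base model and module names are illustrative assumptions, not from the docs; the supported expert-targeting keys are described in the linked "Expert LoRA targeting" section.

# Sketch only: lora_target_linear cannot be combined with
# quantize_moe_experts, so LoRA targets are listed by name.
# Module names are model-dependent placeholders.
base_model: some-org/some-moe-model   # hypothetical checkpoint
adapter: qlora
load_in_4bit: true

quantize_moe_experts: true

lora_r: 32
lora_alpha: 64
lora_target_modules:   # explicit list instead of lora_target_linear: true
  - q_proj
  - k_proj
  - v_proj
  - o_proj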
@@ -809,7 +809,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
 <ul>
 <li>2026/03:
 <ul>
-<li>New model support has been added in Axolotl for Qwen3.5, Qwen3.5 MoE, <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm47-flash">GLM-4.7-Flash</a>, <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm46v">GLM-4.6V</a>, and <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm45">GLM-4.5-Air</a>.</li>
+<li>New model support has been added in Axolotl for <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/qwen3.5">Qwen3.5, Qwen3.5 MoE</a>, <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm47-flash">GLM-4.7-Flash</a>, <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm46v">GLM-4.6V</a>, and <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm45">GLM-4.5-Air</a>.</li>
 <li><a href="https://docs.axolotl.ai/docs/expert_quantization.html">MoE expert quantization</a> support (via <code>quantize_moe_experts: true</code>) greatly reduces VRAM when training MoE models (FSDP2 compat).</li>
 </ul></li>
 <li>2026/02:
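To make the 2026/03 changelog entry concrete, a minimal sketch of enabling MoE expert quantization under FSDP2. Only quantize_moe_experts: true comes from the changelog and linked docs; the base model is a placeholder and the FSDP keys follow recent Axolotl conventions, so treat them as assumptions that may vary by version.

# Sketch only: MoE expert quantization with FSDP2 sharding.
# quantize_moe_experts is the documented flag; the fsdp keys are
# assumed from recent Axolotl configs and may differ by version.
base_model: some-org/some-moe-model   # hypothetical checkpoint
quantize_moe_experts: true

fsdp_version: 2
fsdp_config:
  offload_params: false
  state_dict_type: FULL_STATE_DICT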
@@ -124,7 +124,7 @@
 "href": "docs/expert_quantization.html#limitations",
 "title": "MoE Expert Quantization",
 "section": "Limitations",
-"text": "Limitations\n\ncpu_ram_efficient_loading hangs or takes a long time with FSDP2 + QLoRA.\nTotal model parameter count may display incorrectly (trainable param count is correct).\nFSDP LoRA (8-bit) may have a large initial VRAM spike at the first 1-2 steps, which then drops. QLoRA does not exhibit this.\nFSDP2 may use more VRAM per GPU than single-GPU training because not all layers are properly sharded across ranks.\nModel loading takes longer due to on-demand quantization, even on consecutive runs.\nDeepSpeed has not been tested.",
+"text": "Limitations\n\nlora_target_linear is not compatible with quantize_moe_experts. See Expert LoRA targeting instead.\ncpu_ram_efficient_loading hangs or takes a long time with FSDP2 + QLoRA.\nTotal model parameter count may display incorrectly (trainable param count is correct).\nFSDP LoRA (8-bit) may have a large initial VRAM spike at the first 1-2 steps, which then drops. QLoRA does not exhibit this.\nFSDP2 may use more VRAM per GPU than single-GPU training because not all layers are properly sharded across ranks.\nModel loading takes longer due to on-demand quantization, even on consecutive runs.\nDeepSpeed has not been tested.",
 "crumbs": [
 "Advanced Features",
 "MoE Expert Quantization"
sitemap.xml (474 lines changed): file diff suppressed because it is too large.