Built site for gh-pages

Quarto GHA Workflow Runner
2026-03-06 14:26:20 +00:00
parent 8f63599e42
commit 17a84c24d3
5 changed files with 241 additions and 240 deletions


@@ -1 +1 @@
-24230d64
+b1700d42


@@ -837,6 +837,7 @@ Note
<section id="limitations" class="level2">
<h2 class="anchored" data-anchor-id="limitations">Limitations</h2>
<ul>
+<li><code>lora_target_linear</code> is not compatible with <code>quantize_moe_experts</code>. See <a href="#expert-lora-targeting">Expert LoRA targeting</a> instead (a config sketch follows this list).</li>
<li><code>cpu_ram_efficient_loading</code> hangs or takes a long time with FSDP2 + QLoRA.</li>
<li>Total model parameter count may display incorrectly (the trainable parameter count is correct).</li>
<li>FSDP LoRA (8-bit) may show a large initial VRAM spike during the first 1-2 steps, which then drops. QLoRA does not exhibit this.</li>
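For the first bullet, here is a minimal Axolotl config sketch of the documented workaround, assuming the standard `adapter` and `lora_target_modules` keys; the module names listed are illustrative placeholders, not confirmed layer names for any specific MoE model:

```yaml
adapter: lora
quantize_moe_experts: true

# lora_target_linear: true   # incompatible with quantize_moe_experts
# Instead, list the target modules explicitly (names below are
# illustrative; match them to your model's actual projection layers):
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
```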


@@ -809,7 +809,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<ul>
<li>2026/03:
<ul>
-<li>New model support has been added in Axolotl for Qwen3.5, Qwen3.5 MoE, <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm47-flash">GLM-4.7-Flash</a>, <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm46v">GLM-4.6V</a>, and <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm45">GLM-4.5-Air</a>.</li>
+<li>New model support has been added in Axolotl for <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/qwen3.5">Qwen3.5, Qwen3.5 MoE</a>, <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm47-flash">GLM-4.7-Flash</a>, <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm46v">GLM-4.6V</a>, and <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm45">GLM-4.5-Air</a>.</li>
<li><a href="https://docs.axolotl.ai/docs/expert_quantization.html">MoE expert quantization</a> support (via <code>quantize_moe_experts: true</code>) greatly reduces VRAM when training MoE models (FSDP2-compatible); a config sketch follows below.</li>
</ul></li>
<li>2026/02:

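A hedged sketch of enabling the feature from the 2026/03 entry above together with FSDP2 + QLoRA; the `fsdp_version` / `fsdp_config` key set is assumed from current Axolotl conventions, so treat it as an assumption and check the linked expert_quantization docs:

```yaml
# MoE expert quantization with FSDP2 + QLoRA (keys assumed, not confirmed)
adapter: qlora
load_in_4bit: true
quantize_moe_experts: true   # quantize expert weights on load to cut VRAM

fsdp_version: 2
fsdp_config:
  offload_params: false
  # cpu_ram_efficient_loading: true  # known to hang or stall with
  #                                  # FSDP2 + QLoRA (see Limitations)
```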

@@ -124,7 +124,7 @@
"href": "docs/expert_quantization.html#limitations",
"title": "MoE Expert Quantization",
"section": "Limitations",
"text": "Limitations\n\ncpu_ram_efficient_loading hangs / takes long time with FSDP2 + QLoRA.\nTotal model parameter count may display incorrectly (trainable param count is correct).\nFSDP LoRA (8-bit) may have a large initial VRAM spike at the first 1-2 steps, which then drops. QLoRA does not exhibit this.\nFSDP2 may use more VRAM per GPU than single GPU training due to not all layers being properly sharded across ranks.\nModel loading takes longer due to on-demand quantization, even on consecutive runs.\nDeepSpeed has not been tested.",
"text": "Limitations\n\nlora_target_linear is not compatible with quantize_moe_experts. See Expert LoRA targeting instead.\ncpu_ram_efficient_loading hangs / takes long time with FSDP2 + QLoRA.\nTotal model parameter count may display incorrectly (trainable param count is correct).\nFSDP LoRA (8-bit) may have a large initial VRAM spike at the first 1-2 steps, which then drops. QLoRA does not exhibit this.\nFSDP2 may use more VRAM per GPU than single GPU training due to not all layers being properly sharded across ranks.\nModel loading takes longer due to on-demand quantization, even on consecutive runs.\nDeepSpeed has not been tested.",
"crumbs": [
"Advanced Features",
"MoE Expert Quantization"

File diff suppressed because it is too large