Built site for gh-pages
@@ -837,6 +837,7 @@ Note
 <section id="limitations" class="level2">
 <h2 class="anchored" data-anchor-id="limitations">Limitations</h2>
 <ul>
+<li><code>lora_target_linear</code> is not compatible with <code>quantize_moe_experts</code>. See <a href="#expert-lora-targeting">Expert LoRA targeting</a> instead.</li>
 <li><code>cpu_ram_efficient_loading</code> hangs or takes a long time with FSDP2 + QLoRA.</li>
 <li>Total model parameter count may display incorrectly (trainable param count is correct).</li>
 <li>FSDP LoRA (8-bit) may have a large initial VRAM spike at the first 1-2 steps, which then drops. QLoRA does not exhibit this.</li>
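For the first limitation in the hunk above, a minimal config sketch of naming LoRA target modules explicitly instead of setting lora_target_linear. The base model and module names are illustrative assumptions, not from the docs; the supported expert-targeting keys are described in the linked "Expert LoRA targeting" section.

# Sketch only: lora_target_linear cannot be combined with
# quantize_moe_experts, so LoRA targets are listed by name.
# Module names are model-dependent placeholders.
base_model: some-org/some-moe-model   # hypothetical checkpoint
adapter: qlora
load_in_4bit: true

quantize_moe_experts: true

lora_r: 32
lora_alpha: 64
lora_target_modules:   # explicit list instead of lora_target_linear: true
  - q_proj
  - k_proj
  - v_proj
  - o_proj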
@@ -809,7 +809,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
 <ul>
 <li>2026/03:
 <ul>
-<li>New model support has been added in Axolotl for Qwen3.5, Qwen3.5 MoE, <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm47-flash">GLM-4.7-Flash</a>, <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm46v">GLM-4.6V</a>, and <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm45">GLM-4.5-Air</a>.</li>
+<li>New model support has been added in Axolotl for <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/qwen3.5">Qwen3.5, Qwen3.5 MoE</a>, <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm47-flash">GLM-4.7-Flash</a>, <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm46v">GLM-4.6V</a>, and <a href="https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm45">GLM-4.5-Air</a>.</li>
 <li><a href="https://docs.axolotl.ai/docs/expert_quantization.html">MoE expert quantization</a> support (via <code>quantize_moe_experts: true</code>) greatly reduces VRAM when training MoE models (FSDP2 compat).</li>
 </ul></li>
 <li>2026/02:
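To make the 2026/03 changelog entry concrete, a minimal sketch of enabling MoE expert quantization under FSDP2. Only quantize_moe_experts: true comes from the changelog and linked docs; the base model is a placeholder and the FSDP keys follow recent Axolotl conventions, so treat them as assumptions that may vary by version.

# Sketch only: MoE expert quantization with FSDP2 sharding.
# quantize_moe_experts is the documented flag; the fsdp keys are
# assumed from recent Axolotl configs and may differ by version.
base_model: some-org/some-moe-model   # hypothetical checkpoint
quantize_moe_experts: true

fsdp_version: 2
fsdp_config:
  offload_params: false
  state_dict_type: FULL_STATE_DICT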
@@ -124,7 +124,7 @@
 "href": "docs/expert_quantization.html#limitations",
 "title": "MoE Expert Quantization",
 "section": "Limitations",
-"text": "Limitations\n\ncpu_ram_efficient_loading hangs or takes a long time with FSDP2 + QLoRA.\nTotal model parameter count may display incorrectly (trainable param count is correct).\nFSDP LoRA (8-bit) may have a large initial VRAM spike at the first 1-2 steps, which then drops. QLoRA does not exhibit this.\nFSDP2 may use more VRAM per GPU than single-GPU training because not all layers are properly sharded across ranks.\nModel loading takes longer due to on-demand quantization, even on consecutive runs.\nDeepSpeed has not been tested.",
+"text": "Limitations\n\nlora_target_linear is not compatible with quantize_moe_experts. See Expert LoRA targeting instead.\ncpu_ram_efficient_loading hangs or takes a long time with FSDP2 + QLoRA.\nTotal model parameter count may display incorrectly (trainable param count is correct).\nFSDP LoRA (8-bit) may have a large initial VRAM spike at the first 1-2 steps, which then drops. QLoRA does not exhibit this.\nFSDP2 may use more VRAM per GPU than single-GPU training because not all layers are properly sharded across ranks.\nModel loading takes longer due to on-demand quantization, even on consecutive runs.\nDeepSpeed has not been tested.",
 "crumbs": [
 "Advanced Features",
 "MoE Expert Quantization"
sitemap.xml (474 lines changed): file diff suppressed because it is too large.