Built site for gh-pages

2025-09-17 09:44:04 +00:00
parent 421eea620c
commit b2034c645e
5 changed files with 214 additions and 209 deletions
--- a/docs/qat.html
+++ b/docs/qat.html
@@ -544,10 +544,16 @@ and the QAT documentation in the <a href="https://github.com/pytorch/ao/tree/mai
 <h2 class="anchored" data-anchor-id="configuring-qat-in-axolotl">Configuring QAT in Axolotl</h2>
 <p>To enable QAT in axolotl, add the following to your configuration file:</p>
 <div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="fu">qat</span><span class="kw">:</span></span>
-<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="at">  </span><span class="fu">activation_dtype</span><span class="kw">:</span><span class="co"> # Optional[str] = "int8". Fake quantization layout to use for activation quantization. Valid options are "int4" and "int8"</span></span>
-<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="at">  </span><span class="fu">weight_dtype</span><span class="kw">:</span><span class="co"> # Optional[str] = "int8". Fake quantization layout to use for weight quantization. Valid options are "int4" and "int8"</span></span>
+<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="at">  </span><span class="fu">activation_dtype</span><span class="kw">:</span><span class="co"> # Optional[str] = "int8". Fake quantization layout to use for activation quantization. Valid options are "int4", "int8", and "float8".</span></span>
+<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="at">  </span><span class="fu">weight_dtype</span><span class="kw">:</span><span class="co"> # Optional[str] = "int8". Fake quantization layout to use for weight quantization. Valid options are "int4", "fp8", and "nvfp4".</span></span>
 <span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="at">  </span><span class="fu">group_size</span><span class="kw">:</span><span class="co"> # Optional[int] = 32. The number of elements in each group for per-group fake quantization</span></span>
 <span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="at">  </span><span class="fu">fake_quant_after_n_steps</span><span class="kw">:</span><span class="co"> # Optional[int] = None. The number of steps to apply fake quantization after</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
+<p>We support the following quantization schemes:</p>
+<ul><li><code>Int4WeightOnly</code> (requires the <code>fbgemm-gpu</code> extra when installing Axolotl)</li>
+<li><code>Int8DynamicActivationInt4Weight</code></li>
+<li><code>Float8DynamicActivationFloat8Weight</code></li>
+<li><code>Float8DynamicActivationInt4Weight</code></li>
+<li><code>NVFP4</code></li></ul>
 <p>Once you have finished training, you must quantize your model using the same quantization configuration that you used during training. You can use the <a href="../docs/quantize.html"><code>quantize</code></a> command to do this.</p>
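As an illustration of the options documented in this hunk, a minimal configuration sketch might look like the following. It assumes that the `Int8DynamicActivationInt4Weight` scheme corresponds to `activation_dtype: int8` combined with `weight_dtype: int4`; the `fake_quant_after_n_steps` value is an arbitrary placeholder, not a recommended setting.

```yaml
# Hypothetical example values; only the key names come from the documented options.
qat:
  activation_dtype: int8          # listed as a valid activation option
  weight_dtype: int4              # listed as a valid weight option
  group_size: 32                  # the documented default
  fake_quant_after_n_steps: 1000  # placeholder: delay fake quantization by 1000 steps
```

Whatever values are chosen here would then need to be reused unchanged when running the `quantize` command after training.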