Built site for gh-pages
@@ -544,10 +544,16 @@ and the QAT documentation in the <a href="https://github.com/pytorch/ao/tree/mai
<h2 class="anchored" data-anchor-id="configuring-qat-in-axolotl">Configuring QAT in Axolotl</h2>
<p>To enable QAT in Axolotl, add the following to your configuration file:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="fu">qat</span><span class="kw">:</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">activation_dtype</span><span class="kw">:</span><span class="co"> # Optional[str] = "int8". Fake quantization layout to use for activation quantization. Valid options are "int4" and "int8"</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">weight_dtype</span><span class="kw">:</span><span class="co"> # Optional[str] = "int8". Fake quantization layout to use for weight quantization. Valid options are "int4" and "int8"</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">activation_dtype</span><span class="kw">:</span><span class="co"> # Optional[str] = "int8". Fake quantization layout to use for activation quantization. Valid options are "int4", "int8", "float8"</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">weight_dtype</span><span class="kw">:</span><span class="co"> # Optional[str] = "int8". Fake quantization layout to use for weight quantization. Valid options are "int4", "fp8", and "nvfp4".</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">group_size</span><span class="kw">:</span><span class="co"> # Optional[int] = 32. The number of elements in each group for per-group fake quantization</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fake_quant_after_n_steps</span><span class="kw">:</span><span class="co"> # Optional[int] = None. The number of training steps to run before fake quantization is applied</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>We support the following quantization schemes:</p>
<ul>
<li><code>Int4WeightOnly</code> (requires the <code>fbgemm-gpu</code> extra when installing Axolotl)</li>
<li><code>Int8DynamicActivationInt4Weight</code></li>
<li><code>Float8DynamicActivationFloat8Weight</code></li>
<li><code>Float8DynamicActivationInt4Weight</code></li>
<li><code>NVFP4</code></li>
</ul>
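<p>As a rough illustration, a filled-in <code>qat</code> block might look like the sketch below. It uses only the keys documented above. Two things here are assumptions rather than facts from these docs: that combining <code>activation_dtype: int8</code> with <code>weight_dtype: int4</code> selects <code>Int8DynamicActivationInt4Weight</code>, and the step count, which is an arbitrary illustrative value.</p>

```yaml
# Hypothetical example: the values are illustrative, not recommended settings.
qat:
  activation_dtype: int8          # assumed to pair with int4 weights as Int8DynamicActivationInt4Weight
  weight_dtype: int4
  group_size: 32                  # documented default, written out explicitly
  fake_quant_after_n_steps: 1000  # arbitrary value; leave unset to keep the default (None)
```

<p>Check the accepted dtype strings against the option comments above before using a configuration like this one.</p>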
<p>Once training has finished, you must quantize your model using the same quantization configuration you trained with. You can use the <a href="../docs/quantize.html"><code>quantize</code></a> command to do this.</p>