Built site for gh-pages

2025-09-17 09:44:04 +00:00
parent 421eea620c
commit b2034c645e
5 changed files with 214 additions and 209 deletions
--- a/search.json
+++ b/search.json
@@ -208,7 +208,7 @@
    "href": "docs/quantize.html#configuring-quantization-in-axolotl",
    "title": "Quantization with torchao",
    "section": "Configuring Quantization in Axolotl",
-    "text": "Configuring Quantization in Axolotl\nQuantization is configured using the quantization key in your configuration file.\nbase_model: # The path to the model to quantize.\nquantization:\n  weight_dtype: # Optional[str] = \"int8\". Fake quantization layout to use for weight quantization. Valid options are uintX for X in [1, 2, 3, 4, 5, 6, 7], or int4, or int8\n  activation_dtype: # Optional[str] = \"int8\". Fake quantization layout to use for activation quantization. Valid options are \"int4\" and \"int8\"\n  group_size: # Optional[int] = 32. The number of elements in each group for per-group fake quantization\n  quantize_embedding: # Optional[bool] = False. Whether to quantize the embedding layer.\n\noutput_dir:  # The path to the output directory.\nOnce quantization is complete, your quantized model will be saved in the {output_dir}/quantized directory.\nYou may also use the quantize command to quantize a model which has been trained with QAT - you can do this by using the existing QAT configuration file which\nyou used to train the model:\n# qat.yml\nqat:\n  activation_dtype: int8\n  weight_dtype: int8\n  group_size: 256\n  quantize_embedding: true\n\noutput_dir: # The path to the output directory used during training where the final checkpoint has been saved.\naxolotl quantize qat.yml\nThis ensures that an identical quantization configuration is used to quantize the model as was used to train it.\n\n\n\n\n\n\nNote\n\n\n\nIf you have configured pushing to hub with hub_model_id, your model hub name will have the quantization schema appended to it,\ne.g. axolotl-ai-cloud/qat-nvfp4-llama3B will become axolotl-ai-cloud/qat-nvfp4-llama3B-nvfp4w",
+    "text": "Configuring Quantization in Axolotl\nQuantization is configured using the quantization key in your configuration file.\nbase_model: # The path to the model to quantize.\nquantization:\n  activation_dtype: # Optional[str] = \"int8\". Fake quantization layout to use for activation quantization. Valid options are \"int4\", \"int8\", \"float8\"\n  weight_dtype: # Optional[str] = \"int8\". Fake quantization layout to use for weight quantization. Valid options are \"int4\", \"fp8\", and \"nvfp4\".\n  group_size: # Optional[int] = 32. The number of elements in each group for per-group fake quantization\n  quantize_embedding: # Optional[bool] = False. Whether to quantize the embedding layer.\n\noutput_dir:  # The path to the output directory.\nOnce quantization is complete, your quantized model will be saved in the {output_dir}/quantized directory.\nYou may also use the quantize command to quantize a model which has been trained with QAT - you can do this by using the existing QAT configuration file which\nyou used to train the model:\n# qat.yml\nqat:\n  activation_dtype: int8\n  weight_dtype: int4\n  group_size: 256\n\noutput_dir: # The path to the output directory used during training where the final checkpoint has been saved.\naxolotl quantize qat.yml\nThis ensures that an identical quantization configuration is used to quantize the model as was used to train it.\n\n\n\n\n\n\nNote\n\n\n\nIf you have configured pushing to hub with hub_model_id, your model hub name will have the quantization schema appended to it,\ne.g. axolotl-ai-cloud/qat-nvfp4-llama3B will become axolotl-ai-cloud/qat-nvfp4-llama3B-nvfp4w",
    "crumbs": [
      "How To Guides",
      "Quantization with torchao"
@@ -4038,7 +4038,7 @@
    "href": "docs/qat.html#configuring-qat-in-axolotl",
    "title": "Quantization Aware Training (QAT)",
    "section": "Configuring QAT in Axolotl",
-    "text": "Configuring QAT in Axolotl\nTo enable QAT in axolotl, add the following to your configuration file:\nqat:\n  activation_dtype: # Optional[str] = \"int8\". Fake quantization layout to use for activation quantization. Valid options are \"int4\" and \"int8\"\n  weight_dtype: # Optional[str] = \"int8\". Fake quantization layout to use for weight quantization. Valid options are \"int4\" and \"int8\"\n  group_size: # Optional[int] = 32. The number of elements in each group for per-group fake quantization\n  fake_quant_after_n_steps: # Optional[int] = None. The number of steps to apply fake quantization after\nOnce you have finished training, you must quantize your model by using the same quantization configuration which you used to train the model with. You can use the quantize command to do this.",
+    "text": "Configuring QAT in Axolotl\nTo enable QAT in axolotl, add the following to your configuration file:\nqat:\n  activation_dtype: # Optional[str] = \"int8\". Fake quantization layout to use for activation quantization. Valid options are \"int4\", \"int8\", \"float8\"\n  weight_dtype: # Optional[str] = \"int8\". Fake quantization layout to use for weight quantization. Valid options are \"int4\", \"fp8\", and \"nvfp4\".\n  group_size: # Optional[int] = 32. The number of elements in each group for per-group fake quantization\n  fake_quant_after_n_steps: # Optional[int] = None. The number of steps to apply fake quantization after\nWe support the following quantization schemas:\n- Int4WeightOnly (requires the fbgemm-gpu extra when installing Axolotl)\n- Int8DynamicActivationInt4Weight\n- Float8DynamicActivationFloat8Weight\n- Float8DynamicActivationInt4Weight\n- NVFP4\nOnce you have finished training, you must quantize your model by using the same quantization configuration which you used to train the model with. You can use the quantize command to do this.",
    "crumbs": [
      "How To Guides",
      "Quantization Aware Training (QAT)"