Built site for gh-pages

2025-07-07 21:10:36 +00:00
parent b86117c718
commit d7f9f4e61f
4 changed files with 422 additions and 402 deletions
--- a/search.json
+++ b/search.json
@@ -3070,6 +3070,17 @@
      "Custom Integrations"
    ]
  },
+  {
+    "objectID": "docs/custom_integrations.html#densemixer",
+    "href": "docs/custom_integrations.html#densemixer",
+    "title": "Custom Integrations",
+    "section": "DenseMixer",
+    "text": "DenseMixer\nSee DenseMixer\nSimply add the following to your axolotl YAML config:\nplugins:\n  - axolotl.integrations.densemixer.DenseMixerPlugin\nPlease see reference here",
+    "crumbs": [
+      "Advanced Features",
+      "Custom Integrations"
+    ]
+  },
  {
    "objectID": "docs/custom_integrations.html#grokfast",
    "href": "docs/custom_integrations.html#grokfast",
@@ -3093,11 +3104,11 @@
    ]
  },
  {
-    "objectID": "docs/custom_integrations.html#liger-kernels",
-    "href": "docs/custom_integrations.html#liger-kernels",
+    "objectID": "docs/custom_integrations.html#llmcompressor",
+    "href": "docs/custom_integrations.html#llmcompressor",
    "title": "Custom Integrations",
-    "section": "Liger Kernels",
-    "text": "Liger Kernels\nLiger Kernel provides efficient Triton kernels for LLM training, offering:\n\n20% increase in multi-GPU training throughput\n60% reduction in memory usage\nCompatibility with both FSDP and DeepSpeed\n\nSee https://github.com/linkedin/Liger-Kernel\n\nUsage\nplugins:\n  - axolotl.integrations.liger.LigerPlugin\nliger_rope: true\nliger_rms_norm: true\nliger_glu_activation: true\nliger_layer_norm: true\nliger_fused_linear_cross_entropy: true\n\n\nSupported Models\n\ndeepseek_v2\ngemma\ngemma2\ngemma3\ngranite\njamba\nllama\nmistral\nmixtral\nmllama\nmllama_text_model\nolmo2\npaligemma\nphi3\nqwen2\nqwen2_5_vl\nqwen2_vl\n\n\n\nCitation\n@article{hsu2024ligerkernelefficienttriton,\n      title={Liger Kernel: Efficient Triton Kernels for LLM Training},\n      author={Pin-Lun Hsu and Yun Dai and Vignesh Kothapalli and Qingquan Song and Shao Tang and Siyu Zhu and Steven Shimizu and Shivam Sahni and Haowen Ning and Yanning Chen},\n      year={2024},\n      eprint={2410.10989},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url={https://arxiv.org/abs/2410.10989},\n      journal={arXiv preprint arXiv:2410.10989},\n}\nPlease see reference here",
+    "section": "LLMCompressor",
+    "text": "LLMCompressor\nFine-tune sparsified models in Axolotl using Neural Magic’s LLMCompressor.\nThis integration enables fine-tuning of models sparsified using LLMCompressor within the Axolotl training framework. By combining LLMCompressor’s model compression capabilities with Axolotl’s distributed training pipelines, users can efficiently fine-tune sparse models at scale.\nIt uses Axolotl’s plugin system to hook into the fine-tuning flows while maintaining sparsity throughout training.\n\n\nRequirements\n\nAxolotl with llmcompressor extras:\npip install \"axolotl[llmcompressor]\"\nRequires llmcompressor &gt;= 0.5.1\n\nThis will install all necessary dependencies to fine-tune sparsified models using the integration.\n\n\n\nUsage\nTo enable sparse fine-tuning with this integration, include the plugin in your Axolotl config:\nplugins:\n  - axolotl.integrations.llm_compressor.LLMCompressorPlugin\n\nllmcompressor:\n  recipe:\n    finetuning_stage:\n      finetuning_modifiers:\n        ConstantPruningModifier:\n          targets: [\n            're:.*q_proj.weight',\n            're:.*k_proj.weight',\n            're:.*v_proj.weight',\n            're:.*o_proj.weight',\n            're:.*gate_proj.weight',\n            're:.*up_proj.weight',\n            're:.*down_proj.weight',\n          ]\n          start: 0\n  save_compressed: true\nThis plugin does not apply pruning or sparsification itself — it is intended for fine-tuning models that have already been sparsified.\nPre-sparsified checkpoints can be:\n- Generated using LLMCompressor\n- Downloaded from Neural Magic’s Hugging Face page\n- Any custom LLM with compatible sparsity patterns that you’ve created yourself\nTo learn more about writing and customizing LLMCompressor recipes, refer to the official documentation:\nhttps://github.com/vllm-project/llm-compressor/blob/main/README.md\n\n\nStorage Optimization with save_compressed\nSetting save_compressed: true in your configuration enables saving models in a compressed format, which:\n- Reduces disk space usage by approximately 40%\n- Maintains compatibility with vLLM for accelerated inference\n- Maintains compatibility with llmcompressor for further optimization (example: quantization)\nThis option is highly recommended when working with sparse models to maximize the benefits of model compression.\n\n\nExample Config\nSee examples/llama-3/sparse-finetuning.yaml for a complete example.\n\n\n\nInference with vLLM\nAfter fine-tuning your sparse model, you can leverage vLLM for efficient inference.\nYou can also use LLMCompressor to apply additional quantization to your fine-tuned\nsparse model before inference for even greater performance benefits.:\nfrom vllm import LLM, SamplingParams\n\nprompts = [\n    \"Hello, my name is\",\n    \"The president of the United States is\",\n    \"The capital of France is\",\n    \"The future of AI is\",\n]\nsampling_params = SamplingParams(temperature=0.8, top_p=0.95)\nllm = LLM(\"path/to/your/sparse/model\")\noutputs = llm.generate(prompts, sampling_params)\n\nfor output in outputs:\n    prompt = output.prompt\n    generated_text = output.outputs[0].text\n    print(f\"Prompt: {prompt!r}, Generated text: {generated_text!r}\")\nFor more details on vLLM’s capabilities and advanced configuration options, see the official vLLM documentation.\n\n\nLearn More\nFor details on available sparsity and quantization schemes, fine-tuning recipes, and usage examples, visit the official LLMCompressor repository:\nhttps://github.com/vllm-project/llm-compressor\nPlease see reference here",
    "crumbs": [
      "Advanced Features",
      "Custom Integrations"
@@ -3115,22 +3126,22 @@
    ]
  },
  {
-    "objectID": "docs/custom_integrations.html#spectrum",
-    "href": "docs/custom_integrations.html#spectrum",
+    "objectID": "docs/custom_integrations.html#liger-kernels",
+    "href": "docs/custom_integrations.html#liger-kernels",
    "title": "Custom Integrations",
-    "section": "Spectrum",
-    "text": "Spectrum\nby Eric Hartford, Lucas Atkins, Fernando Fernandes, David Golchinfar\nThis plugin contains code to freeze the bottom fraction of modules in a model, based on the Signal-to-Noise Ratio (SNR).\nSee https://github.com/cognitivecomputations/spectrum\n\nOverview\nSpectrum is a tool for scanning and evaluating the Signal-to-Noise Ratio (SNR) of layers in large language models.\nBy identifying the top n% of layers with the highest SNR, you can optimize training efficiency.\n\n\nUsage\nplugins:\n  - axolotl.integrations.spectrum.SpectrumPlugin\n\nspectrum_top_fraction: 0.5\nspectrum_model_name: meta-llama/Meta-Llama-3.1-8B\n\n\nCitation\n@misc{hartford2024spectrumtargetedtrainingsignal,\n      title={Spectrum: Targeted Training on Signal to Noise Ratio},\n      author={Eric Hartford and Lucas Atkins and Fernando Fernandes Neto and David Golchinfar},\n      year={2024},\n      eprint={2406.06623},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url={https://arxiv.org/abs/2406.06623},\n}\nPlease see reference here",
+    "section": "Liger Kernels",
+    "text": "Liger Kernels\nLiger Kernel provides efficient Triton kernels for LLM training, offering:\n\n20% increase in multi-GPU training throughput\n60% reduction in memory usage\nCompatibility with both FSDP and DeepSpeed\n\nSee https://github.com/linkedin/Liger-Kernel\n\nUsage\nplugins:\n  - axolotl.integrations.liger.LigerPlugin\nliger_rope: true\nliger_rms_norm: true\nliger_glu_activation: true\nliger_layer_norm: true\nliger_fused_linear_cross_entropy: true\n\n\nSupported Models\n\ndeepseek_v2\ngemma\ngemma2\ngemma3\ngranite\njamba\nllama\nmistral\nmixtral\nmllama\nmllama_text_model\nolmo2\npaligemma\nphi3\nqwen2\nqwen2_5_vl\nqwen2_vl\n\n\n\nCitation\n@article{hsu2024ligerkernelefficienttriton,\n      title={Liger Kernel: Efficient Triton Kernels for LLM Training},\n      author={Pin-Lun Hsu and Yun Dai and Vignesh Kothapalli and Qingquan Song and Shao Tang and Siyu Zhu and Steven Shimizu and Shivam Sahni and Haowen Ning and Yanning Chen},\n      year={2024},\n      eprint={2410.10989},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url={https://arxiv.org/abs/2410.10989},\n      journal={arXiv preprint arXiv:2410.10989},\n}\nPlease see reference here",
    "crumbs": [
      "Advanced Features",
      "Custom Integrations"
    ]
  },
  {
-    "objectID": "docs/custom_integrations.html#llmcompressor",
-    "href": "docs/custom_integrations.html#llmcompressor",
+    "objectID": "docs/custom_integrations.html#spectrum",
+    "href": "docs/custom_integrations.html#spectrum",
    "title": "Custom Integrations",
-    "section": "LLMCompressor",
-    "text": "LLMCompressor\nFine-tune sparsified models in Axolotl using Neural Magic’s LLMCompressor.\nThis integration enables fine-tuning of models sparsified using LLMCompressor within the Axolotl training framework. By combining LLMCompressor’s model compression capabilities with Axolotl’s distributed training pipelines, users can efficiently fine-tune sparse models at scale.\nIt uses Axolotl’s plugin system to hook into the fine-tuning flows while maintaining sparsity throughout training.\n\n\nRequirements\n\nAxolotl with llmcompressor extras:\npip install \"axolotl[llmcompressor]\"\nRequires llmcompressor &gt;= 0.5.1\n\nThis will install all necessary dependencies to fine-tune sparsified models using the integration.\n\n\n\nUsage\nTo enable sparse fine-tuning with this integration, include the plugin in your Axolotl config:\nplugins:\n  - axolotl.integrations.llm_compressor.LLMCompressorPlugin\n\nllmcompressor:\n  recipe:\n    finetuning_stage:\n      finetuning_modifiers:\n        ConstantPruningModifier:\n          targets: [\n            're:.*q_proj.weight',\n            're:.*k_proj.weight',\n            're:.*v_proj.weight',\n            're:.*o_proj.weight',\n            're:.*gate_proj.weight',\n            're:.*up_proj.weight',\n            're:.*down_proj.weight',\n          ]\n          start: 0\n  save_compressed: true\nThis plugin does not apply pruning or sparsification itself — it is intended for fine-tuning models that have already been sparsified.\nPre-sparsified checkpoints can be:\n- Generated using LLMCompressor\n- Downloaded from Neural Magic’s Hugging Face page\n- Any custom LLM with compatible sparsity patterns that you’ve created yourself\nTo learn more about writing and customizing LLMCompressor recipes, refer to the official documentation:\nhttps://github.com/vllm-project/llm-compressor/blob/main/README.md\n\n\nStorage Optimization with save_compressed\nSetting save_compressed: true in your configuration enables saving models in a compressed format, which:\n- Reduces disk space usage by approximately 40%\n- Maintains compatibility with vLLM for accelerated inference\n- Maintains compatibility with llmcompressor for further optimization (example: quantization)\nThis option is highly recommended when working with sparse models to maximize the benefits of model compression.\n\n\nExample Config\nSee examples/llama-3/sparse-finetuning.yaml for a complete example.\n\n\n\nInference with vLLM\nAfter fine-tuning your sparse model, you can leverage vLLM for efficient inference.\nYou can also use LLMCompressor to apply additional quantization to your fine-tuned\nsparse model before inference for even greater performance benefits.:\nfrom vllm import LLM, SamplingParams\n\nprompts = [\n    \"Hello, my name is\",\n    \"The president of the United States is\",\n    \"The capital of France is\",\n    \"The future of AI is\",\n]\nsampling_params = SamplingParams(temperature=0.8, top_p=0.95)\nllm = LLM(\"path/to/your/sparse/model\")\noutputs = llm.generate(prompts, sampling_params)\n\nfor output in outputs:\n    prompt = output.prompt\n    generated_text = output.outputs[0].text\n    print(f\"Prompt: {prompt!r}, Generated text: {generated_text!r}\")\nFor more details on vLLM’s capabilities and advanced configuration options, see the official vLLM documentation.\n\n\nLearn More\nFor details on available sparsity and quantization schemes, fine-tuning recipes, and usage examples, visit the official LLMCompressor repository:\nhttps://github.com/vllm-project/llm-compressor\nPlease see reference here",
+    "section": "Spectrum",
+    "text": "Spectrum\nby Eric Hartford, Lucas Atkins, Fernando Fernandes, David Golchinfar\nThis plugin contains code to freeze the bottom fraction of modules in a model, based on the Signal-to-Noise Ratio (SNR).\nSee https://github.com/cognitivecomputations/spectrum\n\nOverview\nSpectrum is a tool for scanning and evaluating the Signal-to-Noise Ratio (SNR) of layers in large language models.\nBy identifying the top n% of layers with the highest SNR, you can optimize training efficiency.\n\n\nUsage\nplugins:\n  - axolotl.integrations.spectrum.SpectrumPlugin\n\nspectrum_top_fraction: 0.5\nspectrum_model_name: meta-llama/Meta-Llama-3.1-8B\n\n\nCitation\n@misc{hartford2024spectrumtargetedtrainingsignal,\n      title={Spectrum: Targeted Training on Signal to Noise Ratio},\n      author={Eric Hartford and Lucas Atkins and Fernando Fernandes Neto and David Golchinfar},\n      year={2024},\n      eprint={2406.06623},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url={https://arxiv.org/abs/2406.06623},\n}\nPlease see reference here",
    "crumbs": [
      "Advanced Features",
      "Custom Integrations"