Built site for gh-pages

Quarto GHA Workflow Runner
2025-05-01 16:27:24 +00:00
parent 7cec02149d
commit c6274d0582
4 changed files with 283 additions and 174 deletions

@@ -1622,6 +1622,17 @@
"Custom Integrations"
]
},
{
"objectID": "docs/custom_integrations.html#llmcompressor",
"href": "docs/custom_integrations.html#llmcompressor",
"title": "Custom Integrations",
"section": "LLMCompressor",
"text": "LLMCompressor\nFine-tune sparsified models in Axolotl using Neural Magics LLMCompressor.\nThis integration enables fine-tuning of models sparsified using LLMCompressor within the Axolotl training framework. By combining LLMCompressors model compression capabilities with Axolotls distributed training pipelines, users can efficiently fine-tune sparse models at scale.\nIt uses Axolotls plugin system to hook into the fine-tuning flows while maintaining sparsity throughout training.\n\n\nRequirements\n\nAxolotl with llmcompressor extras:\npip install \"axolotl[llmcompressor]\"\nRequires llmcompressor >= 0.5.1\n\nThis will install all necessary dependencies to fine-tune sparsified models using the integration.\n\n\n\nUsage\nTo enable sparse fine-tuning with this integration, include the plugin in your Axolotl config:\nplugins:\n - axolotl.integrations.llm_compressor.LLMCompressorPlugin\n\nllmcompressor:\n recipe:\n finetuning_stage:\n finetuning_modifiers:\n ConstantPruningModifier:\n targets: [\n 're:.*q_proj.weight',\n 're:.*k_proj.weight',\n 're:.*v_proj.weight',\n 're:.*o_proj.weight',\n 're:.*gate_proj.weight',\n 're:.*up_proj.weight',\n 're:.*down_proj.weight',\n ]\n start: 0\n save_compressed: true\nThis plugin does not apply pruning or sparsification itself — it is intended for fine-tuning models that have already been sparsified.\nPre-sparsified checkpoints can be:\n- Generated using LLMCompressor\n- Downloaded from Neural Magics Hugging Face page\n- Any custom LLM with compatible sparsity patterns that youve created yourself\nTo learn more about writing and customizing LLMCompressor recipes, refer to the official documentation:\nhttps://github.com/vllm-project/llm-compressor/blob/main/README.md\n\n\nStorage Optimization with save_compressed\nSetting save_compressed: true in your configuration enables saving models in a compressed format, which:\n- Reduces disk space usage by approximately 40%\n- Maintains compatibility with vLLM for accelerated 
inference\n- Maintains compatibility with llmcompressor for further optimization (for example, quantization)\nThis option is highly recommended when working with sparse models to maximize the benefits of model compression.\n\n\nExample Config\nSee examples/llama-3/sparse-finetuning.yaml for a complete example.\n\n\n\nInference with vLLM\nAfter fine-tuning your sparse model, you can leverage vLLM for efficient inference.\nYou can also use LLMCompressor to apply additional quantization to your fine-tuned\nsparse model before inference for even greater performance benefits.\nA basic vLLM inference example:\nfrom vllm import LLM, SamplingParams\n\nprompts = [\n    \"Hello, my name is\",\n    \"The president of the United States is\",\n    \"The capital of France is\",\n    \"The future of AI is\",\n]\nsampling_params = SamplingParams(temperature=0.8, top_p=0.95)\nllm = LLM(\"path/to/your/sparse/model\")\noutputs = llm.generate(prompts, sampling_params)\n\nfor output in outputs:\n    prompt = output.prompt\n    generated_text = output.outputs[0].text\n    print(f\"Prompt: {prompt!r}, Generated text: {generated_text!r}\")\nFor more details on vLLM's capabilities and advanced configuration options, see the official vLLM documentation.\n\n\nLearn More\nFor details on available sparsity and quantization schemes, fine-tuning recipes, and usage examples, visit the official LLMCompressor repository:\nhttps://github.com/vllm-project/llm-compressor\nPlease see reference here",
"crumbs": [
"Advanced Features",
"Custom Integrations"
]
},
{
"objectID": "docs/custom_integrations.html#adding-a-new-integration",
"href": "docs/custom_integrations.html#adding-a-new-integration",