Built site for gh-pages
This commit is contained in:
11
search.json
11
search.json
@@ -1622,6 +1622,17 @@
|
||||
"Custom Integrations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"objectID": "docs/custom_integrations.html#llmcompressor",
|
||||
"href": "docs/custom_integrations.html#llmcompressor",
|
||||
"title": "Custom Integrations",
|
||||
"section": "LLMCompressor",
|
||||
"text": "LLMCompressor\nFine-tune sparsified models in Axolotl using Neural Magic’s LLMCompressor.\nThis integration enables fine-tuning of models sparsified using LLMCompressor within the Axolotl training framework. By combining LLMCompressor’s model compression capabilities with Axolotl’s distributed training pipelines, users can efficiently fine-tune sparse models at scale.\nIt uses Axolotl’s plugin system to hook into the fine-tuning flows while maintaining sparsity throughout training.\n\n\nRequirements\n\nAxolotl with llmcompressor extras:\npip install \"axolotl[llmcompressor]\"\nRequires llmcompressor >= 0.5.1\n\nThis will install all necessary dependencies to fine-tune sparsified models using the integration.\n\n\n\nUsage\nTo enable sparse fine-tuning with this integration, include the plugin in your Axolotl config:\nplugins:\n - axolotl.integrations.llm_compressor.LLMCompressorPlugin\n\nllmcompressor:\n recipe:\n finetuning_stage:\n finetuning_modifiers:\n ConstantPruningModifier:\n targets: [\n 're:.*q_proj.weight',\n 're:.*k_proj.weight',\n 're:.*v_proj.weight',\n 're:.*o_proj.weight',\n 're:.*gate_proj.weight',\n 're:.*up_proj.weight',\n 're:.*down_proj.weight',\n ]\n start: 0\n save_compressed: true\nThis plugin does not apply pruning or sparsification itself — it is intended for fine-tuning models that have already been sparsified.\nPre-sparsified checkpoints can be:\n- Generated using LLMCompressor\n- Downloaded from Neural Magic’s Hugging Face page\n- Any custom LLM with compatible sparsity patterns that you’ve created yourself\nTo learn more about writing and customizing LLMCompressor recipes, refer to the official documentation:\nhttps://github.com/vllm-project/llm-compressor/blob/main/README.md\n\n\nStorage Optimization with save_compressed\nSetting save_compressed: true in your configuration enables saving models in a compressed format, which:\n- Reduces disk space usage by approximately 40%\n- Maintains compatibility with vLLM for accelerated inference\n- Maintains compatibility with llmcompressor for further optimization (example: quantization)\nThis option is highly recommended when working with sparse models to maximize the benefits of model compression.\n\n\nExample Config\nSee examples/llama-3/sparse-finetuning.yaml for a complete example.\n\n\n\nInference with vLLM\nAfter fine-tuning your sparse model, you can leverage vLLM for efficient inference.\nYou can also use LLMCompressor to apply additional quantization to your fine-tuned\nsparse model before inference for even greater performance benefits.:\nfrom vllm import LLM, SamplingParams\n\nprompts = [\n \"Hello, my name is\",\n \"The president of the United States is\",\n \"The capital of France is\",\n \"The future of AI is\",\n]\nsampling_params = SamplingParams(temperature=0.8, top_p=0.95)\nllm = LLM(\"path/to/your/sparse/model\")\noutputs = llm.generate(prompts, sampling_params)\n\nfor output in outputs:\n prompt = output.prompt\n generated_text = output.outputs[0].text\n print(f\"Prompt: {prompt!r}, Generated text: {generated_text!r}\")\nFor more details on vLLM’s capabilities and advanced configuration options, see the official vLLM documentation.\n\n\nLearn More\nFor details on available sparsity and quantization schemes, fine-tuning recipes, and usage examples, visit the official LLMCompressor repository:\nhttps://github.com/vllm-project/llm-compressor\nPlease see reference here",
|
||||
"crumbs": [
|
||||
"Advanced Features",
|
||||
"Custom Integrations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"objectID": "docs/custom_integrations.html#adding-a-new-integration",
|
||||
"href": "docs/custom_integrations.html#adding-a-new-integration",
|
||||
|
||||
Reference in New Issue
Block a user