Address Review Comments:
* deleted redundant docs/llm_compressor.qmd
* incorporated feedback in integration README.md
* added llmcompressor integration to docs/custom_integrations.qmd

Signed-off-by: Rahul Tuli <rtuli@redhat.com>

@@ -49,7 +49,8 @@ sections = [
    ("Knowledge Distillation (KD)", "kd"),
    ("Liger Kernels", "liger"),
    ("Language Model Evaluation Harness (LM Eval)", "lm_eval"),
    ("Spectrum", "spectrum"),
    ("LLMCompressor", "llm_compressor")
]

for section_name, folder_name in sections:

@@ -1,98 +0,0 @@
---
title: "LLMCompressor Sparse Fine-tuning"
format:
  html:
    toc: true
    toc-depth: 3
    number-sections: true
execute:
  enabled: false
---

# LLMCompressor Integration

Fine-tune sparsified models in Axolotl using [LLMCompressor](https://github.com/vllm-project/llm-compressor).

This integration enables fine-tuning of models **already sparsified** with LLMCompressor. It hooks into Axolotl's training pipeline via the plugin system and maintains sparsity throughout the fine-tuning process.
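
Conceptually, maintaining sparsity means freezing the zero pattern of the pre-sparsified weights and re-applying it as training updates arrive, so pruned entries never regrow. A toy sketch of that idea (illustration only, not the plugin's actual code):

```python
import torch

# Toy illustration, not the plugin's implementation: capture the zero mask
# of a pre-sparsified weight, then re-apply it after every update so the
# pruned entries stay exactly zero.
weight = torch.randn(4, 4)
weight[weight.abs() < 0.5] = 0.0     # stand-in for a pre-sparsified weight
mask = weight != 0                   # the frozen sparsity pattern

update = 0.01 * torch.randn(4, 4)    # stand-in for an optimizer update
weight = (weight + update) * mask    # zeros are preserved
print((weight == 0).float().mean())  # sparsity level is unchanged
```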

---

## Requirements

- Install Axolotl with the `llmcompressor` extra:

```bash
pip install "axolotl[llmcompressor]"
```

- Requires `llmcompressor >= 0.5.1`

This installs all required dependencies for sparse model fine-tuning.
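
To confirm the installed versions meet the floor above, a quick stdlib check (a minimal sketch; `importlib.metadata` reads the installed package metadata):

```python
from importlib.metadata import version

# The plugin requires llmcompressor >= 0.5.1 (see the requirement above).
print(version("axolotl"), version("llmcompressor"))
```
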
---

## Usage

To enable sparse fine-tuning with this integration, configure your Axolotl YAML like so:

```yaml
plugins:
  - axolotl.integrations.llm_compressor.LLMCompressorPlugin

llmcompressor:
  recipe:
    finetuning_stage:
      finetuning_modifiers:
        ConstantPruningModifier:
          targets: [
            're:.*q_proj.weight',
            're:.*k_proj.weight',
            're:.*v_proj.weight',
            're:.*o_proj.weight',
            're:.*gate_proj.weight',
            're:.*up_proj.weight',
            're:.*down_proj.weight',
          ]
          start: 0
# ... (other Axolotl training arguments)
```

::: {.callout-note}
This plugin **does not prune or sparsify the model**. It is only meant for **fine-tuning models that are already sparsified**.
:::
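
To sanity-check which parameters a recipe like this targets, you can list the parameter names the `re:` patterns match. A minimal sketch using plain `transformers`; the model path is a placeholder, and `re.fullmatch` is an assumption here since LLMCompressor's exact matching semantics are an internal detail:

```python
import re

from transformers import AutoModelForCausalLM

# Placeholder: substitute the sparsified checkpoint you plan to fine-tune.
model = AutoModelForCausalLM.from_pretrained("path/to/your/sparse/model")

# The `re:` prefix in the recipe marks the remainder as a regex over
# parameter names.
patterns = [
    r".*q_proj.weight", r".*k_proj.weight", r".*v_proj.weight",
    r".*o_proj.weight", r".*gate_proj.weight", r".*up_proj.weight",
    r".*down_proj.weight",
]

for name, _ in model.named_parameters():
    if any(re.fullmatch(p, name) for p in patterns):
        print(name)  # e.g. model.layers.0.self_attn.q_proj.weight
```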

---

## Pre-Sparsified Checkpoints

You can use:

- Your own LLMCompressor-sparsified model
- One from [Neural Magic's Hugging Face page](https://huggingface.co/neuralmagic)

Refer to the [LLMCompressor README](https://github.com/vllm-project/llm-compressor/blob/main/README.md) to learn how to sparsify models or write custom recipes.
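
Before fine-tuning, it is worth confirming that the checkpoint is actually sparse. A minimal sketch that measures the zero fraction of the projection weights (the model path is a placeholder):

```python
from transformers import AutoModelForCausalLM

# Placeholder: any LLMCompressor-sparsified checkpoint.
model = AutoModelForCausalLM.from_pretrained("path/to/your/sparse/model")

total = zeros = 0
for name, param in model.named_parameters():
    if name.endswith("_proj.weight"):  # q/k/v/o/gate/up/down projections
        total += param.numel()
        zeros += (param == 0).sum().item()

print(f"Sparsity of projection weights: {zeros / total:.2%}")
```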

---

## Example Config

A full working example is provided at:

```bash
examples/llama-3/sparse-finetuning.yaml
```

Run fine-tuning using:

```bash
axolotl train examples/llama-3/sparse-finetuning.yaml
```

---

## Learn More

Explore LLMCompressor capabilities, supported modifiers, and detailed examples:

👉 [LLMCompressor GitHub](https://github.com/vllm-project/llm-compressor)

@@ -45,6 +45,7 @@ llmcompressor:
            're:.*down_proj.weight',
          ]
          start: 0
  save_compressed: true
# ... (other training arguments)
```

@@ -52,19 +53,54 @@ This plugin **does not apply pruning or sparsification itself** — it is intend

Pre-sparsified checkpoints can be:
- Generated using [LLMCompressor](https://github.com/vllm-project/llm-compressor)
- Downloaded from [Neural Magic's Hugging Face page](https://huggingface.co/neuralmagic)
- Custom models you've created yourself, with compatible sparsity patterns

To learn more about writing and customizing LLMCompressor recipes, refer to the official documentation:
[https://github.com/vllm-project/llm-compressor/blob/main/README.md](https://github.com/vllm-project/llm-compressor/blob/main/README.md)
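
To produce such a checkpoint yourself, LLMCompressor's one-shot pruning flow looks roughly like the sketch below. It is modeled on the examples in the LLMCompressor README; the import paths, modifier arguments, and the model and dataset names are assumptions to verify against your installed version:

```python
from llmcompressor.modifiers.obcq import SparseGPTModifier
from llmcompressor.transformers import oneshot

# Sketch only: one-shot 2:4 pruning with SparseGPT. Module paths and
# argument names vary across llmcompressor releases.
recipe = SparseGPTModifier(sparsity=0.5, mask_structure="2:4")

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # example base model
    dataset="open_platypus",                      # example calibration dataset
    recipe=recipe,
    output_dir="llama-3-8b-2of4-sparse",
)
```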

### Storage Optimization with save_compressed

Setting `save_compressed: true` in your configuration saves models in a compressed format, which:
- Reduces disk space usage by approximately 40%
- Maintains compatibility with vLLM for accelerated inference
- Maintains compatibility with LLMCompressor for further optimization (for example, quantization)

This option is highly recommended when working with sparse models to maximize the benefits of model compression.
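
To verify the savings on your own runs, compare output directory sizes; a minimal stdlib sketch (the two paths are placeholders for checkpoints saved without and with `save_compressed`):

```python
import os

def dir_size_gb(path: str) -> float:
    """Total size of all files under `path`, in gigabytes."""
    return sum(
        os.path.getsize(os.path.join(root, f))
        for root, _, files in os.walk(path)
        for f in files
    ) / 1e9

# Placeholder paths: runs saved without and with save_compressed.
dense = dir_size_gb("outputs/run-dense")
compressed = dir_size_gb("outputs/run-compressed")
print(f"{dense:.2f} GB -> {compressed:.2f} GB ({1 - compressed / dense:.0%} smaller)")
```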

### Example Config

See [`examples/llama-3/sparse-finetuning.yaml`](examples/llama-3/sparse-finetuning.yaml) for a complete example.

---

## Inference with vLLM

After fine-tuning your sparse model, you can leverage vLLM for efficient inference:

```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM("path/to/your/sparse/model")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

For more details on vLLM's capabilities and advanced configuration options, see the [official vLLM documentation](https://docs.vllm.ai/).

## Learn More

For details on available sparsity and quantization schemes, fine-tuning recipes, and usage examples, visit the official LLMCompressor repository:

[https://github.com/vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor)