From 3a9e172272bf39e0719fe659741a8a9e080321e7 Mon Sep 17 00:00:00 2001
From: Rahul Tuli
Date: Thu, 24 Apr 2025 14:06:25 -0400
Subject: [PATCH] Add: line about further optimizations using llmcompressor

Signed-off-by: Rahul Tuli
---
 src/axolotl/integrations/llm_compressor/README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/axolotl/integrations/llm_compressor/README.md b/src/axolotl/integrations/llm_compressor/README.md
index a087f37a8..16eff804d 100644
--- a/src/axolotl/integrations/llm_compressor/README.md
+++ b/src/axolotl/integrations/llm_compressor/README.md
@@ -76,7 +76,9 @@ See [`examples/llama-3/sparse-finetuning.yaml`](examples/llama-3/sparse-finetuni
 ## Inference with vLLM
 
-After fine-tuning your sparse model, you can leverage vLLM for efficient inference:
+After fine-tuning your sparse model, you can leverage vLLM for efficient inference.
+You can also use LLMCompressor to apply additional quantization to your fine-tuned
+sparse model before inference for even greater performance benefits:
 
 ```python
 from vllm import LLM, SamplingParams