From 3a9e172272bf39e0719fe659741a8a9e080321e7 Mon Sep 17 00:00:00 2001
From: Rahul Tuli
Date: Thu, 24 Apr 2025 14:06:25 -0400
Subject: [PATCH] Add: line about further optimizations using llmcompressor

Signed-off-by: Rahul Tuli
---
 src/axolotl/integrations/llm_compressor/README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/axolotl/integrations/llm_compressor/README.md b/src/axolotl/integrations/llm_compressor/README.md
index a087f37a8..16eff804d 100644
--- a/src/axolotl/integrations/llm_compressor/README.md
+++ b/src/axolotl/integrations/llm_compressor/README.md
@@ -76,7 +76,9 @@ See [`examples/llama-3/sparse-finetuning.yaml`](examples/llama-3/sparse-finetuni
 ## Inference with vLLM
 
-After fine-tuning your sparse model, you can leverage vLLM for efficient inference:
+After fine-tuning your sparse model, you can leverage vLLM for efficient inference.
+You can also use LLMCompressor to apply additional quantization to your fine-tuned
+sparse model before inference for even greater performance benefits:
 
 ```python
 from vllm import LLM, SamplingParams