diff --git a/docs/custom_integrations.qmd b/docs/custom_integrations.qmd
index cb4aef9ca..023f09732 100644
--- a/docs/custom_integrations.qmd
+++ b/docs/custom_integrations.qmd
@@ -49,7 +49,8 @@ sections = [
     ("Knowledge Distillation (KD)", "kd"),
     ("Liger Kernels", "liger"),
     ("Language Model Evaluation Harness (LM Eval)", "lm_eval"),
-    ("Spectrum", "spectrum")
+    ("Spectrum", "spectrum"),
+    ("LLMCompressor", "llm_compressor")
 ]
 
 for section_name, folder_name in sections:
diff --git a/docs/llm_compressor.qmd b/docs/llm_compressor.qmd
deleted file mode 100644
index 60b685973..000000000
--- a/docs/llm_compressor.qmd
+++ /dev/null
@@ -1,98 +0,0 @@
----
-title: "LLMCompressor Sparse Fine-tuning"
-format:
-  html:
-    toc: true
-    toc-depth: 3
-    number-sections: true
-execute:
-  enabled: false
----
-
-# LLMCompressor Integration
-
-Fine-tune sparsified models in Axolotl using [LLMCompressor](https://github.com/vllm-project/llm-compressor).
-
-This integration enables fine-tuning of models **already sparsified** using LLMCompressor.
-It hooks into Axolotl’s training pipeline using the plugin system and maintains sparsity throughout the fine-tuning process.
-
----
-
-## Requirements
-
-- Install Axolotl with `llmcompressor` extras:
-
-```bash
-pip install "axolotl[llmcompressor]"
-```
-
-- Requires `llmcompressor >= 0.5.1`
-
-This will install all required dependencies for sparse model fine-tuning.
-
----
-
-## Usage
-
-To enable sparse fine-tuning with this integration, configure your Axolotl YAML like so:
-
-```yaml
-plugins:
-  - axolotl.integrations.llm_compressor.LLMCompressorPlugin
-
-llmcompressor:
-  recipe:
-    finetuning_stage:
-      finetuning_modifiers:
-        ConstantPruningModifier:
-          targets: [
-            're:.*q_proj.weight',
-            're:.*k_proj.weight',
-            're:.*v_proj.weight',
-            're:.*o_proj.weight',
-            're:.*gate_proj.weight',
-            're:.*up_proj.weight',
-            're:.*down_proj.weight',
-          ]
-          start: 0
-# ... (other Axolotl training arguments)
-```
-
-::: {.callout-note}
-This plugin **does not prune or sparsify the model**. It is only meant for **fine-tuning models that are already sparsified**.
-:::
-
----
-
-## Pre-Sparsified Checkpoints
-
-You can use:
-
-- Your own LLMCompressor-sparsified model
-- Or one from [Neural Magic's Hugging Face page](https://huggingface.co/neuralmagic)
-
-Refer to the [LLMCompressor README](https://github.com/vllm-project/llm-compressor/blob/main/README.md) to learn how to sparsify models or write custom recipes.
-
----
-
-## Example Config
-
-A full working example is provided at:
-
-```bash
-examples/llama-3/sparse-finetuning.yaml
-```
-
-Run fine-tuning using:
-
-```bash
-axolotl train examples/llama-3/sparse-finetuning.yaml
-```
-
----
-
-## Learn More
-
-Explore LLMCompressor capabilities, supported modifiers, and detailed examples:
-
-👉 [LLMCompressor GitHub](https://github.com/vllm-project/llm-compressor)
\ No newline at end of file
diff --git a/src/axolotl/integrations/llm_compressor/README.md b/src/axolotl/integrations/llm_compressor/README.md
index a86a89c51..a087f37a8 100644
--- a/src/axolotl/integrations/llm_compressor/README.md
+++ b/src/axolotl/integrations/llm_compressor/README.md
@@ -45,6 +45,9 @@ llmcompressor:
             're:.*down_proj.weight',
           ]
          start: 0
+  save_compressed: true
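+  # Saves checkpoints in compressed form (see "Storage Optimization with
+  # save_compressed" below).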
 # ... (other training arguments)
 ```
 
@@ -52,19 +55,71 @@ This plugin **does not apply pruning or sparsification itself** — it is intend
 Pre-sparsified checkpoints can be:
 
 - Generated using [LLMCompressor](https://github.com/vllm-project/llm-compressor)
-- Or downloaded from [Neural Magic's Hugging Face page](https://huggingface.co/neuralmagic)
+- Downloaded from [Neural Magic's Hugging Face page](https://huggingface.co/neuralmagic)
+- Created yourself, provided the sparsity pattern is compatible with LLMCompressor
 
 To learn more about writing and customizing LLMCompressor recipes, refer to the official documentation:
 [https://github.com/vllm-project/llm-compressor/blob/main/README.md](https://github.com/vllm-project/llm-compressor/blob/main/README.md)
 
+### Storage Optimization with `save_compressed`
+
+Setting `save_compressed: true` in your configuration saves the model in a compressed format, which:
+- Reduces disk space usage by approximately 40%
+- Maintains compatibility with vLLM for accelerated inference
+- Maintains compatibility with LLMCompressor for further optimization (e.g., quantization)
+
+This option is highly recommended when working with sparse models to maximize the benefits of model compression.
+
 ### Example Config
 
 See [`examples/llama-3/sparse-finetuning.yaml`](examples/llama-3/sparse-finetuning.yaml) for a complete example.
 
 ---
 
+## Inference with vLLM
+
+After fine-tuning your sparse model, you can leverage vLLM for efficient inference:
+
+```python
+from vllm import LLM, SamplingParams
+
+prompts = [
+    "Hello, my name is",
+    "The president of the United States is",
+    "The capital of France is",
+    "The future of AI is",
+]
+sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
+llm = LLM("path/to/your/sparse/model")
+outputs = llm.generate(prompts, sampling_params)
+
+for output in outputs:
+    prompt = output.prompt
+    generated_text = output.outputs[0].text
+    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+```
+
+For more details on vLLM's capabilities and advanced configuration options, see the [official vLLM documentation](https://docs.vllm.ai/).
+
 ## Learn More
 
 For details on available sparsity and quantization schemes, fine-tuning recipes, and usage examples, visit the official LLMCompressor repository:
 
-👉 [https://github.com/vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor)
+[https://github.com/vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor)
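+
+### Verifying Sparsity After Fine-tuning
+
+As a quick sanity check, you can confirm that the pruned projections stayed sparse through training. The snippet below is a minimal sketch using plain `transformers` (it assumes the checkpoint was saved uncompressed, i.e. `save_compressed: false`, and reuses the placeholder model path from above):
+
+```python
+from transformers import AutoModelForCausalLM
+
+# Load the fine-tuned checkpoint (placeholder path).
+model = AutoModelForCausalLM.from_pretrained("path/to/your/sparse/model")
+
+# Report the fraction of zero weights in each pruned projection matrix.
+for name, param in model.named_parameters():
+    if name.endswith("proj.weight"):
+        zero_fraction = (param == 0).float().mean().item()
+        print(f"{name}: {zero_fraction:.2%} zeros")
+```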