From 756a0559c1bdadb8c833d9de13e728b616972a56 Mon Sep 17 00:00:00 2001
From: NanoCode012 <nano@axolotl.ai>
Date: Fri, 11 Apr 2025 20:52:43 +0700
Subject: [PATCH] feat(doc): explain deepspeed configs (#2514) [skip ci]

* feat(doc): explain deepspeed configs

* fix: add fetch configs
---
 docs/multi-gpu.qmd | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/docs/multi-gpu.qmd b/docs/multi-gpu.qmd
index 5aec89763..55eaca6c3 100644
--- a/docs/multi-gpu.qmd
+++ b/docs/multi-gpu.qmd
@@ -36,6 +36,9 @@ deepspeed: deepspeed_configs/zero1.json
 ### Usage {#sec-deepspeed-usage}
 
 ```{.bash}
+# Fetch deepspeed configs (if not already present)
+axolotl fetch deepspeed_configs
+
 # Passing arg via config
 axolotl train config.yml
 
@@ -48,10 +51,20 @@ axolotl train config.yml --deepspeed deepspeed_configs/zero1.json
 We provide default configurations for:
 
 - ZeRO Stage 1 (`zero1.json`)
+- ZeRO Stage 1 with torch compile (`zero1_torch_compile.json`)
 - ZeRO Stage 2 (`zero2.json`)
 - ZeRO Stage 3 (`zero3.json`)
+- ZeRO Stage 3 with bf16 (`zero3_bf16.json`)
+- ZeRO Stage 3 with bf16 and CPU offload params (`zero3_bf16_cpuoffload_params.json`)
+- ZeRO Stage 3 with bf16 and CPU offload params and optimizer (`zero3_bf16_cpuoffload_all.json`)
 
-Choose based on your memory requirements and performance needs.
+::: {.callout-tip}
+
+For best performance, choose the configuration that offloads the least to CPU memory while still fitting in VRAM.
+
+Start with Stage 1, then move to Stage 2 and Stage 3 only if the model does not fit in VRAM.
+
+:::
 
 ## FSDP {#sec-fsdp}