diff --git a/docs/multi-gpu.qmd b/docs/multi-gpu.qmd
index 5aec89763..55eaca6c3 100644
--- a/docs/multi-gpu.qmd
+++ b/docs/multi-gpu.qmd
@@ -36,6 +36,9 @@ deepspeed: deepspeed_configs/zero1.json
 ### Usage {#sec-deepspeed-usage}
 
 ```{.bash}
+# Fetch deepspeed configs (if not already present)
+axolotl fetch deepspeed_configs
+
 # Passing arg via config
 axolotl train config.yml
 
@@ -48,10 +51,20 @@ axolotl train config.yml --deepspeed deepspeed_configs/zero1.json
 We provide default configurations for:
 
 - ZeRO Stage 1 (`zero1.json`)
+- ZeRO Stage 1 with torch compile (`zero1_torch_compile.json`)
 - ZeRO Stage 2 (`zero2.json`)
 - ZeRO Stage 3 (`zero3.json`)
+- ZeRO Stage 3 with bf16 (`zero3_bf16.json`)
+- ZeRO Stage 3 with bf16 and CPU offload params(`zero3_bf16_cpuoffload_params.json`)
+- ZeRO Stage 3 with bf16 and CPU offload params and optimizer (`zero3_bf16_cpuoffload_all.json`)
 
-Choose based on your memory requirements and performance needs.
+::: {.callout-tip}
+
+Choose the configuration that offloads the least amount to memory while still being able to fit on VRAM for best performance.
+
+Start from Stage 1 -> Stage 2 -> Stage 3.
+
+:::
 
 ## FSDP {#sec-fsdp}
 