Built site for gh-pages
This commit is contained in:
@@ -572,6 +572,21 @@ Tip
|
||||
<p>Start from Stage 1 -> Stage 2 -> Stage 3.</p>
|
||||
</div>
|
||||
</div>
|
||||
<div class="callout callout-style-default callout-tip callout-titled">
|
||||
<div class="callout-header d-flex align-content-center">
|
||||
<div class="callout-icon-container">
|
||||
<i class="callout-icon"></i>
|
||||
</div>
|
||||
<div class="callout-title-container flex-fill">
|
||||
Tip
|
||||
</div>
|
||||
</div>
|
||||
<div class="callout-body-container callout-body">
|
||||
<p>Using ZeRO Stage 3 with Single-GPU training</p>
|
||||
<p>ZeRO Stage 3 can be used for training on a single GPU by manually setting the environment variables:
|
||||
<code>WORLD_SIZE=1 LOCAL_RANK=0 MASTER_ADDR=0.0.0.0 MASTER_PORT=29500</code></p>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</section>
|
||||
<section id="sec-fsdp" class="level2" data-number="3">
|
||||
@@ -1161,65 +1176,74 @@ single sequence causes OOM errors during model training.</p>
|
||||
<span id="cb4-66"><a href="#cb4-66" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-67"><a href="#cb4-67" aria-hidden="true" tabindex="-1"></a>:::</span>
|
||||
<span id="cb4-68"><a href="#cb4-68" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-69"><a href="#cb4-69" aria-hidden="true" tabindex="-1"></a><span class="fu">## FSDP {#sec-fsdp}</span></span>
|
||||
<span id="cb4-69"><a href="#cb4-69" aria-hidden="true" tabindex="-1"></a>::: {.callout-tip}</span>
|
||||
<span id="cb4-70"><a href="#cb4-70" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-71"><a href="#cb4-71" aria-hidden="true" tabindex="-1"></a><span class="fu">### Basic FSDP Configuration {#sec-fsdp-config}</span></span>
|
||||
<span id="cb4-71"><a href="#cb4-71" aria-hidden="true" tabindex="-1"></a>Using ZeRO Stage 3 with Single-GPU training</span>
|
||||
<span id="cb4-72"><a href="#cb4-72" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-73"><a href="#cb4-73" aria-hidden="true" tabindex="-1"></a><span class="in">```{.yaml}</span></span>
|
||||
<span id="cb4-74"><a href="#cb4-74" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp</span><span class="kw">:</span></span>
|
||||
<span id="cb4-75"><a href="#cb4-75" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span>full_shard</span>
|
||||
<span id="cb4-76"><a href="#cb4-76" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span>auto_wrap</span>
|
||||
<span id="cb4-77"><a href="#cb4-77" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp_config</span><span class="kw">:</span></span>
|
||||
<span id="cb4-78"><a href="#cb4-78" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_offload_params</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
|
||||
<span id="cb4-79"><a href="#cb4-79" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_state_dict_type</span><span class="kw">:</span><span class="at"> FULL_STATE_DICT</span></span>
|
||||
<span id="cb4-80"><a href="#cb4-80" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_transformer_layer_cls_to_wrap</span><span class="kw">:</span><span class="at"> LlamaDecoderLayer</span></span>
|
||||
<span id="cb4-81"><a href="#cb4-81" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
|
||||
<span id="cb4-82"><a href="#cb4-82" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-83"><a href="#cb4-83" aria-hidden="true" tabindex="-1"></a><span class="fu">## Sequence parallelism {#sec-sequence-parallelism}</span></span>
|
||||
<span id="cb4-84"><a href="#cb4-84" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-85"><a href="#cb4-85" aria-hidden="true" tabindex="-1"></a>We support sequence parallelism (SP) via the</span>
|
||||
<span id="cb4-86"><a href="#cb4-86" aria-hidden="true" tabindex="-1"></a><span class="co">[</span><span class="ot">ring-flash-attention</span><span class="co">](https://github.com/zhuzilin/ring-flash-attention)</span> project. This</span>
|
||||
<span id="cb4-87"><a href="#cb4-87" aria-hidden="true" tabindex="-1"></a>allows one to split up sequences across GPUs, which is useful in the event that a</span>
|
||||
<span id="cb4-88"><a href="#cb4-88" aria-hidden="true" tabindex="-1"></a>single sequence causes OOM errors during model training.</span>
|
||||
<span id="cb4-89"><a href="#cb4-89" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-90"><a href="#cb4-90" aria-hidden="true" tabindex="-1"></a>See our <span class="co">[</span><span class="ot">dedicated guide</span><span class="co">](sequence_parallelism.qmd)</span> for more information.</span>
|
||||
<span id="cb4-73"><a href="#cb4-73" aria-hidden="true" tabindex="-1"></a>ZeRO Stage 3 can be used for training on a single GPU by manually setting the environment variables:</span>
|
||||
<span id="cb4-74"><a href="#cb4-74" aria-hidden="true" tabindex="-1"></a><span class="in">`WORLD_SIZE=1 LOCAL_RANK=0 MASTER_ADDR=0.0.0.0 MASTER_PORT=29500`</span></span>
|
||||
<span id="cb4-75"><a href="#cb4-75" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-76"><a href="#cb4-76" aria-hidden="true" tabindex="-1"></a>:::</span>
|
||||
<span id="cb4-77"><a href="#cb4-77" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-78"><a href="#cb4-78" aria-hidden="true" tabindex="-1"></a><span class="fu">## FSDP {#sec-fsdp}</span></span>
|
||||
<span id="cb4-79"><a href="#cb4-79" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-80"><a href="#cb4-80" aria-hidden="true" tabindex="-1"></a><span class="fu">### Basic FSDP Configuration {#sec-fsdp-config}</span></span>
|
||||
<span id="cb4-81"><a href="#cb4-81" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-82"><a href="#cb4-82" aria-hidden="true" tabindex="-1"></a><span class="in">```{.yaml}</span></span>
|
||||
<span id="cb4-83"><a href="#cb4-83" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp</span><span class="kw">:</span></span>
|
||||
<span id="cb4-84"><a href="#cb4-84" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span>full_shard</span>
|
||||
<span id="cb4-85"><a href="#cb4-85" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span>auto_wrap</span>
|
||||
<span id="cb4-86"><a href="#cb4-86" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp_config</span><span class="kw">:</span></span>
|
||||
<span id="cb4-87"><a href="#cb4-87" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_offload_params</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
|
||||
<span id="cb4-88"><a href="#cb4-88" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_state_dict_type</span><span class="kw">:</span><span class="at"> FULL_STATE_DICT</span></span>
|
||||
<span id="cb4-89"><a href="#cb4-89" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_transformer_layer_cls_to_wrap</span><span class="kw">:</span><span class="at"> LlamaDecoderLayer</span></span>
|
||||
<span id="cb4-90"><a href="#cb4-90" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
|
||||
<span id="cb4-91"><a href="#cb4-91" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-92"><a href="#cb4-92" aria-hidden="true" tabindex="-1"></a><span class="fu">### FSDP + QLoRA {#sec-fsdp-qlora}</span></span>
|
||||
<span id="cb4-92"><a href="#cb4-92" aria-hidden="true" tabindex="-1"></a><span class="fu">## Sequence parallelism {#sec-sequence-parallelism}</span></span>
|
||||
<span id="cb4-93"><a href="#cb4-93" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-94"><a href="#cb4-94" aria-hidden="true" tabindex="-1"></a>For combining FSDP with QLoRA, see our <span class="co">[</span><span class="ot">dedicated guide</span><span class="co">](fsdp_qlora.qmd)</span>.</span>
|
||||
<span id="cb4-95"><a href="#cb4-95" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-96"><a href="#cb4-96" aria-hidden="true" tabindex="-1"></a><span class="fu">## Performance Optimization {#sec-performance}</span></span>
|
||||
<span id="cb4-97"><a href="#cb4-97" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-98"><a href="#cb4-98" aria-hidden="true" tabindex="-1"></a><span class="fu">### Liger Kernel Integration {#sec-liger}</span></span>
|
||||
<span id="cb4-99"><a href="#cb4-99" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-100"><a href="#cb4-100" aria-hidden="true" tabindex="-1"></a>Please see <span class="co">[</span><span class="ot">docs</span><span class="co">](custom_integrations.qmd#liger)</span> for more info.</span>
|
||||
<span id="cb4-101"><a href="#cb4-101" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-102"><a href="#cb4-102" aria-hidden="true" tabindex="-1"></a><span class="fu">## Troubleshooting {#sec-troubleshooting}</span></span>
|
||||
<span id="cb4-103"><a href="#cb4-103" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-104"><a href="#cb4-104" aria-hidden="true" tabindex="-1"></a><span class="fu">### NCCL Issues {#sec-nccl}</span></span>
|
||||
<span id="cb4-105"><a href="#cb4-105" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-106"><a href="#cb4-106" aria-hidden="true" tabindex="-1"></a>For NCCL-related problems, see our <span class="co">[</span><span class="ot">NCCL troubleshooting guide</span><span class="co">](nccl.qmd)</span>.</span>
|
||||
<span id="cb4-107"><a href="#cb4-107" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-108"><a href="#cb4-108" aria-hidden="true" tabindex="-1"></a><span class="fu">### Common Problems {#sec-common-problems}</span></span>
|
||||
<span id="cb4-109"><a href="#cb4-109" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-110"><a href="#cb4-110" aria-hidden="true" tabindex="-1"></a>::: {.panel-tabset}</span>
|
||||
<span id="cb4-111"><a href="#cb4-111" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-112"><a href="#cb4-112" aria-hidden="true" tabindex="-1"></a><span class="fu">## Memory Issues</span></span>
|
||||
<span id="cb4-113"><a href="#cb4-113" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-114"><a href="#cb4-114" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Reduce <span class="in">`micro_batch_size`</span></span>
|
||||
<span id="cb4-115"><a href="#cb4-115" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Reduce <span class="in">`eval_batch_size`</span></span>
|
||||
<span id="cb4-116"><a href="#cb4-116" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Adjust <span class="in">`gradient_accumulation_steps`</span></span>
|
||||
<span id="cb4-117"><a href="#cb4-117" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Consider using a higher ZeRO stage</span>
|
||||
<span id="cb4-94"><a href="#cb4-94" aria-hidden="true" tabindex="-1"></a>We support sequence parallelism (SP) via the</span>
|
||||
<span id="cb4-95"><a href="#cb4-95" aria-hidden="true" tabindex="-1"></a><span class="co">[</span><span class="ot">ring-flash-attention</span><span class="co">](https://github.com/zhuzilin/ring-flash-attention)</span> project. This</span>
|
||||
<span id="cb4-96"><a href="#cb4-96" aria-hidden="true" tabindex="-1"></a>allows one to split up sequences across GPUs, which is useful in the event that a</span>
|
||||
<span id="cb4-97"><a href="#cb4-97" aria-hidden="true" tabindex="-1"></a>single sequence causes OOM errors during model training.</span>
|
||||
<span id="cb4-98"><a href="#cb4-98" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-99"><a href="#cb4-99" aria-hidden="true" tabindex="-1"></a>See our <span class="co">[</span><span class="ot">dedicated guide</span><span class="co">](sequence_parallelism.qmd)</span> for more information.</span>
|
||||
<span id="cb4-100"><a href="#cb4-100" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-101"><a href="#cb4-101" aria-hidden="true" tabindex="-1"></a><span class="fu">### FSDP + QLoRA {#sec-fsdp-qlora}</span></span>
|
||||
<span id="cb4-102"><a href="#cb4-102" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-103"><a href="#cb4-103" aria-hidden="true" tabindex="-1"></a>For combining FSDP with QLoRA, see our <span class="co">[</span><span class="ot">dedicated guide</span><span class="co">](fsdp_qlora.qmd)</span>.</span>
|
||||
<span id="cb4-104"><a href="#cb4-104" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-105"><a href="#cb4-105" aria-hidden="true" tabindex="-1"></a><span class="fu">## Performance Optimization {#sec-performance}</span></span>
|
||||
<span id="cb4-106"><a href="#cb4-106" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-107"><a href="#cb4-107" aria-hidden="true" tabindex="-1"></a><span class="fu">### Liger Kernel Integration {#sec-liger}</span></span>
|
||||
<span id="cb4-108"><a href="#cb4-108" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-109"><a href="#cb4-109" aria-hidden="true" tabindex="-1"></a>Please see <span class="co">[</span><span class="ot">docs</span><span class="co">](custom_integrations.qmd#liger)</span> for more info.</span>
|
||||
<span id="cb4-110"><a href="#cb4-110" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-111"><a href="#cb4-111" aria-hidden="true" tabindex="-1"></a><span class="fu">## Troubleshooting {#sec-troubleshooting}</span></span>
|
||||
<span id="cb4-112"><a href="#cb4-112" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-113"><a href="#cb4-113" aria-hidden="true" tabindex="-1"></a><span class="fu">### NCCL Issues {#sec-nccl}</span></span>
|
||||
<span id="cb4-114"><a href="#cb4-114" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-115"><a href="#cb4-115" aria-hidden="true" tabindex="-1"></a>For NCCL-related problems, see our <span class="co">[</span><span class="ot">NCCL troubleshooting guide</span><span class="co">](nccl.qmd)</span>.</span>
|
||||
<span id="cb4-116"><a href="#cb4-116" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-117"><a href="#cb4-117" aria-hidden="true" tabindex="-1"></a><span class="fu">### Common Problems {#sec-common-problems}</span></span>
|
||||
<span id="cb4-118"><a href="#cb4-118" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-119"><a href="#cb4-119" aria-hidden="true" tabindex="-1"></a><span class="fu">## Training Instability</span></span>
|
||||
<span id="cb4-119"><a href="#cb4-119" aria-hidden="true" tabindex="-1"></a>::: {.panel-tabset}</span>
|
||||
<span id="cb4-120"><a href="#cb4-120" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-121"><a href="#cb4-121" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Start with DeepSpeed ZeRO-2</span>
|
||||
<span id="cb4-122"><a href="#cb4-122" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Monitor loss values</span>
|
||||
<span id="cb4-123"><a href="#cb4-123" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Check learning rates</span>
|
||||
<span id="cb4-124"><a href="#cb4-124" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-125"><a href="#cb4-125" aria-hidden="true" tabindex="-1"></a>:::</span>
|
||||
<span id="cb4-126"><a href="#cb4-126" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-127"><a href="#cb4-127" aria-hidden="true" tabindex="-1"></a>For more detailed troubleshooting, see our <span class="co">[</span><span class="ot">debugging guide</span><span class="co">](debugging.qmd)</span>.</span></code><button title="Copy to Clipboard" class="code-copy-button" data-in-quarto-modal=""><i class="bi"></i></button></pre></div>
|
||||
<span id="cb4-121"><a href="#cb4-121" aria-hidden="true" tabindex="-1"></a><span class="fu">## Memory Issues</span></span>
|
||||
<span id="cb4-122"><a href="#cb4-122" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-123"><a href="#cb4-123" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Reduce <span class="in">`micro_batch_size`</span></span>
|
||||
<span id="cb4-124"><a href="#cb4-124" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Reduce <span class="in">`eval_batch_size`</span></span>
|
||||
<span id="cb4-125"><a href="#cb4-125" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Adjust <span class="in">`gradient_accumulation_steps`</span></span>
|
||||
<span id="cb4-126"><a href="#cb4-126" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Consider using a higher ZeRO stage</span>
|
||||
<span id="cb4-127"><a href="#cb4-127" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-128"><a href="#cb4-128" aria-hidden="true" tabindex="-1"></a><span class="fu">## Training Instability</span></span>
|
||||
<span id="cb4-129"><a href="#cb4-129" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-130"><a href="#cb4-130" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Start with DeepSpeed ZeRO-2</span>
|
||||
<span id="cb4-131"><a href="#cb4-131" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Monitor loss values</span>
|
||||
<span id="cb4-132"><a href="#cb4-132" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Check learning rates</span>
|
||||
<span id="cb4-133"><a href="#cb4-133" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-134"><a href="#cb4-134" aria-hidden="true" tabindex="-1"></a>:::</span>
|
||||
<span id="cb4-135"><a href="#cb4-135" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-136"><a href="#cb4-136" aria-hidden="true" tabindex="-1"></a>For more detailed troubleshooting, see our <span class="co">[</span><span class="ot">debugging guide</span><span class="co">](debugging.qmd)</span>.</span></code><button title="Copy to Clipboard" class="code-copy-button" data-in-quarto-modal=""><i class="bi"></i></button></pre></div>
|
||||
</div></div></div></div></div>
|
||||
</div> <!-- /content -->
|
||||
|
||||
|
||||
Reference in New Issue
Block a user