Built site for gh-pages
@@ -677,6 +677,10 @@ also follow the config field mapping below to update field names.</p>
<td>fsdp_use_orig_params</td>
<td><strong>REMOVED</strong></td>
</tr>
<tr class="odd">
<td>fsdp_activation_checkpointing</td>
<td>activation_checkpointing</td>
</tr>
</tbody>
</table>
<p>For more details, please see the migration guide in the <a href="https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md">torchtitan repo</a>. In Axolotl,
@@ -1321,98 +1325,99 @@ single sequence causes OOM errors during model training.</p>
<span id="cb6-88"><a href="#cb6-88" aria-hidden="true" tabindex="-1"></a>fsdp_cpu_ram_efficient_loading | cpu_ram_efficient_loading</span>
<span id="cb6-89"><a href="#cb6-89" aria-hidden="true" tabindex="-1"></a>fsdp_state_dict_type | state_dict_type</span>
<span id="cb6-90"><a href="#cb6-90" aria-hidden="true" tabindex="-1"></a>fsdp_use_orig_params | **REMOVED**</span>
<span id="cb6-91"><a href="#cb6-91" aria-hidden="true" tabindex="-1"></a>fsdp_activation_checkpointing | activation_checkpointing</span>
<span id="cb6-92"><a href="#cb6-92" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-93"><a href="#cb6-93" aria-hidden="true" tabindex="-1"></a>For more details, please see the migration guide in the <span class="co">[</span><span class="ot">torchtitan repo</span><span class="co">](https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md)</span>. In Axolotl,</span>
<span id="cb6-94"><a href="#cb6-94" aria-hidden="true" tabindex="-1"></a>if you were using the following FSDP1 config:</span>
<span id="cb6-95"><a href="#cb6-95" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-96"><a href="#cb6-96" aria-hidden="true" tabindex="-1"></a><span class="in">```{.yaml}</span></span>
<span id="cb6-97"><a href="#cb6-97" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp_version</span><span class="kw">:</span><span class="at"> </span><span class="dv">1</span></span>
<span id="cb6-98"><a href="#cb6-98" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp_config</span><span class="kw">:</span></span>
<span id="cb6-99"><a href="#cb6-99" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_offload_params</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span></span>
<span id="cb6-100"><a href="#cb6-100" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_cpu_ram_efficient_loading</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb6-101"><a href="#cb6-101" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_auto_wrap_policy</span><span class="kw">:</span><span class="at"> TRANSFORMER_BASED_WRAP</span></span>
<span id="cb6-102"><a href="#cb6-102" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_transformer_layer_cls_to_wrap</span><span class="kw">:</span><span class="at"> Qwen3DecoderLayer</span></span>
<span id="cb6-103"><a href="#cb6-103" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_state_dict_type</span><span class="kw">:</span><span class="at"> FULL_STATE_DICT</span></span>
<span id="cb6-104"><a href="#cb6-104" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_sharding_strategy</span><span class="kw">:</span><span class="at"> FULL_SHARD</span></span>
<span id="cb6-105"><a href="#cb6-105" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
<span id="cb6-106"><a href="#cb6-106" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-107"><a href="#cb6-107" aria-hidden="true" tabindex="-1"></a>You can migrate to the following FSDP2 config:</span>
<span id="cb6-108"><a href="#cb6-108" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-109"><a href="#cb6-109" aria-hidden="true" tabindex="-1"></a><span class="in">```{.yaml}</span></span>
<span id="cb6-110"><a href="#cb6-110" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp_version</span><span class="kw">:</span><span class="at"> </span><span class="dv">2</span></span>
<span id="cb6-111"><a href="#cb6-111" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp_config</span><span class="kw">:</span></span>
<span id="cb6-112"><a href="#cb6-112" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">offload_params</span><span class="kw">:</span><span class="at"> </span><span class="ch">false</span></span>
<span id="cb6-113"><a href="#cb6-113" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">cpu_ram_efficient_loading</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb6-114"><a href="#cb6-114" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">auto_wrap_policy</span><span class="kw">:</span><span class="at"> TRANSFORMER_BASED_WRAP</span></span>
<span id="cb6-115"><a href="#cb6-115" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">transformer_layer_cls_to_wrap</span><span class="kw">:</span><span class="at"> Qwen3DecoderLayer</span></span>
<span id="cb6-116"><a href="#cb6-116" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">state_dict_type</span><span class="kw">:</span><span class="at"> FULL_STATE_DICT</span></span>
<span id="cb6-117"><a href="#cb6-117" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">reshard_after_forward</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb6-118"><a href="#cb6-118" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
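Per the field mapping above, activation checkpointing keeps its renamed key under the new `fsdp_config`. A minimal sketch, assuming the same Qwen3 example model (the combination shown here is illustrative, not a tested recipe):

```{.yaml}
fsdp_version: 2
fsdp_config:
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: Qwen3DecoderLayer
  state_dict_type: FULL_STATE_DICT
  # renamed from fsdp_activation_checkpointing in FSDP1 configs
  activation_checkpointing: true
```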
<span id="cb6-119"><a href="#cb6-119" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-120"><a href="#cb6-120" aria-hidden="true" tabindex="-1"></a><span class="fu">### FSDP1 (deprecated) {#sec-fsdp-config}</span></span>
<span id="cb6-121"><a href="#cb6-121" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-122"><a href="#cb6-122" aria-hidden="true" tabindex="-1"></a>::: {.callout-note}</span>
<span id="cb6-123"><a href="#cb6-123" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-124"><a href="#cb6-124" aria-hidden="true" tabindex="-1"></a>Using <span class="in">`fsdp`</span> to configure FSDP is deprecated and will be removed in an upcoming release of Axolotl. Please use <span class="in">`fsdp_config`</span> as above instead.</span>
<span id="cb6-125"><a href="#cb6-125" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-126"><a href="#cb6-126" aria-hidden="true" tabindex="-1"></a>:::</span>
<span id="cb6-127"><a href="#cb6-127" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-128"><a href="#cb6-128" aria-hidden="true" tabindex="-1"></a><span class="in">```{.yaml}</span></span>
<span id="cb6-129"><a href="#cb6-129" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp</span><span class="kw">:</span></span>
<span id="cb6-130"><a href="#cb6-130" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span>full_shard</span>
<span id="cb6-131"><a href="#cb6-131" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span>auto_wrap</span>
<span id="cb6-132"><a href="#cb6-132" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp_config</span><span class="kw">:</span></span>
<span id="cb6-133"><a href="#cb6-133" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_offload_params</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb6-134"><a href="#cb6-134" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_state_dict_type</span><span class="kw">:</span><span class="at"> FULL_STATE_DICT</span></span>
<span id="cb6-135"><a href="#cb6-135" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_transformer_layer_cls_to_wrap</span><span class="kw">:</span><span class="at"> LlamaDecoderLayer</span></span>
<span id="cb6-136"><a href="#cb6-136" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
<span id="cb6-137"><a href="#cb6-137" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-138"><a href="#cb6-138" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-139"><a href="#cb6-139" aria-hidden="true" tabindex="-1"></a><span class="fu">## Sequence parallelism {#sec-sequence-parallelism}</span></span>
<span id="cb6-140"><a href="#cb6-140" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-141"><a href="#cb6-141" aria-hidden="true" tabindex="-1"></a>We support sequence parallelism (SP) via the</span>
<span id="cb6-142"><a href="#cb6-142" aria-hidden="true" tabindex="-1"></a><span class="co">[</span><span class="ot">ring-flash-attention</span><span class="co">](https://github.com/zhuzilin/ring-flash-attention)</span> project. This</span>
<span id="cb6-143"><a href="#cb6-143" aria-hidden="true" tabindex="-1"></a>allows one to split up sequences across GPUs, which is useful in the event that a</span>
<span id="cb6-144"><a href="#cb6-144" aria-hidden="true" tabindex="-1"></a>single sequence causes OOM errors during model training.</span>
<span id="cb6-145"><a href="#cb6-145" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-146"><a href="#cb6-146" aria-hidden="true" tabindex="-1"></a>See our <span class="co">[</span><span class="ot">dedicated guide</span><span class="co">](sequence_parallelism.qmd)</span> for more information.</span>
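As a rough sketch, SP is typically enabled with a degree that evenly divides the number of GPUs, so each sequence is sharded across that many devices. The exact keys below (`sequence_parallel_degree`, its requirement of flash attention) are assumptions here; the dedicated guide is authoritative:

```{.yaml}
# split each sequence across 4 GPUs (assumed key; see the SP guide)
sequence_parallel_degree: 4
# ring-flash-attention builds on flash attention
flash_attention: true
```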
<span id="cb6-147"><a href="#cb6-147" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-148"><a href="#cb6-148" aria-hidden="true" tabindex="-1"></a><span class="fu">### FSDP + QLoRA {#sec-fsdp-qlora}</span></span>
<span id="cb6-149"><a href="#cb6-149" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-150"><a href="#cb6-150" aria-hidden="true" tabindex="-1"></a>For combining FSDP with QLoRA, see our <span class="co">[</span><span class="ot">dedicated guide</span><span class="co">](fsdp_qlora.qmd)</span>.</span>
<span id="cb6-151"><a href="#cb6-151" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-152"><a href="#cb6-152" aria-hidden="true" tabindex="-1"></a><span class="fu">## Performance Optimization {#sec-performance}</span></span>
<span id="cb6-153"><a href="#cb6-153" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-154"><a href="#cb6-154" aria-hidden="true" tabindex="-1"></a><span class="fu">### Liger Kernel Integration {#sec-liger}</span></span>
<span id="cb6-155"><a href="#cb6-155" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-156"><a href="#cb6-156" aria-hidden="true" tabindex="-1"></a>Please see <span class="co">[</span><span class="ot">docs</span><span class="co">](custom_integrations.qmd#liger)</span> for more info.</span>
<span id="cb6-157"><a href="#cb6-157" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-158"><a href="#cb6-158" aria-hidden="true" tabindex="-1"></a><span class="fu">## Troubleshooting {#sec-troubleshooting}</span></span>
<span id="cb6-159"><a href="#cb6-159" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-160"><a href="#cb6-160" aria-hidden="true" tabindex="-1"></a><span class="fu">### NCCL Issues {#sec-nccl}</span></span>
<span id="cb6-161"><a href="#cb6-161" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-162"><a href="#cb6-162" aria-hidden="true" tabindex="-1"></a>For NCCL-related problems, see our <span class="co">[</span><span class="ot">NCCL troubleshooting guide</span><span class="co">](nccl.qmd)</span>.</span>
<span id="cb6-163"><a href="#cb6-163" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-164"><a href="#cb6-164" aria-hidden="true" tabindex="-1"></a><span class="fu">### Common Problems {#sec-common-problems}</span></span>
<span id="cb6-165"><a href="#cb6-165" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-166"><a href="#cb6-166" aria-hidden="true" tabindex="-1"></a>::: {.panel-tabset}</span>
<span id="cb6-167"><a href="#cb6-167" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-168"><a href="#cb6-168" aria-hidden="true" tabindex="-1"></a><span class="fu">## Memory Issues</span></span>
<span id="cb6-169"><a href="#cb6-169" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-170"><a href="#cb6-170" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Reduce <span class="in">`micro_batch_size`</span></span>
<span id="cb6-171"><a href="#cb6-171" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Reduce <span class="in">`eval_batch_size`</span></span>
<span id="cb6-172"><a href="#cb6-172" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Adjust <span class="in">`gradient_accumulation_steps`</span></span>
<span id="cb6-173"><a href="#cb6-173" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Consider using a higher ZeRO stage</span>
<span id="cb6-174"><a href="#cb6-174" aria-hidden="true" tabindex="-1"></a></span>
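The bullets above map directly to top-level config keys. One hedged example of trading per-step batch size for gradient accumulation: the effective batch size per GPU is micro_batch_size × gradient_accumulation_steps, so the two snippets below train with the same effective batch while the second has a lower peak memory footprint (the specific numbers are illustrative):

```{.yaml}
# before: may OOM
# micro_batch_size: 8
# gradient_accumulation_steps: 1

# after: same effective batch size (2 x 4 = 8), less activation memory
micro_batch_size: 2
gradient_accumulation_steps: 4
eval_batch_size: 2
```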
<span id="cb6-175"><a href="#cb6-175" aria-hidden="true" tabindex="-1"></a><span class="fu">## Training Instability</span></span>
<span id="cb6-176"><a href="#cb6-176" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-177"><a href="#cb6-177" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Start with DeepSpeed ZeRO-2</span>
<span id="cb6-178"><a href="#cb6-178" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Monitor loss values</span>
<span id="cb6-179"><a href="#cb6-179" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Check learning rates</span>
<span id="cb6-180"><a href="#cb6-180" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-181"><a href="#cb6-181" aria-hidden="true" tabindex="-1"></a>:::</span>
<span id="cb6-182"><a href="#cb6-182" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-183"><a href="#cb6-183" aria-hidden="true" tabindex="-1"></a>For more detailed troubleshooting, see our <span class="co">[</span><span class="ot">debugging guide</span><span class="co">](debugging.qmd)</span>.</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button" data-in-quarto-modal=""><i class="bi"></i></button></div>
</div></div></div></div></div>
</div> <!-- /content -->