<spanid="cb1-50"><ahref="#cb1-50"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb1-51"><ahref="#cb1-51"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb2-49"><ahref="#cb2-49"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb2-50"><ahref="#cb2-50"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb3-49"><ahref="#cb3-49"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb3-50"><ahref="#cb3-50"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb4-49"><ahref="#cb4-49"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb4-50"><ahref="#cb4-50"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb5-49"><ahref="#cb5-49"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb5-50"><ahref="#cb5-50"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb6-49"><ahref="#cb6-49"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb6-50"><ahref="#cb6-50"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<p>Training arguments for Causal trainer</p>
<p>This code is duplicated because HF TrainingArguments does not give output_dir a
default value, so it can’t be used as a mixin.</p>
@@ -879,9 +885,10 @@ default value so it can’t be used as a mixin.</p>
<spanid="cb7-49"><ahref="#cb7-49"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb7-50"><ahref="#cb7-50"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb1-12"><ahref="#cb1-12"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb1-13"><ahref="#cb1-13"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<p>Collator for multipack, specific to using the BatchSampler</p>
<spanid="cb2-12"><ahref="#cb2-12"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb2-13"><ahref="#cb2-13"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<p>Data collator that will dynamically pad the inputs received, as well as the labels and position_ids</p>
<spanid="cb5-12"><ahref="#cb5-12"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb5-13"><ahref="#cb5-13"aria-hidden="true"tabindex="-1"></a>)</span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<p>Collator for multipack, specific to using the BatchSampler</p>
<spanid="cb1-687"><ahref="#cb1-687"aria-hidden="true"tabindex="-1"></a><spanclass="co"># Optional; strides across the key dimension. Larger values use more memory but should make training faster.</span></span>
<spanid="cb1-688"><ahref="#cb1-688"aria-hidden="true"tabindex="-1"></a><spanclass="co"># Must evenly divide the number of KV heads in your model.</span></span>
<spanid="cb1-691"><ahref="#cb1-691"aria-hidden="true"tabindex="-1"></a><spanclass="co"># Path to torch distx for optim 'adamw_anyprecision'</span></span>
<spanid="cb1-690"><ahref="#cb1-690"aria-hidden="true"tabindex="-1"></a><spanclass="co"># One of "varlen_llama3", "batch_ring", "batch_zigzag", "batch_stripe". Defaults to "varlen_llama3"</span></span>
<spanid="cb1-691"><ahref="#cb1-691"aria-hidden="true"tabindex="-1"></a><spanclass="co"># in the sample packing case, and "batch_ring" in the non-sample packing case.</span></span>
<spanid="cb1-694"><ahref="#cb1-694"aria-hidden="true"tabindex="-1"></a><spanclass="co"># Set to HF dataset for type: 'completion' for streaming instead of pre-tokenize</span></span>
<spanid="cb1-694"><ahref="#cb1-694"aria-hidden="true"tabindex="-1"></a><spanclass="co"># Path to torch distx for optim 'adamw_anyprecision'</span></span>
<spanid="cb1-697"><ahref="#cb1-697"aria-hidden="true"tabindex="-1"></a><spanclass="co"># Set to HF dataset for type: 'completion' for streaming instead of pre-tokenize</span></span>
<spanid="cb1-703"><ahref="#cb1-703"aria-hidden="true"tabindex="-1"></a><spanclass="co"># Allow overwriting the yml config from the CLI</span></span>
<spanid="cb1-704"><ahref="#cb1-704"aria-hidden="true"tabindex="-1"></a><spanclass="fu">strict</span><spanclass="kw">:</span></span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb1-706"><ahref="#cb1-706"aria-hidden="true"tabindex="-1"></a><spanclass="co"># Allow overwriting the yml config from the CLI</span></span>
<spanid="cb1-707"><ahref="#cb1-707"aria-hidden="true"tabindex="-1"></a><spanclass="fu">strict</span><spanclass="kw">:</span></span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
@@ -507,7 +507,10 @@ through a ring communication pattern.</p>
<divclass="sourceCode"id="cb1"><preclass="sourceCode yaml code-with-copy"><codeclass="sourceCode yaml"><spanid="cb1-1"><ahref="#cb1-1"aria-hidden="true"tabindex="-1"></a><spanclass="co"># Set to a divisor (> 1) of the number of GPUs available</span></span>
<spanid="cb1-2"><ahref="#cb1-2"aria-hidden="true"tabindex="-1"></a><spanclass="fu">sequence_parallel_degree</span><spanclass="kw">:</span><spanclass="at"></span><spanclass="dv">4</span><spanclass="co"> # Split sequences across 4 GPUs</span></span>
<spanid="cb1-3"><ahref="#cb1-3"aria-hidden="true"tabindex="-1"></a><spanclass="co"># Optional; strides across the key dimension. Larger values use more memory but should make training faster.</span></span>
<spanid="cb1-4"><ahref="#cb1-4"aria-hidden="true"tabindex="-1"></a><spanclass="fu">heads_k_stride</span><spanclass="kw">:</span><spanclass="at"></span><spanclass="dv">1</span></span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<spanid="cb1-5"><ahref="#cb1-5"aria-hidden="true"tabindex="-1"></a><spanclass="co"># Optional; one of "varlen_llama3", "batch_ring", "batch_zigzag", "batch_stripe". Defaults to</span></span>
<spanid="cb1-6"><ahref="#cb1-6"aria-hidden="true"tabindex="-1"></a><spanclass="co"># "varlen_llama3" when `sample_packing: true`, and "batch_ring" otherwise.</span></span>
<spanid="cb1-7"><ahref="#cb1-7"aria-hidden="true"tabindex="-1"></a><spanclass="fu">ring_attn_func</span><spanclass="kw">:</span></span></code><buttontitle="Copy to Clipboard"class="code-copy-button"><iclass="bi"></i></button></pre></div>
<p>The <code>sequence_parallel_degree</code> should be a divisor of the total number of GPUs. For example:</p>
<ul>
<li>With 8 GPUs, valid values would be 2, 4, or 8</li>