Built site for gh-pages
@@ -800,7 +800,8 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> tokenizer_path,</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> token_mappings,</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> output_dir,</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> revision<span class="op">=</span><span class="st">'main'</span>,</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a>)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>Modify tokenizer files to replace added_tokens strings, save to output directory,
and return the path to the modified tokenizer.</p>
<p>This only works with reserved tokens that were added to the tokenizer, not tokens
@@ -809,10 +810,10 @@ already part of the vocab.</p>
<h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4>
<table class="caption-top table">
<colgroup>
<col style="width: 16%">
<col style="width: 18%">
<col style="width: 51%">
<col style="width: 12%">
<col style="width: 15%">
<col style="width: 17%">
<col style="width: 54%">
<col style="width: 11%">
</colgroup>
<thead>
<tr class="header">
@@ -841,6 +842,12 @@ already part of the vocab.</p>
<td>Directory to save the modified tokenizer</td>
<td><em>required</em></td>
</tr>
<tr class="even">
<td>revision</td>
<td>str</td>
<td>Model revision/branch/tag/commit to load from (HF Hub)</td>
<td><code>'main'</code></td>
</tr>
</tbody>
</table>
</section>
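<p>As a rough illustration of the replacement step described above, the sketch below rewrites the <code>content</code> of entries in the <code>added_tokens</code> list of an already-loaded <code>tokenizer.json</code> dictionary. It is not the documented function: the helper name <code>replace_added_token_strings</code> is hypothetical, and the assumption that <code>token_mappings</code> maps token ids to new strings is not confirmed by this page. Ids that are not among the added tokens are rejected, mirroring the note that only reserved added tokens can be replaced.</p>

```python
import copy


def replace_added_token_strings(tokenizer_json, token_mappings):
    """Hypothetical helper: return a copy of a tokenizer.json dict with
    the strings of selected added tokens replaced.

    Assumption: token_mappings maps token id -> new token string.
    """
    updated = copy.deepcopy(tokenizer_json)
    # Normalise keys to int, since ids loaded from YAML/JSON may be strings.
    remaining = {int(token_id): text for token_id, text in token_mappings.items()}

    for entry in updated.get("added_tokens", []):
        if entry["id"] in remaining:
            entry["content"] = remaining.pop(entry["id"])

    # Anything left over was not an added token, so it cannot be replaced here.
    if remaining:
        raise ValueError(
            f"token ids not found among added_tokens: {sorted(remaining)}"
        )
    return updated


# Usage with a tiny made-up tokenizer.json fragment.
example = {
    "added_tokens": [
        {"id": 10, "content": "<|reserved_0|>", "special": True},
        {"id": 11, "content": "<|reserved_1|>", "special": True},
    ]
}
patched = replace_added_token_strings(example, {10: "<|tool_call|>"})
print(patched["added_tokens"][0]["content"])
```

<p>Working on a deep copy leaves the input untouched, which matches the documented behaviour of saving the modified tokenizer to a separate output directory.</p>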
@@ -754,7 +754,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<ul class="collapse">
<li><a href="#axolotl.utils.trainer.add_pose_position_ids" id="toc-axolotl.utils.trainer.add_pose_position_ids" class="nav-link" data-scroll-target="#axolotl.utils.trainer.add_pose_position_ids">add_pose_position_ids</a></li>
<li><a href="#axolotl.utils.trainer.add_position_ids" id="toc-axolotl.utils.trainer.add_position_ids" class="nav-link" data-scroll-target="#axolotl.utils.trainer.add_position_ids">add_position_ids</a></li>
<li><a href="#axolotl.utils.trainer.drop_long_seq" id="toc-axolotl.utils.trainer.drop_long_seq" class="nav-link" data-scroll-target="#axolotl.utils.trainer.drop_long_seq">drop_long_seq</a></li>
<li><a href="#axolotl.utils.trainer.filter_sequences_by_length" id="toc-axolotl.utils.trainer.filter_sequences_by_length" class="nav-link" data-scroll-target="#axolotl.utils.trainer.filter_sequences_by_length">filter_sequences_by_length</a></li>
<li><a href="#axolotl.utils.trainer.setup_trainer" id="toc-axolotl.utils.trainer.setup_trainer" class="nav-link" data-scroll-target="#axolotl.utils.trainer.setup_trainer">setup_trainer</a></li>
</ul></li>
</ul></li>
@@ -790,8 +790,8 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<td>Handle both single-example and batched data.</td>
</tr>
<tr class="odd">
<td><a href="#axolotl.utils.trainer.drop_long_seq">drop_long_seq</a></td>
<td>Drop samples whose sequence length is either too long (&gt; sequence_len)</td>
<td><a href="#axolotl.utils.trainer.filter_sequences_by_length">filter_sequences_by_length</a></td>
<td>Filter sequences outside valid length range [min_sequence_len, sequence_len].</td>
</tr>
<tr class="even">
<td><a href="#axolotl.utils.trainer.setup_trainer">setup_trainer</a></td>
@@ -822,16 +822,16 @@ remaining in each sample.</p>
- single example: sample[‘input_ids’] is a list[int]
- batched data: sample[‘input_ids’] is a list[list[int]]</p>
</section>
<section id="axolotl.utils.trainer.drop_long_seq" class="level3">
<h3 class="anchored" data-anchor-id="axolotl.utils.trainer.drop_long_seq">drop_long_seq</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>utils.trainer.drop_long_seq(</span>
<section id="axolotl.utils.trainer.filter_sequences_by_length" class="level3">
<h3 class="anchored" data-anchor-id="axolotl.utils.trainer.filter_sequences_by_length">filter_sequences_by_length</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>utils.trainer.filter_sequences_by_length(</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> sample,</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> sequence_len<span class="op">=</span><span class="dv">2048</span>,</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> min_sequence_len<span class="op">=</span><span class="dv">2</span>,</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> raise_on_drop<span class="op">=</span><span class="va">False</span>,</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a>)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>Drop samples whose sequence length is either too long (&gt; sequence_len)
or too short (&lt; min_sequence_len).</p>
<p>Filter sequences outside valid length range [min_sequence_len, sequence_len].</p>
<p>Drops samples that are either too short (&lt; min_sequence_len) or too long (&gt; sequence_len).</p>
<p>Works for both single-example (list[int]) or batched (list[list[int]]).</p>
<p>If raise_on_drop is set, the code raises a ValueError if a sample is
encountered that is too long and would have been dropped.</p>
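<p>The behaviour described above can be sketched as follows. This is a minimal illustration written from the description on this page, not the library's source: the name <code>filter_by_length_sketch</code> is hypothetical, and the return convention (a single bool for one example, a list of bools for a batch) is an assumption modelled on how filter predicates are commonly written.</p>

```python
def filter_by_length_sketch(
    sample, sequence_len=2048, min_sequence_len=2, raise_on_drop=False
):
    """Hypothetical sketch: decide which samples fall inside
    [min_sequence_len, sequence_len]."""
    input_ids = sample["input_ids"]

    def keep(ids):
        length = len(ids)
        # Per the description, only over-long samples trigger the error.
        if raise_on_drop and length > sequence_len:
            raise ValueError(
                f"sequence of length {length} exceeds sequence_len={sequence_len}"
            )
        return min_sequence_len <= length <= sequence_len

    # Batched data: input_ids is a list[list[int]].
    if input_ids and isinstance(input_ids[0], list):
        return [keep(ids) for ids in input_ids]

    # Single example: input_ids is a list[int].
    return keep(input_ids)


single = filter_by_length_sketch({"input_ids": [1, 2, 3]}, sequence_len=4)
batch = filter_by_length_sketch(
    {"input_ids": [[1, 2], [1, 2, 3, 4, 5], [1]]}, sequence_len=4
)
print(single, batch)
```

<p>Checking the type of the first element is one simple way to tell the two input shapes apart; an empty <code>input_ids</code> is treated as a single, too-short example.</p>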