Built site for gh-pages
@@ -539,7 +539,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
</tr>
<tr class="odd">
<td><a href="#axolotl.common.datasets.sample_dataset">sample_dataset</a></td>
<td>Randomly sample <code>num_samples</code> samples from <code>dataset</code>.</td>
<td>Randomly sample <code>num_samples</code> samples with replacement from <code>dataset</code>.</td>
</tr>
</tbody>
</table>
@@ -547,15 +547,15 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<h3 class="anchored" data-anchor-id="axolotl.common.datasets.load_datasets">load_datasets</h3>
<div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>common.datasets.load_datasets(cfg, cli_args<span class="op">=</span><span class="va">None</span>, debug<span class="op">=</span><span class="va">False</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>Loads one or more training or evaluation datasets, calling
<code>axolotl.utils.data.prepare_dataset</code>. Optionally, logs out debug information.</p>
<code>axolotl.utils.data.prepare_datasets</code>. Optionally, logs out debug information.</p>
<section id="parameters" class="level4 doc-section doc-section-parameters">
<h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4>
<table class="caption-top table">
<colgroup>
<col style="width: 8%">
<col style="width: 37%">
<col style="width: 44%">
<col style="width: 10%">
<col style="width: 4%">
<col style="width: 22%">
<col style="width: 67%">
<col style="width: 5%">
</colgroup>
<thead>
<tr class="header">
@@ -581,7 +581,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<tr class="odd">
<td>debug</td>
<td>bool</td>
<td>Whether to print out tokenization of sample</td>
<td>Whether to print out tokenization of sample. This is duplicated in <code>cfg</code> and <code>cli_args</code>, but is kept due to use in our Colab notebooks.</td>
<td><code>False</code></td>
</tr>
</tbody>
@@ -591,9 +591,9 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4>
<table class="caption-top table">
<colgroup>
<col style="width: 7%">
<col style="width: 17%">
<col style="width: 74%">
<col style="width: 6%">
<col style="width: 14%">
<col style="width: 78%">
</colgroup>
<thead>
<tr class="header">
@@ -606,12 +606,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<tr class="odd">
<td></td>
<td>TrainDatasetMeta</td>
<td>Dataclass with fields for training and evaluation datasets and the computed</td>
</tr>
<tr class="even">
<td></td>
<td>TrainDatasetMeta</td>
<td><code>total_num_steps</code>.</td>
<td>Dataclass with fields for training and evaluation datasets and the computed <code>total_num_steps</code>.</td>
</tr>
</tbody>
</table>
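The Returns table above describes `TrainDatasetMeta` only in prose. The sketch below shows one possible shape for such a container. The field names `train_dataset`, `eval_dataset` and `total_num_steps` are assumptions inferred from that description. Plain Python sequences stand in for the `Dataset` type so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class TrainDatasetMeta:
    """Hypothetical sketch: bundles the datasets and step count returned by load_datasets."""

    train_dataset: Sequence[Any]                  # training examples
    eval_dataset: Optional[Sequence[Any]] = None  # None when no evaluation split is configured
    total_num_steps: Optional[int] = None         # computed number of training steps


meta = TrainDatasetMeta(train_dataset=[{"input_ids": [1, 2, 3]}], total_num_steps=10)
print(meta.eval_dataset, meta.total_num_steps)  # None 10
```

With a dataclass, callers read results by name (`meta.total_num_steps`) rather than by tuple position. That makes it harder to mix up the training and evaluation splits.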
@@ -621,15 +616,15 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<h3 class="anchored" data-anchor-id="axolotl.common.datasets.load_preference_datasets">load_preference_datasets</h3>
<div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>common.datasets.load_preference_datasets(cfg, cli_args)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>Loads one or more training or evaluation datasets for RL training using paired
preference data, calling <code>axolotl.utils.data.rl.load_prepare_preference_datasets</code>.
preference data, calling <code>axolotl.utils.data.rl.prepare_preference_datasets</code>.
Optionally, logs out debug information.</p>
<section id="parameters-1" class="level4 doc-section doc-section-parameters">
<h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4>
<table class="caption-top table">
<colgroup>
<col style="width: 8%">
<col style="width: 36%">
<col style="width: 44%">
<col style="width: 33%">
<col style="width: 47%">
<col style="width: 10%">
</colgroup>
<thead>
@@ -649,7 +644,7 @@ Optionally, logs out debug information.</p>
</tr>
<tr class="even">
<td>cli_args</td>
<td>Union[PreprocessCliArgs, TrainerCliArgs]</td>
<td>PreprocessCliArgs | TrainerCliArgs</td>
<td>Command-specific CLI arguments.</td>
<td><em>required</em></td>
</tr>
@@ -689,63 +684,12 @@ Optionally, logs out debug information.</p>
<section id="axolotl.common.datasets.sample_dataset" class="level3">
<h3 class="anchored" data-anchor-id="axolotl.common.datasets.sample_dataset">sample_dataset</h3>
<div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>common.datasets.sample_dataset(dataset, num_samples)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>Randomly sample <code>num_samples</code> samples from <code>dataset</code>.</p>
<section id="parameters-2" class="level4 doc-section doc-section-parameters">
<h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-2">Parameters</h4>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Name</th>
<th>Type</th>
<th>Description</th>
<th>Default</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>dataset</td>
<td>Dataset</td>
<td>Dataset.</td>
<td><em>required</em></td>
</tr>
<tr class="even">
<td>num_samples</td>
<td>int</td>
<td>Number of samples to return.</td>
<td><em>required</em></td>
</tr>
</tbody>
</table>
</section>
<section id="returns-2" class="level4 doc-section doc-section-returns">
<h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-2">Returns</h4>
<table class="caption-top table">
<colgroup>
<col style="width: 10%">
<col style="width: 11%">
<col style="width: 77%">
</colgroup>
<thead>
<tr class="header">
<th>Name</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td></td>
<td>Dataset</td>
<td>Random sample (with replacement) of examples in <code>dataset</code>.</td>
</tr>
</tbody>
</table>
<p>Randomly sample <code>num_samples</code> samples with replacement from <code>dataset</code>.</p>
</section>
</section>
</section>
</section>
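The updated `sample_dataset` description says it draws `num_samples` examples from `dataset` with replacement. The sketch below shows only that sampling rule and is not the axolotl implementation. It assumes a plain indexable sequence, and the helper name `sample_with_replacement` and its optional `seed` argument are hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_with_replacement(
    dataset: Sequence[T], num_samples: int, seed: Optional[int] = None
) -> List[T]:
    """Hypothetical helper: draw num_samples items uniformly at random, with replacement."""
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    if len(dataset) == 0 and num_samples > 0:
        raise ValueError("cannot sample from an empty dataset")
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    # Each index is drawn independently, so the same index may repeat.
    indices = [rng.randrange(len(dataset)) for _ in range(num_samples)]
    return [dataset[i] for i in indices]
```

Each draw is independent, so one example can appear several times, and `num_samples` may exceed the dataset size. With a Hugging Face `datasets.Dataset`, the same index list could be passed to `Dataset.select`.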
</main> <!-- /main -->
<script id="quarto-html-after-body" type="application/javascript">
@@ -500,7 +500,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<tbody>
<tr class="odd">
<td><a href="#axolotl.datasets.ConstantLengthDataset">ConstantLengthDataset</a></td>
<td>Iterable dataset that returns constant length chunks of tokens from stream of text files.</td>
<td>Iterable dataset that returns constant length chunks of tokens from stream of</td>
</tr>
<tr class="even">
<td><a href="#axolotl.datasets.TokenizedPromptDataset">TokenizedPromptDataset</a></td>
@@ -511,11 +511,47 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<section id="axolotl.datasets.ConstantLengthDataset" class="level3">
<h3 class="anchored" data-anchor-id="axolotl.datasets.ConstantLengthDataset">ConstantLengthDataset</h3>
<div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>datasets.ConstantLengthDataset(tokenizer, datasets, seq_length<span class="op">=</span><span class="dv">2048</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>Iterable dataset that returns constant length chunks of tokens from stream of text files.
Args:
tokenizer (Tokenizer): The processor used for processing the data.
dataset (dataset.Dataset): Dataset with text files.
seq_length (int): Length of token sequences to return.</p>
<p>Iterable dataset that returns constant length chunks of tokens from stream of
text files.</p>
<section id="parameters" class="level4 doc-section doc-section-parameters">
<h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4>
<table class="caption-top table">
<colgroup>
<col style="width: 15%">
<col style="width: 10%">
<col style="width: 58%">
<col style="width: 15%">
</colgroup>
<thead>
<tr class="header">
<th>Name</th>
<th>Type</th>
<th>Description</th>
<th>Default</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>tokenizer</td>
<td></td>
<td>The processor used for processing the data.</td>
<td><em>required</em></td>
</tr>
<tr class="even">
<td>dataset</td>
<td></td>
<td>Dataset with text files.</td>
<td><em>required</em></td>
</tr>
<tr class="odd">
<td>seq_length</td>
<td></td>
<td>Length of token sequences to return.</td>
<td><code>2048</code></td>
</tr>
</tbody>
</table>
</section>
</section>
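`ConstantLengthDataset` is described only as returning constant-length chunks of tokens from a stream of text files. A common way to do this is to concatenate tokenized documents, separated by an end-of-sequence token, and cut the running buffer into windows of exactly `seq_length` tokens. The sketch below follows that approach; it is not the axolotl implementation. The `eos_id` separator value, the helper name `constant_length_chunks`, and the choice to drop the final partial window are all assumptions.

```python
from typing import Iterable, Iterator, List


def constant_length_chunks(
    token_streams: Iterable[List[int]],
    seq_length: int = 2048,
    eos_id: int = 0,
) -> Iterator[List[int]]:
    """Hypothetical helper: yield lists of exactly seq_length token ids from tokenized documents."""
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")
    buffer: List[int] = []
    for tokens in token_streams:
        buffer.extend(tokens)
        buffer.append(eos_id)  # assumed separator between consecutive documents
        # Emit every full window currently available in the buffer.
        while len(buffer) >= seq_length:
            yield buffer[:seq_length]
            buffer = buffer[seq_length:]
    # A remainder shorter than seq_length is dropped in this sketch.


docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(list(constant_length_chunks(docs, seq_length=4)))
# [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

Every yielded chunk has the same length, so batches need no padding. The trade-off is that a chunk can begin in the middle of one document and end inside the next.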
<section id="axolotl.datasets.TokenizedPromptDataset" class="level3">
<h3 class="anchored" data-anchor-id="axolotl.datasets.TokenizedPromptDataset">TokenizedPromptDataset</h3>
@@ -526,17 +562,57 @@ seq_length (int): Length of token sequences to return.</p>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> keep_in_memory<span class="op">=</span><span class="va">False</span>,</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a> <span class="op">**</span>kwargs,</span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>Dataset that returns tokenized prompts from a stream of text files.
Args:
prompt_tokenizer (PromptTokenizingStrategy): The prompt tokenizing method for processing the data.
dataset (dataset.Dataset): Dataset with text files.
process_count (int): Number of processes to use for tokenizing.
keep_in_memory (bool): Whether to keep the tokenized dataset in memory.</p>
<p>Dataset that returns tokenized prompts from a stream of text files.</p>
<section id="parameters-1" class="level4 doc-section doc-section-parameters">
<h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4>
<table class="caption-top table">
<colgroup>
<col style="width: 16%">
<col style="width: 23%">
<col style="width: 49%">
<col style="width: 10%">
</colgroup>
<thead>
<tr class="header">
<th>Name</th>
<th>Type</th>
<th>Description</th>
<th>Default</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>prompt_tokenizer</td>
<td>PromptTokenizingStrategy</td>
<td>The prompt tokenizing method for processing the data.</td>
<td><em>required</em></td>
</tr>
<tr class="even">
<td>dataset</td>
<td>Dataset</td>
<td>Dataset with text files.</td>
<td><em>required</em></td>
</tr>
<tr class="odd">
<td>process_count</td>
<td>int | None</td>
<td>Number of processes to use for tokenizing.</td>
<td><code>None</code></td>
</tr>
<tr class="even">
<td>keep_in_memory</td>
<td>bool | None</td>
<td>Whether to keep the tokenized dataset in memory.</td>
<td><code>False</code></td>
</tr>
</tbody>
</table>
</section>
</section>
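`TokenizedPromptDataset` applies a prompt-tokenizing strategy to every row of a dataset, optionally across several workers (`process_count`). The plain-Python sketch below imitates that flow. `WhitespacePromptStrategy` is a hypothetical stand-in for a real `PromptTokenizingStrategy`, and the method name `tokenize_prompt` is an assumption. A thread pool stands in for worker processes, and `keep_in_memory` has no equivalent in this list-based version.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


class WhitespacePromptStrategy:
    """Hypothetical strategy: maps whitespace-separated words to ids via a vocab dict."""

    def __init__(self, vocab: Dict[str, int], unk_id: int = 0):
        self.vocab = vocab
        self.unk_id = unk_id

    def tokenize_prompt(self, row: Dict[str, str]) -> Dict[str, List[int]]:
        ids = [self.vocab.get(word, self.unk_id) for word in row["text"].split()]
        return {"input_ids": ids, "attention_mask": [1] * len(ids)}


def tokenize_rows(
    strategy: WhitespacePromptStrategy,
    rows: List[Dict[str, str]],
    process_count: Optional[int] = None,
) -> List[Dict[str, List[int]]]:
    """Apply strategy.tokenize_prompt to every row, in parallel when process_count > 1."""
    if not process_count or process_count <= 1:
        return [strategy.tokenize_prompt(row) for row in rows]
    # Threads stand in for worker processes; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=process_count) as pool:
        return list(pool.map(strategy.tokenize_prompt, rows))
```

Keeping the tokenization logic inside the strategy object means the dataset wrapper can swap prompt formats without changing how rows are distributed to workers.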
</section>
</section>
</main> <!-- /main -->
<script id="quarto-html-after-body" type="application/javascript">
@@ -981,7 +981,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
</tr>
<tr class="odd">
<td><a href="../../docs/api/utils.data.sft.html#axolotl.utils.data.sft">utils.data.sft</a></td>
<td>data handling specific to SFT</td>
<td>Data handling specific to SFT.</td>
</tr>
<tr class="even">
<td><a href="../../docs/api/utils.quantization.html#axolotl.utils.quantization">utils.quantization</a></td>
@@ -534,7 +534,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
</tr>
<tr class="odd">
<td><a href="#axolotl.train.setup_model_and_tokenizer">setup_model_and_tokenizer</a></td>
<td>Load the tokenizer, processor (for multimodal models), and model based on configuration.</td>
<td>Load the tokenizer, processor (for multimodal models), and model based on</td>
</tr>
<tr class="even">
<td><a href="#axolotl.train.setup_model_and_trainer">setup_model_and_trainer</a></td>
@@ -861,7 +861,8 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<section id="axolotl.train.setup_model_and_tokenizer" class="level3">
<h3 class="anchored" data-anchor-id="axolotl.train.setup_model_and_tokenizer">setup_model_and_tokenizer</h3>
<div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>train.setup_model_and_tokenizer(cfg)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>Load the tokenizer, processor (for multimodal models), and model based on configuration.</p>
<p>Load the tokenizer, processor (for multimodal models), and model based on
configuration.</p>
<section id="parameters-6" class="level4 doc-section doc-section-parameters">
<h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-6">Parameters</h4>
<table class="caption-top table">
@@ -20,6 +20,41 @@ ul.task-list li input[type="checkbox"] {
margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */
vertical-align: middle;
}
/* CSS for syntax highlighting */
html { -webkit-text-size-adjust: 100%; }
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
}
pre.numberSource { margin-left: 3em; padding-left: 4px; }
div.sourceCode
{ }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
</style>
@@ -432,7 +467,13 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<h2 id="toc-title">On this page</h2>
<ul>
<li><a href="#axolotl.utils.data.sft" id="toc-axolotl.utils.data.sft" class="nav-link active" data-scroll-target="#axolotl.utils.data.sft">utils.data.sft</a></li>
<li><a href="#axolotl.utils.data.sft" id="toc-axolotl.utils.data.sft" class="nav-link active" data-scroll-target="#axolotl.utils.data.sft">utils.data.sft</a>
<ul class="collapse">
<li><a href="#functions" id="toc-functions" class="nav-link" data-scroll-target="#functions">Functions</a>
<ul class="collapse">
<li><a href="#axolotl.utils.data.sft.prepare_datasets" id="toc-axolotl.utils.data.sft.prepare_datasets" class="nav-link" data-scroll-target="#axolotl.utils.data.sft.prepare_datasets">prepare_datasets</a></li>
</ul></li>
</ul></li>
</ul>
</nav>
</div>
@@ -445,9 +486,105 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<section id="axolotl.utils.data.sft" class="level1">
<h1>utils.data.sft</h1>
<p><code>utils.data.sft</code></p>
<p>data handling specific to SFT</p>
<p>Data handling specific to SFT.</p>
<section id="functions" class="level2">
<h2 class="anchored" data-anchor-id="functions">Functions</h2>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><a href="#axolotl.utils.data.sft.prepare_datasets">prepare_datasets</a></td>
<td>Prepare training and evaluation datasets based on configuration.</td>
</tr>
</tbody>
</table>
<section id="axolotl.utils.data.sft.prepare_datasets" class="level3">
<h3 class="anchored" data-anchor-id="axolotl.utils.data.sft.prepare_datasets">prepare_datasets</h3>
<div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>utils.data.sft.prepare_datasets(</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> cfg,</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> tokenizer,</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> processor<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> preprocess_iterable<span class="op">=</span><span class="va">False</span>,</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>Prepare training and evaluation datasets based on configuration.</p>
<section id="parameters" class="level4 doc-section doc-section-parameters">
<h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4>
<table class="caption-top table">
<colgroup>
<col style="width: 19%">
<col style="width: 21%">
<col style="width: 48%">
<col style="width: 10%">
</colgroup>
<thead>
<tr class="header">
<th>Name</th>
<th>Type</th>
<th>Description</th>
<th>Default</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>cfg</td>
<td>DictDefault</td>
<td>Dictionary mapping <code>axolotl</code> config keys to values.</td>
<td><em>required</em></td>
</tr>
<tr class="even">
<td>tokenizer</td>
<td>PreTrainedTokenizer</td>
<td>Tokenizer to use for processing text.</td>
<td><em>required</em></td>
</tr>
<tr class="odd">
<td>processor</td>
<td>ProcessorMixin | None</td>
<td>Optional processor for multimodal datasets.</td>
<td><code>None</code></td>
</tr>
<tr class="even">
<td>preprocess_iterable</td>
<td>bool</td>
<td>Whether to use iterable preprocessing.</td>
<td><code>False</code></td>
</tr>
</tbody>
</table>
</section>
<section id="returns" class="level4 doc-section doc-section-returns">
<h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4>
<table class="caption-top table">
<colgroup>
<col style="width: 5%">
<col style="width: 53%">
<col style="width: 41%">
</colgroup>
<thead>
<tr class="header">
<th>Name</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td></td>
<td>tuple[IterableDataset | Dataset, Dataset | None, int, list[Prompter | None]]</td>
<td>Tuple of (train_dataset, eval_dataset, total_steps, prompters).</td>
</tr>
</tbody>
</table>
</section>
</section>
</section>
</section>
</main> <!-- /main -->