Built site for gh-pages

Quarto GHA Workflow Runner
2024-09-05 14:12:25 +00:00
parent 253e9163db
commit 097ec6570f
5 changed files with 34 additions and 34 deletions


@@ -363,7 +363,7 @@ Description
</tr>
</thead>
<tbody class="list">
<tr data-index="0" data-listing-file-modified-sort="1725544715074" data-listing-reading-time-sort="1" data-listing-word-count-sort="47" data-listing-title-sort="Pre-training" data-listing-filename-sort="pretraining.qmd">
<tr data-index="0" data-listing-file-modified-sort="1725545504021" data-listing-reading-time-sort="1" data-listing-word-count-sort="47" data-listing-title-sort="Pre-training" data-listing-filename-sort="pretraining.qmd">
<td>
<a href="../../docs/dataset-formats/pretraining.html" class="title listing-title">Pre-training</a>
</td>
@@ -371,7 +371,7 @@ Description
<span class="listing-description">Data format for a pre-training completion task.</span>
</td>
</tr>
<tr data-index="1" data-listing-file-modified-sort="1725544715074" data-listing-reading-time-sort="2" data-listing-word-count-sort="308" data-listing-title-sort="Instruction Tuning" data-listing-filename-sort="inst_tune.qmd">
<tr data-index="1" data-listing-file-modified-sort="1725545504021" data-listing-reading-time-sort="2" data-listing-word-count-sort="308" data-listing-title-sort="Instruction Tuning" data-listing-filename-sort="inst_tune.qmd">
<td>
<a href="../../docs/dataset-formats/inst_tune.html" class="title listing-title">Instruction Tuning</a>
</td>
@@ -379,7 +379,7 @@ Description
<span class="listing-description">Instruction tuning formats for supervised fine-tuning.</span>
</td>
</tr>
<tr data-index="2" data-listing-file-modified-sort="1725544715074" data-listing-reading-time-sort="2" data-listing-word-count-sort="254" data-listing-title-sort="Conversation" data-listing-filename-sort="conversation.qmd">
<tr data-index="2" data-listing-file-modified-sort="1725545504021" data-listing-reading-time-sort="2" data-listing-word-count-sort="254" data-listing-title-sort="Conversation" data-listing-filename-sort="conversation.qmd">
<td>
<a href="../../docs/dataset-formats/conversation.html" class="title listing-title">Conversation</a>
</td>
@@ -387,7 +387,7 @@ Description
<span class="listing-description">Conversation format for supervised fine-tuning.</span>
</td>
</tr>
<tr data-index="3" data-listing-file-modified-sort="1725544715074" data-listing-reading-time-sort="1" data-listing-word-count-sort="3" data-listing-title-sort="Template-Free" data-listing-filename-sort="template_free.qmd">
<tr data-index="3" data-listing-file-modified-sort="1725545504021" data-listing-reading-time-sort="1" data-listing-word-count-sort="3" data-listing-title-sort="Template-Free" data-listing-filename-sort="template_free.qmd">
<td>
<a href="../../docs/dataset-formats/template_free.html" class="title listing-title">Template-Free</a>
</td>
@@ -395,7 +395,7 @@ Description
<span class="listing-description">Construct prompts without a template.</span>
</td>
</tr>
<tr data-index="4" data-listing-file-modified-sort="1725544715074" data-listing-reading-time-sort="1" data-listing-word-count-sort="90" data-listing-title-sort="Custom Pre-Tokenized Dataset" data-listing-filename-sort="tokenized.qmd">
<tr data-index="4" data-listing-file-modified-sort="1725545504021" data-listing-reading-time-sort="1" data-listing-word-count-sort="92" data-listing-title-sort="Custom Pre-Tokenized Dataset" data-listing-filename-sort="tokenized.qmd">
<td>
<a href="../../docs/dataset-formats/tokenized.html" class="title listing-title">Custom Pre-Tokenized Dataset</a>
</td>


@@ -322,7 +322,7 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<li>Pass an empty <code>type:</code> in your axolotl config.</li>
<li>Columns in Dataset must be exactly <code>input_ids</code>, <code>attention_mask</code>, <code>labels</code></li>
<li>To indicate that a token should be ignored during training, set its corresponding label to <code>-100</code>.</li>
<li>Do not add BOS/EOS. Axolotl will add them for you based on the default tokenizer for the model you're using.</li>
<li>You must add BOS and EOS, and make sure that you are training on EOS by not setting its label to -100.</li>
<li>For pretraining, do not truncate/pad documents to the context window length.</li>
<li>For instruction training, documents must be truncated/padded as desired.</li>
</ul>
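As a rough illustration of the updated guidance in this hunk (add BOS and EOS yourself, and keep the EOS label trainable), a single pre-tokenized row could be built as in the sketch below. The helper name `build_example`, the token IDs, and the choice to mask the BOS and prompt tokens with `-100` are assumptions made for this example; they are not taken from the Axolotl source. In practice the BOS/EOS IDs would come from the tokenizer of the model being trained.

```python
IGNORE_INDEX = -100  # label value that tells the loss to skip a token


def build_example(prompt_ids, response_ids, bos_id, eos_id):
    """Build one row with exactly the columns input_ids, attention_mask, labels.

    Hypothetical helper: BOS and EOS are added explicitly, and the EOS label
    is kept (not set to -100) so the model is trained to emit it.
    """
    input_ids = [bos_id] + list(prompt_ids) + list(response_ids) + [eos_id]
    # No padding is applied here, so every position is attended to.
    attention_mask = [1] * len(input_ids)
    # Assumption: BOS and prompt tokens are masked; response and EOS are trained on.
    labels = [IGNORE_INDEX] * (1 + len(prompt_ids)) + list(response_ids) + [eos_id]
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels,
    }


# Made-up token IDs, for illustration only.
example = build_example(
    prompt_ids=[11, 12, 13],
    response_ids=[21, 22],
    bos_id=1,
    eos_id=2,
)
```

All three lists have the same length, and the last label equals the EOS ID rather than `-100`. Truncation or padding for instruction training would still need to be applied separately, as the list above notes.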