Built site for gh-pages

Quarto GHA Workflow Runner
2025-10-14 19:59:54 +00:00
parent 8aa2d9ddea
commit 2327fe0f9b
7 changed files with 950 additions and 945 deletions


@@ -588,7 +588,7 @@ All of these tips are incorporated into the <a href="#configuration">example con
<li><p><strong>Eliminate concurrency</strong>: Restrict the number of processes to 1 for both training and data preprocessing:</p>
<ul>
<li>Set <code>CUDA_VISIBLE_DEVICES</code> to a single GPU, ex: <code>export CUDA_VISIBLE_DEVICES=0</code>.</li>
-<li>Set <code>dataset_processes: 1</code> in your axolotl config or run the training command with <code>--dataset_processes=1</code>.</li>
+<li>Set <code>dataset_num_proc: 1</code> in your axolotl config or run the training command with <code>--dataset_num_proc=1</code>.</li>
</ul></li>
<li><p><strong>Use a small dataset</strong>: Construct or use a small dataset from HF Hub. When using a small dataset, you will often have to set <code>sample_packing: False</code> and <code>eval_sample_packing: False</code> to avoid errors. If you are in a pinch and don't have time to construct a small dataset but want to use one from the HF Hub, you can shard the data (this will still tokenize the entire dataset, but will only use a fraction of the data for training). For example, to shard the dataset into 20 pieces, add the following to your axolotl config:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="fu">datasets</span><span class="kw">:</span></span>
@@ -653,7 +653,7 @@ If you prefer to watch a video, rather than read, you can skip to the <a href="#
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a> <span class="st">"-m"</span><span class="ot">,</span> <span class="st">"axolotl.cli.train"</span><span class="ot">,</span> <span class="st">"dev_chat_template.yml"</span><span class="ot">,</span></span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a> <span class="er">//</span> <span class="er">The</span> <span class="er">flags</span> <span class="er">below</span> <span class="er">simplify</span> <span class="er">debugging</span> <span class="er">by</span> <span class="er">overriding</span> <span class="er">the</span> <span class="er">axolotl</span> <span class="er">config</span></span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a> <span class="er">//</span> <span class="er">with</span> <span class="er">the</span> <span class="er">debugging</span> <span class="er">tips</span> <span class="er">above.</span> <span class="er">Modify</span> <span class="er">as</span> <span class="er">needed.</span></span>
-<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a> <span class="st">"--dataset_processes=1"</span><span class="ot">,</span> <span class="er">//</span> <span class="er">limits</span> <span class="er">data</span> <span class="er">preprocessing</span> <span class="er">to</span> <span class="er">one</span> <span class="er">process</span></span>
+<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a> <span class="st">"--dataset_num_proc=1"</span><span class="ot">,</span> <span class="er">//</span> <span class="er">limits</span> <span class="er">data</span> <span class="er">preprocessing</span> <span class="er">to</span> <span class="er">one</span> <span class="er">process</span></span>
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a> <span class="st">"--max_steps=1"</span><span class="ot">,</span> <span class="er">//</span> <span class="er">limits</span> <span class="er">training</span> <span class="er">to</span> <span class="er">just</span> <span class="er">one</span> <span class="er">step</span></span>
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a> <span class="st">"--batch_size=1"</span><span class="ot">,</span> <span class="er">//</span> <span class="er">minimizes</span> <span class="er">batch</span> <span class="er">size</span></span>
<span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a> <span class="st">"--micro_batch_size=1"</span><span class="ot">,</span> <span class="er">//</span> <span class="er">minimizes</span> <span class="er">batch</span> <span class="er">size</span></span>