Built site for gh-pages

This commit is contained in:
Quarto GHA Workflow Runner
2026-04-10 20:53:46 +00:00
parent af1d4c8e78
commit d76f8c505c
7 changed files with 594 additions and 370 deletions

View File

@@ -23,6 +23,41 @@ ul.task-list li input[type="checkbox"] {
margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */
vertical-align: middle;
}
/* CSS for syntax highlighting */
html { -webkit-text-size-adjust: 100%; }
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
}
pre.numberSource { margin-left: 3em; padding-left: 4px; }
div.sourceCode
{ }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
</style>
@@ -760,6 +795,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<li><a href="#hyperparameter-ranges" id="toc-hyperparameter-ranges" class="nav-link" data-scroll-target="#hyperparameter-ranges">Hyperparameter Ranges</a></li>
<li><a href="#healthy-training-indicators" id="toc-healthy-training-indicators" class="nav-link" data-scroll-target="#healthy-training-indicators">Healthy Training Indicators</a></li>
<li><a href="#known-issues" id="toc-known-issues" class="nav-link" data-scroll-target="#known-issues">Known Issues</a></li>
<li><a href="#profiling" id="toc-profiling" class="nav-link" data-scroll-target="#profiling">Profiling</a></li>
<li><a href="#file-map" id="toc-file-map" class="nav-link" data-scroll-target="#file-map">File Map</a></li>
</ul></li>
</ul>
@@ -1009,6 +1045,22 @@ Multi-GPU: FSDP or DeepSpeed shards model across GPUs automatically.</code></pre
</tr>
</tbody>
</table>
</section>
<section id="profiling" class="level2">
<h2 class="anchored" data-anchor-id="profiling">Profiling</h2>
<p>To profile training and identify optimization opportunities:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Profile steps 3-7 (after warmup/autotuning settles)</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="fu">profiler_steps_start</span><span class="kw">:</span><span class="at"> </span><span class="dv">3</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="fu">profiler_steps</span><span class="kw">:</span><span class="at"> </span><span class="dv">5</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>This produces <code>profiler_trace.json</code> (Chrome trace) and <code>snapshot.pickle</code> (memory snapshot) in <code>output_dir</code>.
View the Chrome trace at <code>chrome://tracing</code>.</p>
<p>To programmatically inspect the trace:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ex">python</span> scripts/analyze_profile.py output_dir/</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>The trace shows per-kernel CUDA times, memory allocations, and operator-level breakdown. Look for:
- <strong>Large matmul kernels</strong>: candidates for fusion or quantization
- <strong>Memory copies (H2D/D2H)</strong>: unnecessary data movement
- <strong>Small frequent kernels</strong>: candidates for kernel fusion
- <strong>Gaps between kernels</strong>: pipeline bubbles from CPU overhead</p>
<p>Full troubleshooting: <a href="../../docs/training_stability.html">training_stability.qmd</a>, <a href="../../docs/debugging.html">debugging.qmd</a></p>
</section>
<section id="file-map" class="level2">