Built site for gh-pages

Quarto GHA Workflow Runner
2026-03-22 13:17:42 +00:00
parent 61e0653994
commit 3c421e0170
247 changed files with 8025 additions and 10900 deletions


@@ -2,13 +2,16 @@
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head>
<meta charset="utf-8">
<meta name="generator" content="quarto-1.8.27">
<meta name="generator" content="quarto-1.9.36">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<title>Gradient Checkpointing and Activation Offloading – Axolotl</title>
<title>Gradient Checkpointing, Activation Offloading, and Layer Offloading – Axolotl</title>
<style>
/* Default styles provided by pandoc.
** See https://pandoc.org/MANUAL.html#variables-for-html for config info.
*/
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
div.columns{display: flex; gap: min(4vw, 1.5em);}
@@ -67,15 +70,14 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<link href="../favicon.jpg" rel="icon" type="image/jpeg">
<script src="../site_libs/quarto-html/quarto.js" type="module"></script>
<script src="../site_libs/quarto-html/tabsets/tabsets.js" type="module"></script>
<script src="../site_libs/quarto-html/axe/axe-check.js" type="module"></script>
<script src="../site_libs/quarto-html/popper.min.js"></script>
<script src="../site_libs/quarto-html/tippy.umd.min.js"></script>
<script src="../site_libs/quarto-html/anchor.min.js"></script>
<link href="../site_libs/quarto-html/tippy.css" rel="stylesheet">
<link href="../site_libs/quarto-html/quarto-syntax-highlighting-dark-4d9afe2b8d18ee9fa5d0d57b5ed4214d.css" rel="stylesheet" id="quarto-text-highlighting-styles">
<link href="../site_libs/quarto-html/quarto-syntax-highlighting-dark-f418161beb48e0141c760e455f12af2c.css" rel="stylesheet" id="quarto-text-highlighting-styles">
<script src="../site_libs/bootstrap/bootstrap.min.js"></script>
<link href="../site_libs/bootstrap/bootstrap-icons.css" rel="stylesheet">
<link href="../site_libs/bootstrap/bootstrap-35ef2ff98a2131eb4c49a687ae04ea22.min.css" rel="stylesheet" append-hash="true" id="quarto-bootstrap" data-mode="dark">
<link href="../site_libs/bootstrap/bootstrap-f15b14cef494beb09422a8174b542cad.min.css" rel="stylesheet" append-hash="true" id="quarto-bootstrap" data-mode="dark">
<script id="quarto-search-options" type="application/json">{
"location": "navbar",
"copy-button": false,
@@ -142,7 +144,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<button type="button" class="quarto-btn-toggle btn" data-bs-toggle="collapse" role="button" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
<i class="bi bi-layout-text-sidebar-reverse"></i>
</button>
<nav class="quarto-page-breadcrumbs" aria-label="breadcrumb"><ol class="breadcrumb"><li class="breadcrumb-item"><a href="../docs/fsdp_qlora.html">Advanced Features</a></li><li class="breadcrumb-item"><a href="../docs/gradient_checkpointing.html">Gradient Checkpointing and Activation Offloading</a></li></ol></nav>
<nav class="quarto-page-breadcrumbs" aria-label="breadcrumb"><ol class="breadcrumb"><li class="breadcrumb-item"><a href="../docs/fsdp_qlora.html">Advanced Features</a></li><li class="breadcrumb-item"><a href="../docs/gradient_checkpointing.html">Gradient Checkpointing, Activation Offloading, and Layer Offloading</a></li></ol></nav>
<a class="flex-grow-1" role="navigation" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
</a>
</div>
@@ -698,7 +700,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../docs/gradient_checkpointing.html" class="sidebar-item-text sidebar-link active">
<span class="menu-text">Gradient Checkpointing and Activation Offloading</span></a>
<span class="menu-text">Gradient Checkpointing, Activation Offloading, and Layer Offloading</span></a>
</div>
</li>
<li class="sidebar-item">
@@ -756,15 +758,16 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<ul>
<li><a href="#enabling-gradient-checkpointing" id="toc-enabling-gradient-checkpointing" class="nav-link active" data-scroll-target="#enabling-gradient-checkpointing">Enabling Gradient Checkpointing</a></li>
<li><a href="#enabling-activation-offloading" id="toc-enabling-activation-offloading" class="nav-link" data-scroll-target="#enabling-activation-offloading">Enabling Activation Offloading</a></li>
<li><a href="#enabling-layer-offloading" id="toc-enabling-layer-offloading" class="nav-link" data-scroll-target="#enabling-layer-offloading">Enabling Layer Offloading</a></li>
</ul>
</nav>
</div>
<!-- main -->
<main class="content" id="quarto-document-content">
<header id="title-block-header" class="quarto-title-block default"><nav class="quarto-page-breadcrumbs quarto-title-breadcrumbs d-none d-lg-block" aria-label="breadcrumb"><ol class="breadcrumb"><li class="breadcrumb-item"><a href="../docs/fsdp_qlora.html">Advanced Features</a></li><li class="breadcrumb-item"><a href="../docs/gradient_checkpointing.html">Gradient Checkpointing and Activation Offloading</a></li></ol></nav>
<header id="title-block-header" class="quarto-title-block default"><nav class="quarto-page-breadcrumbs quarto-title-breadcrumbs d-none d-lg-block" aria-label="breadcrumb"><ol class="breadcrumb"><li class="breadcrumb-item"><a href="../docs/fsdp_qlora.html">Advanced Features</a></li><li class="breadcrumb-item"><a href="../docs/gradient_checkpointing.html">Gradient Checkpointing, Activation Offloading, and Layer Offloading</a></li></ol></nav>
<div class="quarto-title">
<h1 class="title">Gradient Checkpointing and Activation Offloading</h1>
<h1 class="title">Gradient Checkpointing, Activation Offloading, and Layer Offloading</h1>
</div>
@@ -797,6 +800,31 @@ to overlap the communications and computations when offloading.</p>
<p>The <code>activation_offloading: legacy</code> option naively offloads activations to CPU without any additional optimizations.</p>
<p>For resource-constrained environments with limited CPU memory, <code>activation_offloading: disk</code> offloads
activations to disk instead of CPU RAM, so that much longer context lengths can be trained with minimal memory.</p>
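<p>As a hedged illustration of the idea behind disk offloading: in PyTorch, offloading saved activations is commonly built on <code>torch.autograd.graph.saved_tensors_hooks</code>, which takes a <em>pack</em> hook (called when the forward pass saves a value) and an <em>unpack</em> hook (called when the backward pass needs it again). The stdlib-only sketch below mimics that pack/unpack shape, using plain Python objects in place of tensors and <code>pickle</code> in place of a tensor serializer. <code>DiskActivationStore</code> is a hypothetical name, and this is not Axolotl's implementation.</p>

```python
import os
import pickle
import tempfile
from typing import Any


class DiskActivationStore:
    """Toy pack/unpack pair that spills saved activations to disk.

    Assumption: this only mirrors the *shape* of a saved-tensor hook pair;
    plain Python objects stand in for tensors.
    """

    def __init__(self) -> None:
        self._dir = tempfile.TemporaryDirectory()
        self._count = 0

    def pack(self, activation: Any) -> str:
        # Called when forward saves a value: write it out, keep only a path.
        path = os.path.join(self._dir.name, f"act_{self._count}.pkl")
        self._count += 1
        with open(path, "wb") as f:
            pickle.dump(activation, f)
        return path

    def unpack(self, path: str) -> Any:
        # Called when backward needs the value: read it back from disk.
        with open(path, "rb") as f:
            return pickle.load(f)

    def close(self) -> None:
        # Delete every spilled file in one go.
        self._dir.cleanup()


if __name__ == "__main__":
    store = DiskActivationStore()
    handle = store.pack([0.5, -1.25, 3.0])  # "forward": only the path stays in memory
    print(handle, store.unpack(handle))      # "backward": value restored from disk
    store.close()
```

<p>Between the two passes only the short file path is held in memory, which is the basic trade-off of disk offloading: less RAM in exchange for disk I/O. A real version would hand <code>pack</code> and <code>unpack</code> to the PyTorch context manager and use a tensor-aware storage format.</p>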
</section>
<section id="enabling-layer-offloading" class="level3">
<h3 class="anchored" data-anchor-id="enabling-layer-offloading">Enabling Layer Offloading</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="fu">layer_offloading</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>Layer offloading reduces GPU memory usage by moving frozen (non-trainable) decoder layer parameters to CPU
and streaming them back to GPU one layer at a time during the forward and backward passes. This is
particularly useful for LoRA/QLoRA training, where most of the model's parameters are frozen; only the
trainable adapter weights stay on GPU permanently.</p>
<p>During training, forward and backward hooks on each decoder layer handle the transfer automatically:</p>
<ul>
<li><strong>Forward pass:</strong> Before a layer executes, its frozen params are loaded to GPU. The next layer is
prefetched asynchronously on a separate CUDA stream so that its transfer overlaps with the current layer's computation.</li>
<li><strong>Backward pass:</strong> Same pattern in reverse: the current layer's frozen params are loaded and the
previous layer is prefetched.</li>
</ul>
<p>After each layer finishes, its frozen params are offloaded back to CPU pinned memory.</p>
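<p>The hook ordering described above can be traced with a small simulation. This is a stdlib-only sketch under stated assumptions: a set of layer indices stands in for "frozen params currently on GPU", nothing is actually copied, and the function name <code>simulate_pass</code> is hypothetical. In PyTorch, logic like this would typically hang off <code>nn.Module</code> hooks such as <code>register_forward_pre_hook</code> and <code>register_full_backward_pre_hook</code>, with a <code>torch.cuda.Stream</code> for the prefetch; Axolotl's actual hooks may be organized differently.</p>

```python
from typing import List, Set, Tuple

Event = Tuple[str, int]  # (action, layer_index)


def simulate_pass(num_layers: int, backward: bool = False) -> Tuple[List[Event], int]:
    """Toy model of the per-layer load / prefetch / offload schedule.

    Returns the event trace and the peak number of layers whose frozen
    params are "on the GPU" at the same time. No real memory is moved.
    """
    order = range(num_layers - 1, -1, -1) if backward else range(num_layers)
    step = -1 if backward else 1  # backward prefetches the previous layer
    resident: Set[int] = set()
    events: List[Event] = []
    peak = 0
    for idx in order:
        # Pre-hook: load this layer unless the prefetch already brought it in.
        if idx not in resident:
            resident.add(idx)
            events.append(("load", idx))
        # Start fetching the neighbour (asynchronous on a side stream in practice).
        nxt = idx + step
        if 0 <= nxt < num_layers and nxt not in resident:
            resident.add(nxt)
            events.append(("prefetch", nxt))
        peak = max(peak, len(resident))
        events.append(("compute", idx))
        # Post-hook: return this layer's frozen params to CPU pinned memory.
        resident.discard(idx)
        events.append(("offload", idx))
    return events, peak


if __name__ == "__main__":
    for is_backward in (False, True):
        trace, peak = simulate_pass(3, backward=is_backward)
        print("backward" if is_backward else "forward", trace, "peak:", peak)
```

<p>With three layers, the forward trace starts load 0, prefetch 1, compute 0, offload 0, prefetch 2. The returned <code>peak</code> reports how many layers this toy schedule holds at once; because of the prefetch, it is 2 here rather than 1.</p>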
<p>This approach trades some CPU-GPU transfer overhead for significant GPU memory savings. The freed memory
is roughly equal to the size of all frozen parameters across all decoder layers, minus the one layer's worth
that is kept on GPU at any given time.</p>
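<p>That estimate is simple arithmetic: (decoder layers minus resident layers) × frozen params per layer × bytes per param. A minimal sketch, using hypothetical round numbers (32 layers, 200 million frozen params per layer) and assuming 2 bytes per param for bf16 weights and 0.5 bytes for 4-bit weights, while ignoring quantization metadata and any params outside the decoder layers. <code>freed_gpu_bytes</code> is a hypothetical helper, not an Axolotl function.</p>

```python
def freed_gpu_bytes(
    num_layers: int,
    frozen_params_per_layer: int,
    bytes_per_param: float,
    resident_layers: int = 1,
) -> float:
    """Rough GPU memory freed by layer offloading (hypothetical helper).

    All frozen decoder-layer params move to CPU except `resident_layers`
    layers' worth, which stays on the GPU at any given time.
    """
    if not 0 <= resident_layers <= num_layers:
        raise ValueError("resident_layers must be between 0 and num_layers")
    offloaded_layers = num_layers - resident_layers
    return offloaded_layers * frozen_params_per_layer * bytes_per_param


if __name__ == "__main__":
    # Hypothetical round numbers, not measurements of any specific model.
    bf16 = freed_gpu_bytes(32, 200_000_000, 2)         # LoRA on bf16 base weights
    four_bit = freed_gpu_bytes(32, 200_000_000, 0.5)   # QLoRA-style 4-bit weights
    print(f"bf16: {bf16 / 1e9:.1f} GB, 4-bit: {four_bit / 1e9:.1f} GB")
```

<p>With these inputs it prints 12.4 GB for bf16 and 3.1 GB for 4-bit weights. Passing <code>resident_layers=2</code> gives a more conservative figure if a prefetched layer also occupies GPU memory.</p>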
<p><strong>Requirements:</strong></p>
<ul>
<li>CUDA GPU (CPU-only training is not supported for this feature)</li>
<li>Works with any HuggingFace model architecture that uses decoder layers (Llama, Mistral, Qwen, etc.)</li>
<li>Best combined with LoRA/QLoRA where most parameters are frozen</li>
</ul>
</section>