Built site for gh-pages

This commit is contained in:
Quarto GHA Workflow Runner
2026-03-22 13:17:42 +00:00
parent 61e0653994
commit 3c421e0170
247 changed files with 8025 additions and 10900 deletions

View File

@@ -2,13 +2,16 @@
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head>
<meta charset="utf-8">
<meta name="generator" content="quarto-1.8.27">
<meta name="generator" content="quarto-1.9.36">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<title>utils.schedulers Axolotl</title>
<style>
/* Default styles provided by pandoc.
** See https://pandoc.org/MANUAL.html#variables-for-html for config info.
*/
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
div.columns{display: flex; gap: min(4vw, 1.5em);}
@@ -67,15 +70,14 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<link href="../../favicon.jpg" rel="icon" type="image/jpeg">
<script src="../../site_libs/quarto-html/quarto.js" type="module"></script>
<script src="../../site_libs/quarto-html/tabsets/tabsets.js" type="module"></script>
<script src="../../site_libs/quarto-html/axe/axe-check.js" type="module"></script>
<script src="../../site_libs/quarto-html/popper.min.js"></script>
<script src="../../site_libs/quarto-html/tippy.umd.min.js"></script>
<script src="../../site_libs/quarto-html/anchor.min.js"></script>
<link href="../../site_libs/quarto-html/tippy.css" rel="stylesheet">
<link href="../../site_libs/quarto-html/quarto-syntax-highlighting-dark-4d9afe2b8d18ee9fa5d0d57b5ed4214d.css" rel="stylesheet" id="quarto-text-highlighting-styles">
<link href="../../site_libs/quarto-html/quarto-syntax-highlighting-dark-f418161beb48e0141c760e455f12af2c.css" rel="stylesheet" id="quarto-text-highlighting-styles">
<script src="../../site_libs/bootstrap/bootstrap.min.js"></script>
<link href="../../site_libs/bootstrap/bootstrap-icons.css" rel="stylesheet">
<link href="../../site_libs/bootstrap/bootstrap-35ef2ff98a2131eb4c49a687ae04ea22.min.css" rel="stylesheet" append-hash="true" id="quarto-bootstrap" data-mode="dark">
<link href="../../site_libs/bootstrap/bootstrap-f15b14cef494beb09422a8174b542cad.min.css" rel="stylesheet" append-hash="true" id="quarto-bootstrap" data-mode="dark">
<script id="quarto-search-options" type="application/json">{
"location": "navbar",
"copy-button": false,
@@ -698,7 +700,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/gradient_checkpointing.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Gradient Checkpointing and Activation Offloading</span></a>
<span class="menu-text">Gradient Checkpointing, Activation Offloading, and Layer Offloading</span></a>
</div>
</li>
<li class="sidebar-item">
@@ -828,17 +830,48 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a> min_lr_scale<span class="op">=</span><span class="fl">0.001</span>,</span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a>)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>Wraps another scheduler to apply per-lora-restart learning rate warmups.</p>
<section id="methods" class="level4">
<h4 class="anchored" data-anchor-id="methods">Methods</h4>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><a href="#axolotl.utils.schedulers.JaggedLRRestartScheduler.load_state_dict">load_state_dict</a></td>
<td>Restore state, including inner_schedule.</td>
</tr>
<tr class="even">
<td><a href="#axolotl.utils.schedulers.JaggedLRRestartScheduler.state_dict">state_dict</a></td>
<td>Return serializable state, saving inner_schedule as its own state_dict.</td>
</tr>
</tbody>
</table>
<section id="axolotl.utils.schedulers.JaggedLRRestartScheduler.load_state_dict" class="level5">
<h5 class="anchored" data-anchor-id="axolotl.utils.schedulers.JaggedLRRestartScheduler.load_state_dict">load_state_dict</h5>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>utils.schedulers.JaggedLRRestartScheduler.load_state_dict(state_dict)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>Restore state, including inner_schedule.</p>
</section>
<section id="axolotl.utils.schedulers.JaggedLRRestartScheduler.state_dict" class="level5">
<h5 class="anchored" data-anchor-id="axolotl.utils.schedulers.JaggedLRRestartScheduler.state_dict">state_dict</h5>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>utils.schedulers.JaggedLRRestartScheduler.state_dict()</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>Return serializable state, saving inner_schedule as its own state_dict.</p>
</section>
</section>
</section>
<section id="axolotl.utils.schedulers.RexLR" class="level3">
<h3 class="anchored" data-anchor-id="axolotl.utils.schedulers.RexLR">RexLR</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>utils.schedulers.RexLR(</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> optimizer,</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> max_lr,</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> min_lr,</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> total_steps<span class="op">=</span><span class="dv">0</span>,</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> num_warmup_steps<span class="op">=</span><span class="dv">0</span>,</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> last_step<span class="op">=</span><span class="dv">0</span>,</span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a>)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>utils.schedulers.RexLR(</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> optimizer,</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> max_lr,</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> min_lr,</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a> total_steps<span class="op">=</span><span class="dv">0</span>,</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a> num_warmup_steps<span class="op">=</span><span class="dv">0</span>,</span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a> last_step<span class="op">=</span><span class="dv">0</span>,</span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a>)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>Reflected Exponential (REX) learning rate scheduler.</p>
<ul>
<li>Original implementation: https://github.com/IvanVassi/REX_LR</li>
@@ -930,12 +963,12 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
</table>
<section id="axolotl.utils.schedulers.get_cosine_schedule_with_min_lr" class="level3">
<h3 class="anchored" data-anchor-id="axolotl.utils.schedulers.get_cosine_schedule_with_min_lr">get_cosine_schedule_with_min_lr</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>utils.schedulers.get_cosine_schedule_with_min_lr(</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a> optimizer,</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> num_warmup_steps,</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a> num_training_steps,</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a> min_lr_ratio<span class="op">=</span><span class="fl">0.0</span>,</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>utils.schedulers.get_cosine_schedule_with_min_lr(</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a> optimizer,</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> num_warmup_steps,</span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a> num_training_steps,</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> min_lr_ratio<span class="op">=</span><span class="fl">0.0</span>,</span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a>)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<section id="create-a-learning-rate-schedule-which-has" class="level4 doc-section doc-section-create-a-learning-rate-schedule-which-has">
<h4 class="doc-section doc-section-create-a-learning-rate-schedule-which-has anchored" data-anchor-id="create-a-learning-rate-schedule-which-has">Create a learning rate schedule which has</h4>
<ul>
@@ -946,13 +979,13 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
</section>
<section id="axolotl.utils.schedulers.get_cosine_schedule_with_quadratic_warmup" class="level3">
<h3 class="anchored" data-anchor-id="axolotl.utils.schedulers.get_cosine_schedule_with_quadratic_warmup">get_cosine_schedule_with_quadratic_warmup</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>utils.schedulers.get_cosine_schedule_with_quadratic_warmup(</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> optimizer,</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> num_warmup_steps,</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> num_training_steps,</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a> num_cycles<span class="op">=</span><span class="fl">0.5</span>,</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a> last_epoch<span class="op">=-</span><span class="dv">1</span>,</span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a>)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>utils.schedulers.get_cosine_schedule_with_quadratic_warmup(</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a> optimizer,</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> num_warmup_steps,</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a> num_training_steps,</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a> num_cycles<span class="op">=</span><span class="fl">0.5</span>,</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a> last_epoch<span class="op">=-</span><span class="dv">1</span>,</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a>)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>Create a schedule with a learning rate that decreases following the values of the cosine function between the
initial lr set in the optimizer to 0, after a warmup period during which it increases linearly between 0 and the
initial lr set in the optimizer.</p>
@@ -1014,15 +1047,15 @@ initial lr set in the optimizer.</p>
</section>
<section id="axolotl.utils.schedulers.get_cosine_schedule_with_warmup_decay_constant" class="level3">
<h3 class="anchored" data-anchor-id="axolotl.utils.schedulers.get_cosine_schedule_with_warmup_decay_constant">get_cosine_schedule_with_warmup_decay_constant</h3>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>utils.schedulers.get_cosine_schedule_with_warmup_decay_constant(</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a> optimizer,</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> num_warmup_steps,</span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a> num_training_steps,</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> constant_lr_ratio,</span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> min_lr_ratio,</span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a> num_cycles<span class="op">=</span><span class="fl">0.5</span>,</span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a> last_epoch<span class="op">=-</span><span class="dv">1</span>,</span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a>)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>utils.schedulers.get_cosine_schedule_with_warmup_decay_constant(</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a> optimizer,</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> num_warmup_steps,</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> num_training_steps,</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> constant_lr_ratio,</span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> min_lr_ratio,</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> num_cycles<span class="op">=</span><span class="fl">0.5</span>,</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a> last_epoch<span class="op">=-</span><span class="dv">1</span>,</span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a>)</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>Implementation of Continual Pre-Training of Large Language Models: How to (re)warm your model? (https://arxiv.org/pdf/2308.04014.pdf)
Create a schedule with a learning rate that decreases following the values of the cosine function between the
initial lr set in the optimizer to min_lr_ratio until num_training_steps * constant_lr_ratio, after constant_rate returns constant value of min_rate