Built site for gh-pages

This commit is contained in:
Quarto GHA Workflow Runner
2025-05-21 15:22:54 +00:00
parent ccf6259c1b
commit c71c6fe545
9 changed files with 1946 additions and 2919 deletions

View File

@@ -1 +1 @@
3750d6c6
45aa1b5b

View File

@@ -478,7 +478,7 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<tbody>
<tr class="odd">
<td><a href="#axolotl.core.trainers.dpo.trainer.AxolotlDPOTrainer">AxolotlDPOTrainer</a></td>
<td>Extend the base DPOTrainer for axolotl helpers</td>
<td>Extend the base DPOTrainer for axolotl helpers.</td>
</tr>
</tbody>
</table>
@@ -490,7 +490,7 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> dataset_tags<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="op">**</span>kwargs,</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>Extend the base DPOTrainer for axolotl helpers</p>
<p>Extend the base DPOTrainer for axolotl helpers.</p>
<section id="methods" class="level4">
<h4 class="anchored" data-anchor-id="methods">Methods</h4>
<table class="caption-top table">
@@ -502,33 +502,17 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
</thead>
<tbody>
<tr class="odd">
<td><a href="#axolotl.core.trainers.dpo.trainer.AxolotlDPOTrainer.evaluation_loop">evaluation_loop</a></td>
<td>Overriding built-in evaluation loop to store metrics for each batch.</td>
</tr>
<tr class="even">
<td><a href="#axolotl.core.trainers.dpo.trainer.AxolotlDPOTrainer.push_to_hub">push_to_hub</a></td>
<td>Overwrite the <code>push_to_hub</code> method in order to force-add the tags when pushing the</td>
<td>Overwrite the <code>push_to_hub</code> method in order to force-add the tags when pushing</td>
</tr>
</tbody>
</table>
<section id="axolotl.core.trainers.dpo.trainer.AxolotlDPOTrainer.evaluation_loop" class="level5">
<h5 class="anchored" data-anchor-id="axolotl.core.trainers.dpo.trainer.AxolotlDPOTrainer.evaluation_loop">evaluation_loop</h5>
<div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>core.trainers.dpo.trainer.AxolotlDPOTrainer.evaluation_loop(</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> dataloader,</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> description,</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> prediction_loss_only<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> ignore_keys<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a> metric_key_prefix<span class="op">=</span><span class="st">'eval'</span>,</span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>Overriding built-in evaluation loop to store metrics for each batch.
Prediction/evaluation loop, shared by <code>Trainer.evaluate()</code> and <code>Trainer.predict()</code>.</p>
<p>Works both with or without labels.</p>
</section>
<section id="axolotl.core.trainers.dpo.trainer.AxolotlDPOTrainer.push_to_hub" class="level5">
<h5 class="anchored" data-anchor-id="axolotl.core.trainers.dpo.trainer.AxolotlDPOTrainer.push_to_hub">push_to_hub</h5>
<div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>core.trainers.dpo.trainer.AxolotlDPOTrainer.push_to_hub(<span class="op">*</span>args, <span class="op">**</span>kwargs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>Overwrite the <code>push_to_hub</code> method in order to force-add the tags when pushing the
model on the Hub. Please refer to <code>~transformers.Trainer.push_to_hub</code> for more details.</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>core.trainers.dpo.trainer.AxolotlDPOTrainer.push_to_hub(<span class="op">*</span>args, <span class="op">**</span>kwargs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>Overwrite the <code>push_to_hub</code> method in order to force-add the tags when pushing
the model on the Hub. Please refer to <code>~transformers.Trainer.push_to_hub</code>
for more details.</p>
</section>

View File

@@ -1,902 +0,0 @@
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head>
<meta charset="utf-8">
<meta name="generator" content="quarto-1.7.31">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<title>core.trainers.mixins.sequence_parallel Axolotl</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
div.columns{display: flex; gap: min(4vw, 1.5em);}
div.column{flex: auto; overflow-x: auto;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
ul.task-list li input[type="checkbox"] {
width: 0.8em;
margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */
vertical-align: middle;
}
/* CSS for syntax highlighting */
html { -webkit-text-size-adjust: 100%; }
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
}
pre.numberSource { margin-left: 3em; padding-left: 4px; }
div.sourceCode
{ }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
</style>
<script src="../../site_libs/quarto-nav/quarto-nav.js"></script>
<script src="../../site_libs/clipboard/clipboard.min.js"></script>
<script src="../../site_libs/quarto-search/autocomplete.umd.js"></script>
<script src="../../site_libs/quarto-search/fuse.min.js"></script>
<script src="../../site_libs/quarto-search/quarto-search.js"></script>
<meta name="quarto:offset" content="../../">
<link href="../../favicon.jpg" rel="icon" type="image/jpeg">
<script src="../../site_libs/quarto-html/quarto.js" type="module"></script>
<script src="../../site_libs/quarto-html/tabsets/tabsets.js" type="module"></script>
<script src="../../site_libs/quarto-html/popper.min.js"></script>
<script src="../../site_libs/quarto-html/tippy.umd.min.js"></script>
<script src="../../site_libs/quarto-html/anchor.min.js"></script>
<link href="../../site_libs/quarto-html/tippy.css" rel="stylesheet">
<link href="../../site_libs/quarto-html/quarto-syntax-highlighting-dark-8ef56b68f8fa1e9d2ba328e99e439f80.css" rel="stylesheet" id="quarto-text-highlighting-styles">
<script src="../../site_libs/bootstrap/bootstrap.min.js"></script>
<link href="../../site_libs/bootstrap/bootstrap-icons.css" rel="stylesheet">
<link href="../../site_libs/bootstrap/bootstrap-2288ecdcbf81d2ab6432743cedd71d9a.min.css" rel="stylesheet" append-hash="true" id="quarto-bootstrap" data-mode="dark">
<script id="quarto-search-options" type="application/json">{
"location": "navbar",
"copy-button": false,
"collapse-after": 3,
"panel-placement": "end",
"type": "overlay",
"limit": 50,
"keyboard-shortcut": [
"f",
"/",
"s"
],
"show-item-context": false,
"language": {
"search-no-results-text": "No results",
"search-matching-documents-text": "matching documents",
"search-copy-link-title": "Copy link to search",
"search-hide-matches-text": "Hide additional matches",
"search-more-match-text": "more match in this document",
"search-more-matches-text": "more matches in this document",
"search-clear-button-title": "Clear",
"search-text-placeholder": "",
"search-detached-cancel-button-title": "Cancel",
"search-submit-button-title": "Submit",
"search-label": "Search"
}
}</script>
<link rel="stylesheet" href="../../styles.css">
</head>
<body class="nav-sidebar docked nav-fixed quarto-light">
<div id="quarto-search-results"></div>
<header id="quarto-header" class="headroom fixed-top">
<nav class="navbar navbar-expand " data-bs-theme="dark">
<div class="navbar-container container-fluid">
<div class="navbar-brand-container mx-auto">
<a href="../../index.html" class="navbar-brand navbar-brand-logo">
<img src="../../image/axolotl_logo_digital_white.svg" alt="" class="navbar-logo">
</a>
</div>
<div class="quarto-navbar-tools tools-wide tools-end">
<a href="https://twitter.com/axolotl_ai" title="" class="quarto-navigation-tool px-1" aria-label=""><i class="bi bi-twitter"></i></a>
<a href="https://github.com/axolotl-ai-cloud/axolotl/" title="" class="quarto-navigation-tool px-1" aria-label=""><i class="bi bi-github"></i></a>
<a href="https://discord.gg/7m9sfhzaf3" title="" class="quarto-navigation-tool px-1" aria-label=""><i class="bi bi-discord"></i></a>
</div>
<div id="quarto-search" class="" title="Search"></div>
</div> <!-- /container-fluid -->
</nav>
<nav class="quarto-secondary-nav">
<div class="container-fluid d-flex">
<button type="button" class="quarto-btn-toggle btn" data-bs-toggle="collapse" role="button" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
<i class="bi bi-layout-text-sidebar-reverse"></i>
</button>
<nav class="quarto-page-breadcrumbs" aria-label="breadcrumb"><ol class="breadcrumb"></ol></nav>
<a class="flex-grow-1" role="navigation" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
</a>
</div>
</nav>
</header>
<!-- content -->
<div id="quarto-content" class="quarto-container page-columns page-rows-contents page-layout-article page-navbar">
<!-- sidebar -->
<nav id="quarto-sidebar" class="sidebar collapse collapse-horizontal quarto-sidebar-collapse-item sidebar-navigation docked overflow-auto">
<div class="sidebar-menu-container">
<ul class="list-unstyled mt-1">
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../index.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Home</span></a>
</div>
</li>
<li class="sidebar-item sidebar-item-section">
<div class="sidebar-item-container">
<a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-1" role="navigation" aria-expanded="true">
<span class="menu-text">Getting Started</span></a>
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-1" role="navigation" aria-expanded="true" aria-label="Toggle section">
<i class="bi bi-chevron-right ms-2"></i>
</a>
</div>
<ul id="quarto-sidebar-section-1" class="collapse list-unstyled sidebar-section depth1 show">
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/getting-started.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Quickstart</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/installation.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Installation</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/inference.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Inference and Merging</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/cli.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Command Line Interface (CLI)</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/config.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Config Reference</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/api" class="sidebar-item-text sidebar-link">
<span class="menu-text">API Reference</span></a>
</div>
</li>
</ul>
</li>
<li class="sidebar-item sidebar-item-section">
<div class="sidebar-item-container">
<a href="../../docs/dataset-formats/index.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Dataset Formats</span></a>
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-2" role="navigation" aria-expanded="true" aria-label="Toggle section">
<i class="bi bi-chevron-right ms-2"></i>
</a>
</div>
<ul id="quarto-sidebar-section-2" class="collapse list-unstyled sidebar-section depth1 show">
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/dataset-formats/pretraining.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Pre-training</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/dataset-formats/inst_tune.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Instruction Tuning</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/dataset-formats/conversation.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Conversation</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/dataset-formats/stepwise_supervised.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Stepwise Supervised Format</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/dataset-formats/template_free.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Template-Free</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/dataset-formats/tokenized.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Custom Pre-Tokenized Dataset</span></a>
</div>
</li>
</ul>
</li>
<li class="sidebar-item sidebar-item-section">
<div class="sidebar-item-container">
<a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-3" role="navigation" aria-expanded="true">
<span class="menu-text">Deployments</span></a>
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-3" role="navigation" aria-expanded="true" aria-label="Toggle section">
<i class="bi bi-chevron-right ms-2"></i>
</a>
</div>
<ul id="quarto-sidebar-section-3" class="collapse list-unstyled sidebar-section depth1 show">
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/docker.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Docker</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/multi-gpu.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Multi-GPU</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/multi-node.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Multi Node</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/ray-integration.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Ray Train</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/amd_hpc.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">AMD GPUs on HPC Systems</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/mac.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Mac M-series</span></a>
</div>
</li>
</ul>
</li>
<li class="sidebar-item sidebar-item-section">
<div class="sidebar-item-container">
<a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-4" role="navigation" aria-expanded="true">
<span class="menu-text">How To Guides</span></a>
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-4" role="navigation" aria-expanded="true" aria-label="Toggle section">
<i class="bi bi-chevron-right ms-2"></i>
</a>
</div>
<ul id="quarto-sidebar-section-4" class="collapse list-unstyled sidebar-section depth1 show">
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/multimodal.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">MultiModal / Vision Language Models (BETA)</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/rlhf.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">RLHF (Beta)</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/reward_modelling.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Reward Modelling</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/lr_groups.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Learning Rate Groups</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/lora_optims.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">LoRA Optimizations</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/dataset_loading.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Dataset Loading</span></a>
</div>
</li>
</ul>
</li>
<li class="sidebar-item sidebar-item-section">
<div class="sidebar-item-container">
<a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-5" role="navigation" aria-expanded="true">
<span class="menu-text">Core Concepts</span></a>
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-5" role="navigation" aria-expanded="true" aria-label="Toggle section">
<i class="bi bi-chevron-right ms-2"></i>
</a>
</div>
<ul id="quarto-sidebar-section-5" class="collapse list-unstyled sidebar-section depth1 show">
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/batch_vs_grad.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Batch size vs Gradient accumulation</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/dataset_preprocessing.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Dataset Preprocessing</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/multipack.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Multipack (Sample Packing)</span></a>
</div>
</li>
</ul>
</li>
<li class="sidebar-item sidebar-item-section">
<div class="sidebar-item-container">
<a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-6" role="navigation" aria-expanded="true">
<span class="menu-text">Advanced Features</span></a>
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-6" role="navigation" aria-expanded="true" aria-label="Toggle section">
<i class="bi bi-chevron-right ms-2"></i>
</a>
</div>
<ul id="quarto-sidebar-section-6" class="collapse list-unstyled sidebar-section depth1 show">
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/fsdp_qlora.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">FDSP + QLoRA</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/unsloth.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Unsloth</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/torchao.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">PyTorch ao</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/custom_integrations.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Custom Integrations</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/sequence_parallelism.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Sequence Parallelism</span></a>
</div>
</li>
</ul>
</li>
<li class="sidebar-item sidebar-item-section">
<div class="sidebar-item-container">
<a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-7" role="navigation" aria-expanded="true">
<span class="menu-text">Troubleshooting</span></a>
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-7" role="navigation" aria-expanded="true" aria-label="Toggle section">
<i class="bi bi-chevron-right ms-2"></i>
</a>
</div>
<ul id="quarto-sidebar-section-7" class="collapse list-unstyled sidebar-section depth1 show">
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/faq.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">FAQ</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/debugging.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Debugging</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../../docs/nccl.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">NCCL</span></a>
</div>
</li>
</ul>
</li>
</ul>
</div>
</nav>
<div id="quarto-sidebar-glass" class="quarto-sidebar-collapse-item" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item"></div>
<!-- margin-sidebar -->
<div id="quarto-margin-sidebar" class="sidebar margin-sidebar">
<nav id="TOC" role="doc-toc" class="toc-active">
<h2 id="toc-title">On this page</h2>
<ul>
<li><a href="#axolotl.core.trainers.mixins.sequence_parallel" id="toc-axolotl.core.trainers.mixins.sequence_parallel" class="nav-link active" data-scroll-target="#axolotl.core.trainers.mixins.sequence_parallel">core.trainers.mixins.sequence_parallel</a>
<ul class="collapse">
<li><a href="#classes" id="toc-classes" class="nav-link" data-scroll-target="#classes">Classes</a>
<ul class="collapse">
<li><a href="#axolotl.core.trainers.mixins.sequence_parallel.SequenceParallelMixin" id="toc-axolotl.core.trainers.mixins.sequence_parallel.SequenceParallelMixin" class="nav-link" data-scroll-target="#axolotl.core.trainers.mixins.sequence_parallel.SequenceParallelMixin">SequenceParallelMixin</a></li>
</ul></li>
</ul></li>
</ul>
</nav>
</div>
<!-- main -->
<main class="content" id="quarto-document-content"><header id="title-block-header" class="quarto-title-block"></header>
<section id="axolotl.core.trainers.mixins.sequence_parallel" class="level1">
<h1>core.trainers.mixins.sequence_parallel</h1>
<p><code>core.trainers.mixins.sequence_parallel</code></p>
<p>Module for Axolotl trainer sequence parallelism mixin</p>
<section id="classes" class="level2">
<h2 class="anchored" data-anchor-id="classes">Classes</h2>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><a href="#axolotl.core.trainers.mixins.sequence_parallel.SequenceParallelMixin">SequenceParallelMixin</a></td>
<td>Mixin class for sequence parallelism support in trainers.</td>
</tr>
</tbody>
</table>
<section id="axolotl.core.trainers.mixins.sequence_parallel.SequenceParallelMixin" class="level3">
<h3 class="anchored" data-anchor-id="axolotl.core.trainers.mixins.sequence_parallel.SequenceParallelMixin">SequenceParallelMixin</h3>
<div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>core.trainers.mixins.sequence_parallel.SequenceParallelMixin()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>Mixin class for sequence parallelism support in trainers.</p>
<p>This mixin provides functionality for handling sequence parallelism,
specifically for creating appropriate data samplers.</p>
</section>
</section>
</section>
</main> <!-- /main -->
<script id="quarto-html-after-body" type="application/javascript">
window.document.addEventListener("DOMContentLoaded", function (event) {
const icon = "";
const anchorJS = new window.AnchorJS();
anchorJS.options = {
placement: 'right',
icon: icon
};
anchorJS.add('.anchored');
const isCodeAnnotation = (el) => {
for (const clz of el.classList) {
if (clz.startsWith('code-annotation-')) {
return true;
}
}
return false;
}
const onCopySuccess = function(e) {
// button target
const button = e.trigger;
// don't keep focus
button.blur();
// flash "checked"
button.classList.add('code-copy-button-checked');
var currentTitle = button.getAttribute("title");
button.setAttribute("title", "Copied!");
let tooltip;
if (window.bootstrap) {
button.setAttribute("data-bs-toggle", "tooltip");
button.setAttribute("data-bs-placement", "left");
button.setAttribute("data-bs-title", "Copied!");
tooltip = new bootstrap.Tooltip(button,
{ trigger: "manual",
customClass: "code-copy-button-tooltip",
offset: [0, -8]});
tooltip.show();
}
setTimeout(function() {
if (tooltip) {
tooltip.hide();
button.removeAttribute("data-bs-title");
button.removeAttribute("data-bs-toggle");
button.removeAttribute("data-bs-placement");
}
button.setAttribute("title", currentTitle);
button.classList.remove('code-copy-button-checked');
}, 1000);
// clear code selection
e.clearSelection();
}
const getTextToCopy = function(trigger) {
const codeEl = trigger.previousElementSibling.cloneNode(true);
for (const childEl of codeEl.children) {
if (isCodeAnnotation(childEl)) {
childEl.remove();
}
}
return codeEl.innerText;
}
const clipboard = new window.ClipboardJS('.code-copy-button:not([data-in-quarto-modal])', {
text: getTextToCopy
});
clipboard.on('success', onCopySuccess);
if (window.document.getElementById('quarto-embedded-source-code-modal')) {
const clipboardModal = new window.ClipboardJS('.code-copy-button[data-in-quarto-modal]', {
text: getTextToCopy,
container: window.document.getElementById('quarto-embedded-source-code-modal')
});
clipboardModal.on('success', onCopySuccess);
}
var localhostRegex = new RegExp(/^(?:http|https):\/\/localhost\:?[0-9]*\//);
var mailtoRegex = new RegExp(/^mailto:/);
var filterRegex = new RegExp("https:\/\/docs\.axolotl\.ai");
var isInternal = (href) => {
return filterRegex.test(href) || localhostRegex.test(href) || mailtoRegex.test(href);
}
// Inspect non-navigation links and adorn them if external
var links = window.document.querySelectorAll('a[href]:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item):not(.quarto-navigation-tool):not(.about-link)');
for (var i=0; i<links.length; i++) {
const link = links[i];
if (!isInternal(link.href)) {
// undo the damage that might have been done by quarto-nav.js in the case of
// links that we want to consider external
if (link.dataset.originalHref !== undefined) {
link.href = link.dataset.originalHref;
}
}
}
function tippyHover(el, contentFn, onTriggerFn, onUntriggerFn) {
const config = {
allowHTML: true,
maxWidth: 500,
delay: 100,
arrow: false,
appendTo: function(el) {
return el.parentElement;
},
interactive: true,
interactiveBorder: 10,
theme: 'quarto',
placement: 'bottom-start',
};
if (contentFn) {
config.content = contentFn;
}
if (onTriggerFn) {
config.onTrigger = onTriggerFn;
}
if (onUntriggerFn) {
config.onUntrigger = onUntriggerFn;
}
window.tippy(el, config);
}
const noterefs = window.document.querySelectorAll('a[role="doc-noteref"]');
for (var i=0; i<noterefs.length; i++) {
const ref = noterefs[i];
tippyHover(ref, function() {
// use id or data attribute instead here
let href = ref.getAttribute('data-footnote-href') || ref.getAttribute('href');
try { href = new URL(href).hash; } catch {}
const id = href.replace(/^#\/?/, "");
const note = window.document.getElementById(id);
if (note) {
return note.innerHTML;
} else {
return "";
}
});
}
const xrefs = window.document.querySelectorAll('a.quarto-xref');
const processXRef = (id, note) => {
// Strip column container classes
const stripColumnClz = (el) => {
el.classList.remove("page-full", "page-columns");
if (el.children) {
for (const child of el.children) {
stripColumnClz(child);
}
}
}
stripColumnClz(note)
if (id === null || id.startsWith('sec-')) {
// Special case sections, only their first couple elements
const container = document.createElement("div");
if (note.children && note.children.length > 2) {
container.appendChild(note.children[0].cloneNode(true));
for (let i = 1; i < note.children.length; i++) {
const child = note.children[i];
if (child.tagName === "P" && child.innerText === "") {
continue;
} else {
container.appendChild(child.cloneNode(true));
break;
}
}
if (window.Quarto?.typesetMath) {
window.Quarto.typesetMath(container);
}
return container.innerHTML
} else {
if (window.Quarto?.typesetMath) {
window.Quarto.typesetMath(note);
}
return note.innerHTML;
}
} else {
// Remove any anchor links if they are present
const anchorLink = note.querySelector('a.anchorjs-link');
if (anchorLink) {
anchorLink.remove();
}
if (window.Quarto?.typesetMath) {
window.Quarto.typesetMath(note);
}
if (note.classList.contains("callout")) {
return note.outerHTML;
} else {
return note.innerHTML;
}
}
}
for (var i=0; i<xrefs.length; i++) {
const xref = xrefs[i];
tippyHover(xref, undefined, function(instance) {
instance.disable();
let url = xref.getAttribute('href');
let hash = undefined;
if (url.startsWith('#')) {
hash = url;
} else {
try { hash = new URL(url).hash; } catch {}
}
if (hash) {
const id = hash.replace(/^#\/?/, "");
const note = window.document.getElementById(id);
if (note !== null) {
try {
const html = processXRef(id, note.cloneNode(true));
instance.setContent(html);
} finally {
instance.enable();
instance.show();
}
} else {
// See if we can fetch this
fetch(url.split('#')[0])
.then(res => res.text())
.then(html => {
const parser = new DOMParser();
const htmlDoc = parser.parseFromString(html, "text/html");
const note = htmlDoc.getElementById(id);
if (note !== null) {
const html = processXRef(id, note);
instance.setContent(html);
}
}).finally(() => {
instance.enable();
instance.show();
});
}
} else {
// See if we can fetch a full url (with no hash to target)
// This is a special case and we should probably do some content thinning / targeting
fetch(url)
.then(res => res.text())
.then(html => {
const parser = new DOMParser();
const htmlDoc = parser.parseFromString(html, "text/html");
const note = htmlDoc.querySelector('main.content');
if (note !== null) {
// This should only happen for chapter cross references
// (since there is no id in the URL)
// remove the first header
if (note.children.length > 0 && note.children[0].tagName === "HEADER") {
note.children[0].remove();
}
const html = processXRef(null, note);
instance.setContent(html);
}
}).finally(() => {
instance.enable();
instance.show();
});
}
}, function(instance) {
});
}
let selectedAnnoteEl;
const selectorForAnnotation = ( cell, annotation) => {
let cellAttr = 'data-code-cell="' + cell + '"';
let lineAttr = 'data-code-annotation="' + annotation + '"';
const selector = 'span[' + cellAttr + '][' + lineAttr + ']';
return selector;
}
const selectCodeLines = (annoteEl) => {
const doc = window.document;
const targetCell = annoteEl.getAttribute("data-target-cell");
const targetAnnotation = annoteEl.getAttribute("data-target-annotation");
const annoteSpan = window.document.querySelector(selectorForAnnotation(targetCell, targetAnnotation));
const lines = annoteSpan.getAttribute("data-code-lines").split(",");
const lineIds = lines.map((line) => {
return targetCell + "-" + line;
})
let top = null;
let height = null;
let parent = null;
if (lineIds.length > 0) {
//compute the position of the single el (top and bottom and make a div)
const el = window.document.getElementById(lineIds[0]);
top = el.offsetTop;
height = el.offsetHeight;
parent = el.parentElement.parentElement;
if (lineIds.length > 1) {
const lastEl = window.document.getElementById(lineIds[lineIds.length - 1]);
const bottom = lastEl.offsetTop + lastEl.offsetHeight;
height = bottom - top;
}
if (top !== null && height !== null && parent !== null) {
// cook up a div (if necessary) and position it
let div = window.document.getElementById("code-annotation-line-highlight");
if (div === null) {
div = window.document.createElement("div");
div.setAttribute("id", "code-annotation-line-highlight");
div.style.position = 'absolute';
parent.appendChild(div);
}
div.style.top = top - 2 + "px";
div.style.height = height + 4 + "px";
div.style.left = 0;
let gutterDiv = window.document.getElementById("code-annotation-line-highlight-gutter");
if (gutterDiv === null) {
gutterDiv = window.document.createElement("div");
gutterDiv.setAttribute("id", "code-annotation-line-highlight-gutter");
gutterDiv.style.position = 'absolute';
const codeCell = window.document.getElementById(targetCell);
const gutter = codeCell.querySelector('.code-annotation-gutter');
gutter.appendChild(gutterDiv);
}
gutterDiv.style.top = top - 2 + "px";
gutterDiv.style.height = height + 4 + "px";
}
selectedAnnoteEl = annoteEl;
}
};
const unselectCodeLines = () => {
const elementsIds = ["code-annotation-line-highlight", "code-annotation-line-highlight-gutter"];
elementsIds.forEach((elId) => {
const div = window.document.getElementById(elId);
if (div) {
div.remove();
}
});
selectedAnnoteEl = undefined;
};
// Handle positioning of the toggle
window.addEventListener(
"resize",
throttle(() => {
elRect = undefined;
if (selectedAnnoteEl) {
selectCodeLines(selectedAnnoteEl);
}
}, 10)
);
function throttle(fn, ms) {
let throttle = false;
let timer;
return (...args) => {
if(!throttle) { // first call gets through
fn.apply(this, args);
throttle = true;
} else { // all the others get throttled
if(timer) clearTimeout(timer); // cancel #2
timer = setTimeout(() => {
fn.apply(this, args);
timer = throttle = false;
}, ms);
}
};
}
// Attach click handler to the DT
const annoteDls = window.document.querySelectorAll('dt[data-target-cell]');
for (const annoteDlNode of annoteDls) {
annoteDlNode.addEventListener('click', (event) => {
const clickedEl = event.target;
if (clickedEl !== selectedAnnoteEl) {
unselectCodeLines();
const activeEl = window.document.querySelector('dt[data-target-cell].code-annotation-active');
if (activeEl) {
activeEl.classList.remove('code-annotation-active');
}
selectCodeLines(clickedEl);
clickedEl.classList.add('code-annotation-active');
} else {
// Unselect the line
unselectCodeLines();
clickedEl.classList.remove('code-annotation-active');
}
});
}
const findCites = (el) => {
const parentEl = el.parentElement;
if (parentEl) {
const cites = parentEl.dataset.cites;
if (cites) {
return {
el,
cites: cites.split(' ')
};
} else {
return findCites(el.parentElement)
}
} else {
return undefined;
}
};
var bibliorefs = window.document.querySelectorAll('a[role="doc-biblioref"]');
for (var i=0; i<bibliorefs.length; i++) {
const ref = bibliorefs[i];
const citeInfo = findCites(ref);
if (citeInfo) {
tippyHover(citeInfo.el, function() {
var popup = window.document.createElement('div');
citeInfo.cites.forEach(function(cite) {
var citeDiv = window.document.createElement('div');
citeDiv.classList.add('hanging-indent');
citeDiv.classList.add('csl-entry');
var biblioDiv = window.document.getElementById('ref-' + cite);
if (biblioDiv) {
citeDiv.innerHTML = biblioDiv.innerHTML;
}
popup.appendChild(citeDiv);
});
return popup.innerHTML;
});
}
}
});
</script>
</div> <!-- /content -->
</body></html>

View File

@@ -559,14 +559,12 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<span id="cb1-43"><a href="#cb1-43" aria-hidden="true" tabindex="-1"></a> kd_temperature<span class="op">=</span><span class="fl">1.0</span>,</span>
<span id="cb1-44"><a href="#cb1-44" aria-hidden="true" tabindex="-1"></a> kd_zscore_base_temp<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb1-45"><a href="#cb1-45" aria-hidden="true" tabindex="-1"></a> kd_top_k_before_softmax<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb1-46"><a href="#cb1-46" aria-hidden="true" tabindex="-1"></a> sequence_parallel_degree<span class="op">=</span><span class="dv">1</span>,</span>
<span id="cb1-47"><a href="#cb1-47" aria-hidden="true" tabindex="-1"></a> ring_attn_func<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb1-48"><a href="#cb1-48" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb1-49"><a href="#cb1-49" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb1-50"><a href="#cb1-50" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb1-51"><a href="#cb1-51" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb1-52"><a href="#cb1-52" aria-hidden="true" tabindex="-1"></a> simpo_gamma<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb1-53"><a href="#cb1-53" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<span id="cb1-46"><a href="#cb1-46" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb1-47"><a href="#cb1-47" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb1-48"><a href="#cb1-48" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb1-49"><a href="#cb1-49" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb1-50"><a href="#cb1-50" aria-hidden="true" tabindex="-1"></a> simpo_gamma<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb1-51"><a href="#cb1-51" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>CPO config for CPO training</p>
</section>
<section id="axolotl.core.training_args.AxolotlKTOConfig" class="level3">
@@ -616,13 +614,11 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<span id="cb2-43"><a href="#cb2-43" aria-hidden="true" tabindex="-1"></a> kd_temperature<span class="op">=</span><span class="fl">1.0</span>,</span>
<span id="cb2-44"><a href="#cb2-44" aria-hidden="true" tabindex="-1"></a> kd_zscore_base_temp<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb2-45"><a href="#cb2-45" aria-hidden="true" tabindex="-1"></a> kd_top_k_before_softmax<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb2-46"><a href="#cb2-46" aria-hidden="true" tabindex="-1"></a> sequence_parallel_degree<span class="op">=</span><span class="dv">1</span>,</span>
<span id="cb2-47"><a href="#cb2-47" aria-hidden="true" tabindex="-1"></a> ring_attn_func<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb2-48"><a href="#cb2-48" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb2-49"><a href="#cb2-49" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb2-50"><a href="#cb2-50" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb2-51"><a href="#cb2-51" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb2-52"><a href="#cb2-52" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<span id="cb2-46"><a href="#cb2-46" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb2-47"><a href="#cb2-47" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb2-48"><a href="#cb2-48" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb2-49"><a href="#cb2-49" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb2-50"><a href="#cb2-50" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>KTO config for KTO training</p>
</section>
<section id="axolotl.core.training_args.AxolotlORPOConfig" class="level3">
@@ -672,13 +668,11 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<span id="cb3-43"><a href="#cb3-43" aria-hidden="true" tabindex="-1"></a> kd_temperature<span class="op">=</span><span class="fl">1.0</span>,</span>
<span id="cb3-44"><a href="#cb3-44" aria-hidden="true" tabindex="-1"></a> kd_zscore_base_temp<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb3-45"><a href="#cb3-45" aria-hidden="true" tabindex="-1"></a> kd_top_k_before_softmax<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb3-46"><a href="#cb3-46" aria-hidden="true" tabindex="-1"></a> sequence_parallel_degree<span class="op">=</span><span class="dv">1</span>,</span>
<span id="cb3-47"><a href="#cb3-47" aria-hidden="true" tabindex="-1"></a> ring_attn_func<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb3-48"><a href="#cb3-48" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb3-49"><a href="#cb3-49" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb3-50"><a href="#cb3-50" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb3-51"><a href="#cb3-51" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb3-52"><a href="#cb3-52" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<span id="cb3-46"><a href="#cb3-46" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb3-47"><a href="#cb3-47" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb3-48"><a href="#cb3-48" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb3-49"><a href="#cb3-49" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb3-50"><a href="#cb3-50" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>ORPO config for ORPO training</p>
</section>
<section id="axolotl.core.training_args.AxolotlPRMConfig" class="level3">
@@ -728,13 +722,11 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<span id="cb4-43"><a href="#cb4-43" aria-hidden="true" tabindex="-1"></a> kd_temperature<span class="op">=</span><span class="fl">1.0</span>,</span>
<span id="cb4-44"><a href="#cb4-44" aria-hidden="true" tabindex="-1"></a> kd_zscore_base_temp<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb4-45"><a href="#cb4-45" aria-hidden="true" tabindex="-1"></a> kd_top_k_before_softmax<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb4-46"><a href="#cb4-46" aria-hidden="true" tabindex="-1"></a> sequence_parallel_degree<span class="op">=</span><span class="dv">1</span>,</span>
<span id="cb4-47"><a href="#cb4-47" aria-hidden="true" tabindex="-1"></a> ring_attn_func<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb4-48"><a href="#cb4-48" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb4-49"><a href="#cb4-49" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb4-50"><a href="#cb4-50" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb4-51"><a href="#cb4-51" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb4-52"><a href="#cb4-52" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<span id="cb4-46"><a href="#cb4-46" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb4-47"><a href="#cb4-47" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb4-48"><a href="#cb4-48" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb4-49"><a href="#cb4-49" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb4-50"><a href="#cb4-50" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>PRM config for PRM training</p>
</section>
<section id="axolotl.core.training_args.AxolotlRewardConfig" class="level3">
@@ -784,13 +776,11 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<span id="cb5-43"><a href="#cb5-43" aria-hidden="true" tabindex="-1"></a> kd_temperature<span class="op">=</span><span class="fl">1.0</span>,</span>
<span id="cb5-44"><a href="#cb5-44" aria-hidden="true" tabindex="-1"></a> kd_zscore_base_temp<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb5-45"><a href="#cb5-45" aria-hidden="true" tabindex="-1"></a> kd_top_k_before_softmax<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb5-46"><a href="#cb5-46" aria-hidden="true" tabindex="-1"></a> sequence_parallel_degree<span class="op">=</span><span class="dv">1</span>,</span>
<span id="cb5-47"><a href="#cb5-47" aria-hidden="true" tabindex="-1"></a> ring_attn_func<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb5-48"><a href="#cb5-48" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb5-49"><a href="#cb5-49" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb5-50"><a href="#cb5-50" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb5-51"><a href="#cb5-51" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb5-52"><a href="#cb5-52" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<span id="cb5-46"><a href="#cb5-46" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb5-47"><a href="#cb5-47" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb5-48"><a href="#cb5-48" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb5-49"><a href="#cb5-49" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb5-50"><a href="#cb5-50" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>Reward config for Reward training</p>
</section>
<section id="axolotl.core.training_args.AxolotlTrainingArguments" class="level3">
@@ -840,13 +830,11 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
<span id="cb6-43"><a href="#cb6-43" aria-hidden="true" tabindex="-1"></a> kd_temperature<span class="op">=</span><span class="fl">1.0</span>,</span>
<span id="cb6-44"><a href="#cb6-44" aria-hidden="true" tabindex="-1"></a> kd_zscore_base_temp<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb6-45"><a href="#cb6-45" aria-hidden="true" tabindex="-1"></a> kd_top_k_before_softmax<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb6-46"><a href="#cb6-46" aria-hidden="true" tabindex="-1"></a> sequence_parallel_degree<span class="op">=</span><span class="dv">1</span>,</span>
<span id="cb6-47"><a href="#cb6-47" aria-hidden="true" tabindex="-1"></a> ring_attn_func<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb6-48"><a href="#cb6-48" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb6-49"><a href="#cb6-49" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb6-50"><a href="#cb6-50" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb6-51"><a href="#cb6-51" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb6-52"><a href="#cb6-52" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<span id="cb6-46"><a href="#cb6-46" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb6-47"><a href="#cb6-47" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb6-48"><a href="#cb6-48" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb6-49"><a href="#cb6-49" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb6-50"><a href="#cb6-50" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>Training arguments for Causal trainer</p>
<p>This code is duplicated due to HF TrainingArguments not setting output_dir with a
default value so it cant be used as a mixin.</p>
@@ -898,13 +886,11 @@ default value so it cant be used as a mixin.</p>
<span id="cb7-43"><a href="#cb7-43" aria-hidden="true" tabindex="-1"></a> kd_temperature<span class="op">=</span><span class="fl">1.0</span>,</span>
<span id="cb7-44"><a href="#cb7-44" aria-hidden="true" tabindex="-1"></a> kd_zscore_base_temp<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb7-45"><a href="#cb7-45" aria-hidden="true" tabindex="-1"></a> kd_top_k_before_softmax<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb7-46"><a href="#cb7-46" aria-hidden="true" tabindex="-1"></a> sequence_parallel_degree<span class="op">=</span><span class="dv">1</span>,</span>
<span id="cb7-47"><a href="#cb7-47" aria-hidden="true" tabindex="-1"></a> ring_attn_func<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb7-48"><a href="#cb7-48" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb7-49"><a href="#cb7-49" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb7-50"><a href="#cb7-50" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb7-51"><a href="#cb7-51" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb7-52"><a href="#cb7-52" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<span id="cb7-46"><a href="#cb7-46" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb7-47"><a href="#cb7-47" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb7-48"><a href="#cb7-48" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb7-49"><a href="#cb7-49" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
<span id="cb7-50"><a href="#cb7-50" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>Mixin class for the Axolotl training args.</p>

View File

@@ -629,10 +629,6 @@ ul.task-list li input[type="checkbox"] {
<td><a href="../../docs/api/core.trainers.mixins.scheduler.html#axolotl.core.trainers.mixins.scheduler">core.trainers.mixins.scheduler</a></td>
<td>Module for Axolotl trainer scheduler mixin</td>
</tr>
<tr class="even">
<td><a href="../../docs/api/core.trainers.mixins.sequence_parallel.html#axolotl.core.trainers.mixins.sequence_parallel">core.trainers.mixins.sequence_parallel</a></td>
<td>Module for Axolotl trainer sequence parallelism mixin</td>
</tr>
</tbody>
</table>
</section>

View File

@@ -572,15 +572,7 @@ Tip
<a href="https://github.com/zhuzilin/ring-flash-attention">ring-flash-attention</a> project. This
allows one to split up sequences across GPUs, which is useful in the event that a
single sequence causes OOM errors during model training.</p>
<p>First, install <code>ring-flash-attn</code>, recommended via <code>pip install axolotl[ring-flash-attn]</code>,
or from source with <code>pip install .[ring-flash-attn]</code>.</p>
<p>Your Axolotl YAML config should contain the following lines:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="fu">sequence_parallel_degree</span><span class="kw">:</span><span class="at"> </span><span class="dv">4</span><span class="co"> # Split each sequence into 4 parts, one per GPU</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="fu">flash_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span><span class="co"> # Required with sequence parallelism</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Optional; strides across the key dimension. Larger values use more memory but will make training faster.</span></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="fu">heads_k_stride</span><span class="kw">:</span><span class="at"> </span><span class="dv">1</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>See our <a href="../docs/sequence_parallelism.html">dedicated guide</a> for more details.</p>
<p>See our <a href="../docs/sequence_parallelism.html">dedicated guide</a> for more information.</p>
<section id="sec-fsdp-qlora" class="level3" data-number="4.1">
<h3 data-number="4.1" class="anchored" data-anchor-id="sec-fsdp-qlora"><span class="header-section-number">4.1</span> FSDP + QLoRA</h3>
<p>For combining FSDP with QLoRA, see our <a href="../docs/fsdp_qlora.html">dedicated guide</a>.</p>
@@ -1080,146 +1072,133 @@ or from source with <code>pip install .[ring-flash-attn]</code>.</p>
}
});
</script><div class="modal fade" id="quarto-embedded-source-code-modal" tabindex="-1" aria-labelledby="quarto-embedded-source-code-modal-label" aria-hidden="true"><div class="modal-dialog modal-dialog-scrollable"><div class="modal-content"><div class="modal-header"><h5 class="modal-title" id="quarto-embedded-source-code-modal-label">Source Code</h5><button class="btn-close" data-bs-dismiss="modal"></button></div><div class="modal-body"><div class="">
<div class="sourceCode" id="cb5" data-shortcodes="false"><pre class="sourceCode markdown code-with-copy"><code class="sourceCode markdown"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co">---</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="an">title:</span><span class="co"> "Multi-GPU"</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="an">format:</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a><span class="co"> html:</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a><span class="co"> toc: true</span></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a><span class="co"> toc-depth: 3</span></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a><span class="co"> number-sections: true</span></span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a><span class="co"> code-tools: true</span></span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a><span class="an">execute:</span></span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a><span class="co"> enabled: false</span></span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a><span class="co">---</span></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a>This guide covers advanced training configurations for multi-GPU setups using Axolotl.</span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a><span class="fu">## Overview {#sec-overview}</span></span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a>Axolotl supports several methods for multi-GPU training:</span>
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>DeepSpeed (recommended)</span>
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>FSDP (Fully Sharded Data Parallel)</span>
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Sequence parallelism</span>
<span id="cb5-22"><a href="#cb5-22" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>FSDP + QLoRA</span>
<span id="cb5-23"><a href="#cb5-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-24"><a href="#cb5-24" aria-hidden="true" tabindex="-1"></a><span class="fu">## DeepSpeed {#sec-deepspeed}</span></span>
<span id="cb5-25"><a href="#cb5-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-26"><a href="#cb5-26" aria-hidden="true" tabindex="-1"></a>DeepSpeed is the recommended approach for multi-GPU training due to its stability and performance. It provides various optimization levels through ZeRO stages.</span>
<span id="cb5-27"><a href="#cb5-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-28"><a href="#cb5-28" aria-hidden="true" tabindex="-1"></a><span class="fu">### Configuration {#sec-deepspeed-config}</span></span>
<span id="cb5-29"><a href="#cb5-29" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-30"><a href="#cb5-30" aria-hidden="true" tabindex="-1"></a>Add to your YAML config:</span>
<span id="cb5-31"><a href="#cb5-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-32"><a href="#cb5-32" aria-hidden="true" tabindex="-1"></a><span class="in">```{.yaml}</span></span>
<span id="cb5-33"><a href="#cb5-33" aria-hidden="true" tabindex="-1"></a><span class="fu">deepspeed</span><span class="kw">:</span><span class="at"> deepspeed_configs/zero1.json</span></span>
<span id="cb5-34"><a href="#cb5-34" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
<span id="cb5-35"><a href="#cb5-35" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-36"><a href="#cb5-36" aria-hidden="true" tabindex="-1"></a><span class="fu">### Usage {#sec-deepspeed-usage}</span></span>
<span id="cb5-37"><a href="#cb5-37" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-38"><a href="#cb5-38" aria-hidden="true" tabindex="-1"></a><span class="in">```{.bash}</span></span>
<span id="cb5-39"><a href="#cb5-39" aria-hidden="true" tabindex="-1"></a><span class="co"># Fetch deepspeed configs (if not already present)</span></span>
<span id="cb5-40"><a href="#cb5-40" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> fetch deepspeed_configs</span>
<span id="cb5-41"><a href="#cb5-41" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-42"><a href="#cb5-42" aria-hidden="true" tabindex="-1"></a><span class="co"># Passing arg via config</span></span>
<span id="cb5-43"><a href="#cb5-43" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> train config.yml</span>
<span id="cb5-44"><a href="#cb5-44" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-45"><a href="#cb5-45" aria-hidden="true" tabindex="-1"></a><span class="co"># Passing arg via cli</span></span>
<span id="cb5-46"><a href="#cb5-46" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> train config.yml <span class="at">--deepspeed</span> deepspeed_configs/zero1.json</span>
<span id="cb5-47"><a href="#cb5-47" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
<span id="cb5-48"><a href="#cb5-48" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-49"><a href="#cb5-49" aria-hidden="true" tabindex="-1"></a><span class="fu">### ZeRO Stages {#sec-zero-stages}</span></span>
<span id="cb5-50"><a href="#cb5-50" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-51"><a href="#cb5-51" aria-hidden="true" tabindex="-1"></a>We provide default configurations for:</span>
<span id="cb5-52"><a href="#cb5-52" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-53"><a href="#cb5-53" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 1 (<span class="in">`zero1.json`</span>)</span>
<span id="cb5-54"><a href="#cb5-54" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 1 with torch compile (<span class="in">`zero1_torch_compile.json`</span>)</span>
<span id="cb5-55"><a href="#cb5-55" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 2 (<span class="in">`zero2.json`</span>)</span>
<span id="cb5-56"><a href="#cb5-56" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 3 (<span class="in">`zero3.json`</span>)</span>
<span id="cb5-57"><a href="#cb5-57" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 3 with bf16 (<span class="in">`zero3_bf16.json`</span>)</span>
<span id="cb5-58"><a href="#cb5-58" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 3 with bf16 and CPU offload params(<span class="in">`zero3_bf16_cpuoffload_params.json`</span>)</span>
<span id="cb5-59"><a href="#cb5-59" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 3 with bf16 and CPU offload params and optimizer (<span class="in">`zero3_bf16_cpuoffload_all.json`</span>)</span>
<span id="cb5-60"><a href="#cb5-60" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-61"><a href="#cb5-61" aria-hidden="true" tabindex="-1"></a>::: {.callout-tip}</span>
<span id="cb5-62"><a href="#cb5-62" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-63"><a href="#cb5-63" aria-hidden="true" tabindex="-1"></a>Choose the configuration that offloads the least amount to memory while still being able to fit on VRAM for best performance.</span>
<span id="cb5-64"><a href="#cb5-64" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-65"><a href="#cb5-65" aria-hidden="true" tabindex="-1"></a>Start from Stage 1 -&gt; Stage 2 -&gt; Stage 3.</span>
<span id="cb5-66"><a href="#cb5-66" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-67"><a href="#cb5-67" aria-hidden="true" tabindex="-1"></a>:::</span>
<span id="cb5-68"><a href="#cb5-68" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-69"><a href="#cb5-69" aria-hidden="true" tabindex="-1"></a><span class="fu">## FSDP {#sec-fsdp}</span></span>
<span id="cb5-70"><a href="#cb5-70" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-71"><a href="#cb5-71" aria-hidden="true" tabindex="-1"></a><span class="fu">### Basic FSDP Configuration {#sec-fsdp-config}</span></span>
<span id="cb5-72"><a href="#cb5-72" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-73"><a href="#cb5-73" aria-hidden="true" tabindex="-1"></a><span class="in">```{.yaml}</span></span>
<span id="cb5-74"><a href="#cb5-74" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp</span><span class="kw">:</span></span>
<span id="cb5-75"><a href="#cb5-75" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span>full_shard</span>
<span id="cb5-76"><a href="#cb5-76" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span>auto_wrap</span>
<span id="cb5-77"><a href="#cb5-77" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp_config</span><span class="kw">:</span></span>
<span id="cb5-78"><a href="#cb5-78" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_offload_params</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb5-79"><a href="#cb5-79" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_state_dict_type</span><span class="kw">:</span><span class="at"> FULL_STATE_DICT</span></span>
<span id="cb5-80"><a href="#cb5-80" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_transformer_layer_cls_to_wrap</span><span class="kw">:</span><span class="at"> LlamaDecoderLayer</span></span>
<span id="cb5-81"><a href="#cb5-81" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
<span id="cb5-82"><a href="#cb5-82" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-83"><a href="#cb5-83" aria-hidden="true" tabindex="-1"></a><span class="fu">## Sequence parallelism {#sec-sequence-parallelism}</span></span>
<span id="cb5-84"><a href="#cb5-84" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-85"><a href="#cb5-85" aria-hidden="true" tabindex="-1"></a>We support sequence parallelism (SP) via the</span>
<span id="cb5-86"><a href="#cb5-86" aria-hidden="true" tabindex="-1"></a><span class="co">[</span><span class="ot">ring-flash-attention</span><span class="co">](https://github.com/zhuzilin/ring-flash-attention)</span> project. This</span>
<span id="cb5-87"><a href="#cb5-87" aria-hidden="true" tabindex="-1"></a>allows one to split up sequences across GPUs, which is useful in the event that a</span>
<span id="cb5-88"><a href="#cb5-88" aria-hidden="true" tabindex="-1"></a>single sequence causes OOM errors during model training.</span>
<span id="cb5-89"><a href="#cb5-89" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-90"><a href="#cb5-90" aria-hidden="true" tabindex="-1"></a>First, install <span class="in">`ring-flash-attn`</span>, recommended via <span class="in">`pip install axolotl[ring-flash-attn]`</span>,</span>
<span id="cb5-91"><a href="#cb5-91" aria-hidden="true" tabindex="-1"></a>or from source with <span class="in">`pip install .[ring-flash-attn]`</span>.</span>
<span id="cb5-92"><a href="#cb5-92" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-93"><a href="#cb5-93" aria-hidden="true" tabindex="-1"></a>Your Axolotl YAML config should contain the following lines:</span>
<span id="cb5-94"><a href="#cb5-94" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-95"><a href="#cb5-95" aria-hidden="true" tabindex="-1"></a><span class="in">```{.yaml}</span></span>
<span id="cb5-96"><a href="#cb5-96" aria-hidden="true" tabindex="-1"></a><span class="fu">sequence_parallel_degree</span><span class="kw">:</span><span class="at"> </span><span class="dv">4</span><span class="co"> # Split each sequence into 4 parts, one per GPU</span></span>
<span id="cb5-97"><a href="#cb5-97" aria-hidden="true" tabindex="-1"></a><span class="fu">flash_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span><span class="co"> # Required with sequence parallelism</span></span>
<span id="cb5-98"><a href="#cb5-98" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-99"><a href="#cb5-99" aria-hidden="true" tabindex="-1"></a><span class="co"># Optional; strides across the key dimension. Larger values use more memory but will make training faster.</span></span>
<span id="cb5-100"><a href="#cb5-100" aria-hidden="true" tabindex="-1"></a><span class="fu">heads_k_stride</span><span class="kw">:</span><span class="at"> </span><span class="dv">1</span></span>
<span id="cb5-101"><a href="#cb5-101" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
<span id="cb5-102"><a href="#cb5-102" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-103"><a href="#cb5-103" aria-hidden="true" tabindex="-1"></a>See our <span class="co">[</span><span class="ot">dedicated guide</span><span class="co">](sequence_parallelism.qmd)</span> for more details.</span>
<span id="cb5-104"><a href="#cb5-104" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-105"><a href="#cb5-105" aria-hidden="true" tabindex="-1"></a><span class="fu">### FSDP + QLoRA {#sec-fsdp-qlora}</span></span>
<span id="cb5-106"><a href="#cb5-106" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-107"><a href="#cb5-107" aria-hidden="true" tabindex="-1"></a>For combining FSDP with QLoRA, see our <span class="co">[</span><span class="ot">dedicated guide</span><span class="co">](fsdp_qlora.qmd)</span>.</span>
<span id="cb5-108"><a href="#cb5-108" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-109"><a href="#cb5-109" aria-hidden="true" tabindex="-1"></a><span class="fu">## Performance Optimization {#sec-performance}</span></span>
<span id="cb5-110"><a href="#cb5-110" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-111"><a href="#cb5-111" aria-hidden="true" tabindex="-1"></a><span class="fu">### Liger Kernel Integration {#sec-liger}</span></span>
<span id="cb5-112"><a href="#cb5-112" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-113"><a href="#cb5-113" aria-hidden="true" tabindex="-1"></a>Please see <span class="co">[</span><span class="ot">docs</span><span class="co">](custom_integrations.qmd#liger)</span> for more info.</span>
<span id="cb5-114"><a href="#cb5-114" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-115"><a href="#cb5-115" aria-hidden="true" tabindex="-1"></a><span class="fu">## Troubleshooting {#sec-troubleshooting}</span></span>
<span id="cb5-116"><a href="#cb5-116" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-117"><a href="#cb5-117" aria-hidden="true" tabindex="-1"></a><span class="fu">### NCCL Issues {#sec-nccl}</span></span>
<span id="cb5-118"><a href="#cb5-118" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-119"><a href="#cb5-119" aria-hidden="true" tabindex="-1"></a>For NCCL-related problems, see our <span class="co">[</span><span class="ot">NCCL troubleshooting guide</span><span class="co">](nccl.qmd)</span>.</span>
<span id="cb5-120"><a href="#cb5-120" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-121"><a href="#cb5-121" aria-hidden="true" tabindex="-1"></a><span class="fu">### Common Problems {#sec-common-problems}</span></span>
<span id="cb5-122"><a href="#cb5-122" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-123"><a href="#cb5-123" aria-hidden="true" tabindex="-1"></a>::: {.panel-tabset}</span>
<span id="cb5-124"><a href="#cb5-124" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-125"><a href="#cb5-125" aria-hidden="true" tabindex="-1"></a><span class="fu">## Memory Issues</span></span>
<span id="cb5-126"><a href="#cb5-126" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-127"><a href="#cb5-127" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Reduce <span class="in">`micro_batch_size`</span></span>
<span id="cb5-128"><a href="#cb5-128" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Reduce <span class="in">`eval_batch_size`</span></span>
<span id="cb5-129"><a href="#cb5-129" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Adjust <span class="in">`gradient_accumulation_steps`</span></span>
<span id="cb5-130"><a href="#cb5-130" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Consider using a higher ZeRO stage</span>
<span id="cb5-131"><a href="#cb5-131" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-132"><a href="#cb5-132" aria-hidden="true" tabindex="-1"></a><span class="fu">## Training Instability</span></span>
<span id="cb5-133"><a href="#cb5-133" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-134"><a href="#cb5-134" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Start with DeepSpeed ZeRO-2</span>
<span id="cb5-135"><a href="#cb5-135" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Monitor loss values</span>
<span id="cb5-136"><a href="#cb5-136" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Check learning rates</span>
<span id="cb5-137"><a href="#cb5-137" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-138"><a href="#cb5-138" aria-hidden="true" tabindex="-1"></a>:::</span>
<span id="cb5-139"><a href="#cb5-139" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-140"><a href="#cb5-140" aria-hidden="true" tabindex="-1"></a>For more detailed troubleshooting, see our <span class="co">[</span><span class="ot">debugging guide</span><span class="co">](debugging.qmd)</span>.</span></code><button title="Copy to Clipboard" class="code-copy-button" data-in-quarto-modal=""><i class="bi"></i></button></pre></div>
<div class="sourceCode" id="cb4" data-shortcodes="false"><pre class="sourceCode markdown code-with-copy"><code class="sourceCode markdown"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co">---</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="an">title:</span><span class="co"> "Multi-GPU"</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="an">format:</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="co"> html:</span></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="co"> toc: true</span></span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="co"> toc-depth: 3</span></span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a><span class="co"> number-sections: true</span></span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a><span class="co"> code-tools: true</span></span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a><span class="an">execute:</span></span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a><span class="co"> enabled: false</span></span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a><span class="co">---</span></span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a>This guide covers advanced training configurations for multi-GPU setups using Axolotl.</span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a><span class="fu">## Overview {#sec-overview}</span></span>
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a>Axolotl supports several methods for multi-GPU training:</span>
<span id="cb4-18"><a href="#cb4-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-19"><a href="#cb4-19" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>DeepSpeed (recommended)</span>
<span id="cb4-20"><a href="#cb4-20" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>FSDP (Fully Sharded Data Parallel)</span>
<span id="cb4-21"><a href="#cb4-21" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Sequence parallelism</span>
<span id="cb4-22"><a href="#cb4-22" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>FSDP + QLoRA</span>
<span id="cb4-23"><a href="#cb4-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-24"><a href="#cb4-24" aria-hidden="true" tabindex="-1"></a><span class="fu">## DeepSpeed {#sec-deepspeed}</span></span>
<span id="cb4-25"><a href="#cb4-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-26"><a href="#cb4-26" aria-hidden="true" tabindex="-1"></a>DeepSpeed is the recommended approach for multi-GPU training due to its stability and performance. It provides various optimization levels through ZeRO stages.</span>
<span id="cb4-27"><a href="#cb4-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-28"><a href="#cb4-28" aria-hidden="true" tabindex="-1"></a><span class="fu">### Configuration {#sec-deepspeed-config}</span></span>
<span id="cb4-29"><a href="#cb4-29" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-30"><a href="#cb4-30" aria-hidden="true" tabindex="-1"></a>Add to your YAML config:</span>
<span id="cb4-31"><a href="#cb4-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-32"><a href="#cb4-32" aria-hidden="true" tabindex="-1"></a><span class="in">```{.yaml}</span></span>
<span id="cb4-33"><a href="#cb4-33" aria-hidden="true" tabindex="-1"></a><span class="fu">deepspeed</span><span class="kw">:</span><span class="at"> deepspeed_configs/zero1.json</span></span>
<span id="cb4-34"><a href="#cb4-34" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
<span id="cb4-35"><a href="#cb4-35" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-36"><a href="#cb4-36" aria-hidden="true" tabindex="-1"></a><span class="fu">### Usage {#sec-deepspeed-usage}</span></span>
<span id="cb4-37"><a href="#cb4-37" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-38"><a href="#cb4-38" aria-hidden="true" tabindex="-1"></a><span class="in">```{.bash}</span></span>
<span id="cb4-39"><a href="#cb4-39" aria-hidden="true" tabindex="-1"></a><span class="co"># Fetch deepspeed configs (if not already present)</span></span>
<span id="cb4-40"><a href="#cb4-40" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> fetch deepspeed_configs</span>
<span id="cb4-41"><a href="#cb4-41" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-42"><a href="#cb4-42" aria-hidden="true" tabindex="-1"></a><span class="co"># Passing arg via config</span></span>
<span id="cb4-43"><a href="#cb4-43" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> train config.yml</span>
<span id="cb4-44"><a href="#cb4-44" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-45"><a href="#cb4-45" aria-hidden="true" tabindex="-1"></a><span class="co"># Passing arg via cli</span></span>
<span id="cb4-46"><a href="#cb4-46" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> train config.yml <span class="at">--deepspeed</span> deepspeed_configs/zero1.json</span>
<span id="cb4-47"><a href="#cb4-47" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
<span id="cb4-48"><a href="#cb4-48" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-49"><a href="#cb4-49" aria-hidden="true" tabindex="-1"></a><span class="fu">### ZeRO Stages {#sec-zero-stages}</span></span>
<span id="cb4-50"><a href="#cb4-50" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-51"><a href="#cb4-51" aria-hidden="true" tabindex="-1"></a>We provide default configurations for:</span>
<span id="cb4-52"><a href="#cb4-52" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-53"><a href="#cb4-53" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 1 (<span class="in">`zero1.json`</span>)</span>
<span id="cb4-54"><a href="#cb4-54" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 1 with torch compile (<span class="in">`zero1_torch_compile.json`</span>)</span>
<span id="cb4-55"><a href="#cb4-55" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 2 (<span class="in">`zero2.json`</span>)</span>
<span id="cb4-56"><a href="#cb4-56" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 3 (<span class="in">`zero3.json`</span>)</span>
<span id="cb4-57"><a href="#cb4-57" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 3 with bf16 (<span class="in">`zero3_bf16.json`</span>)</span>
<span id="cb4-58"><a href="#cb4-58" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 3 with bf16 and CPU offload params(<span class="in">`zero3_bf16_cpuoffload_params.json`</span>)</span>
<span id="cb4-59"><a href="#cb4-59" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 3 with bf16 and CPU offload params and optimizer (<span class="in">`zero3_bf16_cpuoffload_all.json`</span>)</span>
<span id="cb4-60"><a href="#cb4-60" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-61"><a href="#cb4-61" aria-hidden="true" tabindex="-1"></a>::: {.callout-tip}</span>
<span id="cb4-62"><a href="#cb4-62" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-63"><a href="#cb4-63" aria-hidden="true" tabindex="-1"></a>Choose the configuration that offloads the least amount to memory while still being able to fit on VRAM for best performance.</span>
<span id="cb4-64"><a href="#cb4-64" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-65"><a href="#cb4-65" aria-hidden="true" tabindex="-1"></a>Start from Stage 1 -&gt; Stage 2 -&gt; Stage 3.</span>
<span id="cb4-66"><a href="#cb4-66" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-67"><a href="#cb4-67" aria-hidden="true" tabindex="-1"></a>:::</span>
<span id="cb4-68"><a href="#cb4-68" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-69"><a href="#cb4-69" aria-hidden="true" tabindex="-1"></a><span class="fu">## FSDP {#sec-fsdp}</span></span>
<span id="cb4-70"><a href="#cb4-70" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-71"><a href="#cb4-71" aria-hidden="true" tabindex="-1"></a><span class="fu">### Basic FSDP Configuration {#sec-fsdp-config}</span></span>
<span id="cb4-72"><a href="#cb4-72" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-73"><a href="#cb4-73" aria-hidden="true" tabindex="-1"></a><span class="in">```{.yaml}</span></span>
<span id="cb4-74"><a href="#cb4-74" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp</span><span class="kw">:</span></span>
<span id="cb4-75"><a href="#cb4-75" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span>full_shard</span>
<span id="cb4-76"><a href="#cb4-76" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span>auto_wrap</span>
<span id="cb4-77"><a href="#cb4-77" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp_config</span><span class="kw">:</span></span>
<span id="cb4-78"><a href="#cb4-78" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_offload_params</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb4-79"><a href="#cb4-79" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_state_dict_type</span><span class="kw">:</span><span class="at"> FULL_STATE_DICT</span></span>
<span id="cb4-80"><a href="#cb4-80" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_transformer_layer_cls_to_wrap</span><span class="kw">:</span><span class="at"> LlamaDecoderLayer</span></span>
<span id="cb4-81"><a href="#cb4-81" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
<span id="cb4-82"><a href="#cb4-82" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-83"><a href="#cb4-83" aria-hidden="true" tabindex="-1"></a><span class="fu">## Sequence parallelism {#sec-sequence-parallelism}</span></span>
<span id="cb4-84"><a href="#cb4-84" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-85"><a href="#cb4-85" aria-hidden="true" tabindex="-1"></a>We support sequence parallelism (SP) via the</span>
<span id="cb4-86"><a href="#cb4-86" aria-hidden="true" tabindex="-1"></a><span class="co">[</span><span class="ot">ring-flash-attention</span><span class="co">](https://github.com/zhuzilin/ring-flash-attention)</span> project. This</span>
<span id="cb4-87"><a href="#cb4-87" aria-hidden="true" tabindex="-1"></a>allows one to split up sequences across GPUs, which is useful in the event that a</span>
<span id="cb4-88"><a href="#cb4-88" aria-hidden="true" tabindex="-1"></a>single sequence causes OOM errors during model training.</span>
<span id="cb4-89"><a href="#cb4-89" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-90"><a href="#cb4-90" aria-hidden="true" tabindex="-1"></a>See our <span class="co">[</span><span class="ot">dedicated guide</span><span class="co">](sequence_parallelism.qmd)</span> for more information.</span>
<span id="cb4-91"><a href="#cb4-91" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-92"><a href="#cb4-92" aria-hidden="true" tabindex="-1"></a><span class="fu">### FSDP + QLoRA {#sec-fsdp-qlora}</span></span>
<span id="cb4-93"><a href="#cb4-93" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-94"><a href="#cb4-94" aria-hidden="true" tabindex="-1"></a>For combining FSDP with QLoRA, see our <span class="co">[</span><span class="ot">dedicated guide</span><span class="co">](fsdp_qlora.qmd)</span>.</span>
<span id="cb4-95"><a href="#cb4-95" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-96"><a href="#cb4-96" aria-hidden="true" tabindex="-1"></a><span class="fu">## Performance Optimization {#sec-performance}</span></span>
<span id="cb4-97"><a href="#cb4-97" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-98"><a href="#cb4-98" aria-hidden="true" tabindex="-1"></a><span class="fu">### Liger Kernel Integration {#sec-liger}</span></span>
<span id="cb4-99"><a href="#cb4-99" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-100"><a href="#cb4-100" aria-hidden="true" tabindex="-1"></a>Please see <span class="co">[</span><span class="ot">docs</span><span class="co">](custom_integrations.qmd#liger)</span> for more info.</span>
<span id="cb4-101"><a href="#cb4-101" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-102"><a href="#cb4-102" aria-hidden="true" tabindex="-1"></a><span class="fu">## Troubleshooting {#sec-troubleshooting}</span></span>
<span id="cb4-103"><a href="#cb4-103" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-104"><a href="#cb4-104" aria-hidden="true" tabindex="-1"></a><span class="fu">### NCCL Issues {#sec-nccl}</span></span>
<span id="cb4-105"><a href="#cb4-105" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-106"><a href="#cb4-106" aria-hidden="true" tabindex="-1"></a>For NCCL-related problems, see our <span class="co">[</span><span class="ot">NCCL troubleshooting guide</span><span class="co">](nccl.qmd)</span>.</span>
<span id="cb4-107"><a href="#cb4-107" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-108"><a href="#cb4-108" aria-hidden="true" tabindex="-1"></a><span class="fu">### Common Problems {#sec-common-problems}</span></span>
<span id="cb4-109"><a href="#cb4-109" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-110"><a href="#cb4-110" aria-hidden="true" tabindex="-1"></a>::: {.panel-tabset}</span>
<span id="cb4-111"><a href="#cb4-111" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-112"><a href="#cb4-112" aria-hidden="true" tabindex="-1"></a><span class="fu">## Memory Issues</span></span>
<span id="cb4-113"><a href="#cb4-113" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-114"><a href="#cb4-114" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Reduce <span class="in">`micro_batch_size`</span></span>
<span id="cb4-115"><a href="#cb4-115" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Reduce <span class="in">`eval_batch_size`</span></span>
<span id="cb4-116"><a href="#cb4-116" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Adjust <span class="in">`gradient_accumulation_steps`</span></span>
<span id="cb4-117"><a href="#cb4-117" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Consider using a higher ZeRO stage</span>
<span id="cb4-118"><a href="#cb4-118" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-119"><a href="#cb4-119" aria-hidden="true" tabindex="-1"></a><span class="fu">## Training Instability</span></span>
<span id="cb4-120"><a href="#cb4-120" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-121"><a href="#cb4-121" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Start with DeepSpeed ZeRO-2</span>
<span id="cb4-122"><a href="#cb4-122" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Monitor loss values</span>
<span id="cb4-123"><a href="#cb4-123" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Check learning rates</span>
<span id="cb4-124"><a href="#cb4-124" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-125"><a href="#cb4-125" aria-hidden="true" tabindex="-1"></a>:::</span>
<span id="cb4-126"><a href="#cb4-126" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-127"><a href="#cb4-127" aria-hidden="true" tabindex="-1"></a>For more detailed troubleshooting, see our <span class="co">[</span><span class="ot">debugging guide</span><span class="co">](debugging.qmd)</span>.</span></code><button title="Copy to Clipboard" class="code-copy-button" data-in-quarto-modal=""><i class="bi"></i></button></pre></div>
</div></div></div></div></div>
</div> <!-- /content -->

View File

@@ -520,7 +520,7 @@ through a ring communication pattern.</p>
<ol type="1">
<li>Each sequence is divided into equal chunks across the GPUs in a sequence parallel group</li>
<li>The data collator handles the chunking of input_ids, attention_mask, labels, and position_ids</li>
<li>Position IDs are adjusted to maintain proper relative positions, especially for packed sequences</li>
<li>Position IDs are adjusted to maintain proper relative positions</li>
<li>The trainer uses special ring communication patterns for attention operations</li>
</ol>
</section>
@@ -551,11 +551,13 @@ through a ring communication pattern.</p>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="co">...</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a><span class="co">sequence_parallel_degree: 4 # Split each sequence into 4 parts, one per GPU</span></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="co">flash_attention: true # Required with sequence parallelism</span></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Optional; strides across the key dimension. Larger values use more memory but should make training faster.</span></span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a><span class="co">heads_k_stride: 1</span></span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a><span class="co">...</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Optional; strides across the key dimension. Larger values use more memory but should make training faster.</span></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="co">heads_k_stride: 1</span></span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Optional; one of "varlen_llama3" or "batch_ring". Defaults to</span></span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a><span class="co"># "varlen_llama3" when `sample_packing: true`, and "batch_ring" otherwise.</span></span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a><span class="co">ring_attn_func:</span></span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a><span class="co">...</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>This will train the Llama 3 8B model with 8K context length, with each sequence split
into 2 subsequences of length 4096 across 2 GPUs.</p>
</section>

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large Load Diff