Built site for gh-pages
@@ -478,7 +478,7 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
|
||||
<tbody>
|
||||
<tr class="odd">
|
||||
<td><a href="#axolotl.core.trainers.dpo.trainer.AxolotlDPOTrainer">AxolotlDPOTrainer</a></td>
|
||||
<td>Extend the base DPOTrainer for axolotl helpers</td>
|
||||
<td>Extend the base DPOTrainer for axolotl helpers.</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
@@ -490,7 +490,7 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
|
||||
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> dataset_tags<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="op">**</span>kwargs,</span>
|
||||
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<p>Extend the base DPOTrainer for axolotl helpers</p>
|
||||
<p>Extend the base DPOTrainer for axolotl helpers.</p>
|
||||
<section id="methods" class="level4">
|
||||
<h4 class="anchored" data-anchor-id="methods">Methods</h4>
|
||||
<table class="caption-top table">
|
||||
@@ -502,33 +502,17 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr class="odd">
|
||||
<td><a href="#axolotl.core.trainers.dpo.trainer.AxolotlDPOTrainer.evaluation_loop">evaluation_loop</a></td>
|
||||
<td>Overriding built-in evaluation loop to store metrics for each batch.</td>
|
||||
</tr>
|
||||
<tr class="even">
|
||||
<td><a href="#axolotl.core.trainers.dpo.trainer.AxolotlDPOTrainer.push_to_hub">push_to_hub</a></td>
|
||||
<td>Overwrite the <code>push_to_hub</code> method in order to force-add the tags when pushing the</td>
|
||||
<td>Overwrite the <code>push_to_hub</code> method in order to force-add the tags when pushing</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<section id="axolotl.core.trainers.dpo.trainer.AxolotlDPOTrainer.evaluation_loop" class="level5">
|
||||
<h5 class="anchored" data-anchor-id="axolotl.core.trainers.dpo.trainer.AxolotlDPOTrainer.evaluation_loop">evaluation_loop</h5>
|
||||
<div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>core.trainers.dpo.trainer.AxolotlDPOTrainer.evaluation_loop(</span>
|
||||
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> dataloader,</span>
|
||||
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> description,</span>
|
||||
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> prediction_loss_only<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> ignore_keys<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a> metric_key_prefix<span class="op">=</span><span class="st">'eval'</span>,</span>
|
||||
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<p>Overriding built-in evaluation loop to store metrics for each batch.
|
||||
Prediction/evaluation loop, shared by <code>Trainer.evaluate()</code> and <code>Trainer.predict()</code>.</p>
|
||||
<p>Works both with or without labels.</p>
|
||||
</section>
|
||||
<section id="axolotl.core.trainers.dpo.trainer.AxolotlDPOTrainer.push_to_hub" class="level5">
|
||||
<h5 class="anchored" data-anchor-id="axolotl.core.trainers.dpo.trainer.AxolotlDPOTrainer.push_to_hub">push_to_hub</h5>
|
||||
<div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>core.trainers.dpo.trainer.AxolotlDPOTrainer.push_to_hub(<span class="op">*</span>args, <span class="op">**</span>kwargs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<p>Overwrite the <code>push_to_hub</code> method in order to force-add the tags when pushing the
|
||||
model on the Hub. Please refer to <code>~transformers.Trainer.push_to_hub</code> for more details.</p>
|
||||
<div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>core.trainers.dpo.trainer.AxolotlDPOTrainer.push_to_hub(<span class="op">*</span>args, <span class="op">**</span>kwargs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<p>Overwrite the <code>push_to_hub</code> method in order to force-add the tags when pushing
|
||||
the model on the Hub. Please refer to <code>~transformers.Trainer.push_to_hub</code>
|
||||
for more details.</p>
|
||||
|
||||
|
||||
</section>
|
||||
|
||||
@@ -1,902 +0,0 @@
|
||||
<!DOCTYPE html>
|
||||
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head>
|
||||
|
||||
<meta charset="utf-8">
|
||||
<meta name="generator" content="quarto-1.7.31">
|
||||
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
|
||||
|
||||
|
||||
<title>core.trainers.mixins.sequence_parallel – Axolotl</title>
|
||||
<style>
|
||||
code{white-space: pre-wrap;}
|
||||
span.smallcaps{font-variant: small-caps;}
|
||||
div.columns{display: flex; gap: min(4vw, 1.5em);}
|
||||
div.column{flex: auto; overflow-x: auto;}
|
||||
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
|
||||
ul.task-list{list-style: none;}
|
||||
ul.task-list li input[type="checkbox"] {
|
||||
width: 0.8em;
|
||||
margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */
|
||||
vertical-align: middle;
|
||||
}
|
||||
/* CSS for syntax highlighting */
|
||||
html { -webkit-text-size-adjust: 100%; }
|
||||
pre > code.sourceCode { white-space: pre; position: relative; }
|
||||
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
|
||||
pre > code.sourceCode > span:empty { height: 1.2em; }
|
||||
.sourceCode { overflow: visible; }
|
||||
code.sourceCode > span { color: inherit; text-decoration: inherit; }
|
||||
div.sourceCode { margin: 1em 0; }
|
||||
pre.sourceCode { margin: 0; }
|
||||
@media screen {
|
||||
div.sourceCode { overflow: auto; }
|
||||
}
|
||||
@media print {
|
||||
pre > code.sourceCode { white-space: pre-wrap; }
|
||||
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
|
||||
}
|
||||
pre.numberSource code
|
||||
{ counter-reset: source-line 0; }
|
||||
pre.numberSource code > span
|
||||
{ position: relative; left: -4em; counter-increment: source-line; }
|
||||
pre.numberSource code > span > a:first-child::before
|
||||
{ content: counter(source-line);
|
||||
position: relative; left: -1em; text-align: right; vertical-align: baseline;
|
||||
border: none; display: inline-block;
|
||||
-webkit-touch-callout: none; -webkit-user-select: none;
|
||||
-khtml-user-select: none; -moz-user-select: none;
|
||||
-ms-user-select: none; user-select: none;
|
||||
padding: 0 4px; width: 4em;
|
||||
}
|
||||
pre.numberSource { margin-left: 3em; padding-left: 4px; }
|
||||
div.sourceCode
|
||||
{ }
|
||||
@media screen {
|
||||
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
|
||||
}
|
||||
</style>
|
||||
|
||||
|
||||
<script src="../../site_libs/quarto-nav/quarto-nav.js"></script>
|
||||
<script src="../../site_libs/clipboard/clipboard.min.js"></script>
|
||||
<script src="../../site_libs/quarto-search/autocomplete.umd.js"></script>
|
||||
<script src="../../site_libs/quarto-search/fuse.min.js"></script>
|
||||
<script src="../../site_libs/quarto-search/quarto-search.js"></script>
|
||||
<meta name="quarto:offset" content="../../">
|
||||
<link href="../../favicon.jpg" rel="icon" type="image/jpeg">
|
||||
<script src="../../site_libs/quarto-html/quarto.js" type="module"></script>
|
||||
<script src="../../site_libs/quarto-html/tabsets/tabsets.js" type="module"></script>
|
||||
<script src="../../site_libs/quarto-html/popper.min.js"></script>
|
||||
<script src="../../site_libs/quarto-html/tippy.umd.min.js"></script>
|
||||
<script src="../../site_libs/quarto-html/anchor.min.js"></script>
|
||||
<link href="../../site_libs/quarto-html/tippy.css" rel="stylesheet">
|
||||
<link href="../../site_libs/quarto-html/quarto-syntax-highlighting-dark-8ef56b68f8fa1e9d2ba328e99e439f80.css" rel="stylesheet" id="quarto-text-highlighting-styles">
|
||||
<script src="../../site_libs/bootstrap/bootstrap.min.js"></script>
|
||||
<link href="../../site_libs/bootstrap/bootstrap-icons.css" rel="stylesheet">
|
||||
<link href="../../site_libs/bootstrap/bootstrap-2288ecdcbf81d2ab6432743cedd71d9a.min.css" rel="stylesheet" append-hash="true" id="quarto-bootstrap" data-mode="dark">
|
||||
<script id="quarto-search-options" type="application/json">{
|
||||
"location": "navbar",
|
||||
"copy-button": false,
|
||||
"collapse-after": 3,
|
||||
"panel-placement": "end",
|
||||
"type": "overlay",
|
||||
"limit": 50,
|
||||
"keyboard-shortcut": [
|
||||
"f",
|
||||
"/",
|
||||
"s"
|
||||
],
|
||||
"show-item-context": false,
|
||||
"language": {
|
||||
"search-no-results-text": "No results",
|
||||
"search-matching-documents-text": "matching documents",
|
||||
"search-copy-link-title": "Copy link to search",
|
||||
"search-hide-matches-text": "Hide additional matches",
|
||||
"search-more-match-text": "more match in this document",
|
||||
"search-more-matches-text": "more matches in this document",
|
||||
"search-clear-button-title": "Clear",
|
||||
"search-text-placeholder": "",
|
||||
"search-detached-cancel-button-title": "Cancel",
|
||||
"search-submit-button-title": "Submit",
|
||||
"search-label": "Search"
|
||||
}
|
||||
}</script>
|
||||
|
||||
|
||||
<link rel="stylesheet" href="../../styles.css">
|
||||
</head>
|
||||
|
||||
<body class="nav-sidebar docked nav-fixed quarto-light">
|
||||
|
||||
<div id="quarto-search-results"></div>
|
||||
<header id="quarto-header" class="headroom fixed-top">
|
||||
<nav class="navbar navbar-expand " data-bs-theme="dark">
|
||||
<div class="navbar-container container-fluid">
|
||||
<div class="navbar-brand-container mx-auto">
|
||||
<a href="../../index.html" class="navbar-brand navbar-brand-logo">
|
||||
<img src="../../image/axolotl_logo_digital_white.svg" alt="" class="navbar-logo">
|
||||
</a>
|
||||
</div>
|
||||
<div class="quarto-navbar-tools tools-wide tools-end">
|
||||
<a href="https://twitter.com/axolotl_ai" title="" class="quarto-navigation-tool px-1" aria-label=""><i class="bi bi-twitter"></i></a>
|
||||
<a href="https://github.com/axolotl-ai-cloud/axolotl/" title="" class="quarto-navigation-tool px-1" aria-label=""><i class="bi bi-github"></i></a>
|
||||
<a href="https://discord.gg/7m9sfhzaf3" title="" class="quarto-navigation-tool px-1" aria-label=""><i class="bi bi-discord"></i></a>
|
||||
</div>
|
||||
<div id="quarto-search" class="" title="Search"></div>
|
||||
</div> <!-- /container-fluid -->
|
||||
</nav>
|
||||
<nav class="quarto-secondary-nav">
|
||||
<div class="container-fluid d-flex">
|
||||
<button type="button" class="quarto-btn-toggle btn" data-bs-toggle="collapse" role="button" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
|
||||
<i class="bi bi-layout-text-sidebar-reverse"></i>
|
||||
</button>
|
||||
<nav class="quarto-page-breadcrumbs" aria-label="breadcrumb"><ol class="breadcrumb"></ol></nav>
|
||||
<a class="flex-grow-1" role="navigation" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
|
||||
</a>
|
||||
</div>
|
||||
</nav>
|
||||
</header>
|
||||
<!-- content -->
|
||||
<div id="quarto-content" class="quarto-container page-columns page-rows-contents page-layout-article page-navbar">
|
||||
<!-- sidebar -->
|
||||
<nav id="quarto-sidebar" class="sidebar collapse collapse-horizontal quarto-sidebar-collapse-item sidebar-navigation docked overflow-auto">
|
||||
<div class="sidebar-menu-container">
|
||||
<ul class="list-unstyled mt-1">
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../index.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Home</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item sidebar-item-section">
|
||||
<div class="sidebar-item-container">
|
||||
<a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-1" role="navigation" aria-expanded="true">
|
||||
<span class="menu-text">Getting Started</span></a>
|
||||
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-1" role="navigation" aria-expanded="true" aria-label="Toggle section">
|
||||
<i class="bi bi-chevron-right ms-2"></i>
|
||||
</a>
|
||||
</div>
|
||||
<ul id="quarto-sidebar-section-1" class="collapse list-unstyled sidebar-section depth1 show">
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/getting-started.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Quickstart</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/installation.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Installation</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/inference.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Inference and Merging</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/cli.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Command Line Interface (CLI)</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/config.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Config Reference</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/api" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">API Reference</span></a>
|
||||
</div>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="sidebar-item sidebar-item-section">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/dataset-formats/index.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Dataset Formats</span></a>
|
||||
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-2" role="navigation" aria-expanded="true" aria-label="Toggle section">
|
||||
<i class="bi bi-chevron-right ms-2"></i>
|
||||
</a>
|
||||
</div>
|
||||
<ul id="quarto-sidebar-section-2" class="collapse list-unstyled sidebar-section depth1 show">
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/dataset-formats/pretraining.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Pre-training</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/dataset-formats/inst_tune.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Instruction Tuning</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/dataset-formats/conversation.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Conversation</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/dataset-formats/stepwise_supervised.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Stepwise Supervised Format</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/dataset-formats/template_free.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Template-Free</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/dataset-formats/tokenized.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Custom Pre-Tokenized Dataset</span></a>
|
||||
</div>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="sidebar-item sidebar-item-section">
|
||||
<div class="sidebar-item-container">
|
||||
<a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-3" role="navigation" aria-expanded="true">
|
||||
<span class="menu-text">Deployments</span></a>
|
||||
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-3" role="navigation" aria-expanded="true" aria-label="Toggle section">
|
||||
<i class="bi bi-chevron-right ms-2"></i>
|
||||
</a>
|
||||
</div>
|
||||
<ul id="quarto-sidebar-section-3" class="collapse list-unstyled sidebar-section depth1 show">
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/docker.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Docker</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/multi-gpu.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Multi-GPU</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/multi-node.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Multi Node</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/ray-integration.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Ray Train</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/amd_hpc.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">AMD GPUs on HPC Systems</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/mac.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Mac M-series</span></a>
|
||||
</div>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="sidebar-item sidebar-item-section">
|
||||
<div class="sidebar-item-container">
|
||||
<a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-4" role="navigation" aria-expanded="true">
|
||||
<span class="menu-text">How To Guides</span></a>
|
||||
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-4" role="navigation" aria-expanded="true" aria-label="Toggle section">
|
||||
<i class="bi bi-chevron-right ms-2"></i>
|
||||
</a>
|
||||
</div>
|
||||
<ul id="quarto-sidebar-section-4" class="collapse list-unstyled sidebar-section depth1 show">
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/multimodal.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">MultiModal / Vision Language Models (BETA)</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/rlhf.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">RLHF (Beta)</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/reward_modelling.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Reward Modelling</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/lr_groups.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Learning Rate Groups</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/lora_optims.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">LoRA Optimizations</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/dataset_loading.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Dataset Loading</span></a>
|
||||
</div>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="sidebar-item sidebar-item-section">
|
||||
<div class="sidebar-item-container">
|
||||
<a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-5" role="navigation" aria-expanded="true">
|
||||
<span class="menu-text">Core Concepts</span></a>
|
||||
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-5" role="navigation" aria-expanded="true" aria-label="Toggle section">
|
||||
<i class="bi bi-chevron-right ms-2"></i>
|
||||
</a>
|
||||
</div>
|
||||
<ul id="quarto-sidebar-section-5" class="collapse list-unstyled sidebar-section depth1 show">
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/batch_vs_grad.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Batch size vs Gradient accumulation</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/dataset_preprocessing.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Dataset Preprocessing</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/multipack.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Multipack (Sample Packing)</span></a>
|
||||
</div>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="sidebar-item sidebar-item-section">
|
||||
<div class="sidebar-item-container">
|
||||
<a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-6" role="navigation" aria-expanded="true">
|
||||
<span class="menu-text">Advanced Features</span></a>
|
||||
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-6" role="navigation" aria-expanded="true" aria-label="Toggle section">
|
||||
<i class="bi bi-chevron-right ms-2"></i>
|
||||
</a>
|
||||
</div>
|
||||
<ul id="quarto-sidebar-section-6" class="collapse list-unstyled sidebar-section depth1 show">
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/fsdp_qlora.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">FSDP + QLoRA</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/unsloth.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Unsloth</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/torchao.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">PyTorch ao</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/custom_integrations.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Custom Integrations</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/sequence_parallelism.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Sequence Parallelism</span></a>
|
||||
</div>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="sidebar-item sidebar-item-section">
|
||||
<div class="sidebar-item-container">
|
||||
<a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-7" role="navigation" aria-expanded="true">
|
||||
<span class="menu-text">Troubleshooting</span></a>
|
||||
<a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-7" role="navigation" aria-expanded="true" aria-label="Toggle section">
|
||||
<i class="bi bi-chevron-right ms-2"></i>
|
||||
</a>
|
||||
</div>
|
||||
<ul id="quarto-sidebar-section-7" class="collapse list-unstyled sidebar-section depth1 show">
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/faq.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">FAQ</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/debugging.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">Debugging</span></a>
|
||||
</div>
|
||||
</li>
|
||||
<li class="sidebar-item">
|
||||
<div class="sidebar-item-container">
|
||||
<a href="../../docs/nccl.html" class="sidebar-item-text sidebar-link">
|
||||
<span class="menu-text">NCCL</span></a>
|
||||
</div>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</div>
|
||||
</nav>
|
||||
<div id="quarto-sidebar-glass" class="quarto-sidebar-collapse-item" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item"></div>
|
||||
<!-- margin-sidebar -->
|
||||
<div id="quarto-margin-sidebar" class="sidebar margin-sidebar">
|
||||
<nav id="TOC" role="doc-toc" class="toc-active">
|
||||
<h2 id="toc-title">On this page</h2>
|
||||
|
||||
<ul>
|
||||
<li><a href="#axolotl.core.trainers.mixins.sequence_parallel" id="toc-axolotl.core.trainers.mixins.sequence_parallel" class="nav-link active" data-scroll-target="#axolotl.core.trainers.mixins.sequence_parallel">core.trainers.mixins.sequence_parallel</a>
|
||||
<ul class="collapse">
|
||||
<li><a href="#classes" id="toc-classes" class="nav-link" data-scroll-target="#classes">Classes</a>
|
||||
<ul class="collapse">
|
||||
<li><a href="#axolotl.core.trainers.mixins.sequence_parallel.SequenceParallelMixin" id="toc-axolotl.core.trainers.mixins.sequence_parallel.SequenceParallelMixin" class="nav-link" data-scroll-target="#axolotl.core.trainers.mixins.sequence_parallel.SequenceParallelMixin">SequenceParallelMixin</a></li>
|
||||
</ul></li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
</nav>
|
||||
</div>
|
||||
<!-- main -->
|
||||
<main class="content" id="quarto-document-content"><header id="title-block-header" class="quarto-title-block"></header>
|
||||
|
||||
|
||||
|
||||
|
||||
<section id="axolotl.core.trainers.mixins.sequence_parallel" class="level1">
|
||||
<h1>core.trainers.mixins.sequence_parallel</h1>
|
||||
<p><code>core.trainers.mixins.sequence_parallel</code></p>
|
||||
<p>Module for Axolotl trainer sequence parallelism mixin</p>
|
||||
<section id="classes" class="level2">
|
||||
<h2 class="anchored" data-anchor-id="classes">Classes</h2>
|
||||
<table class="caption-top table">
|
||||
<thead>
|
||||
<tr class="header">
|
||||
<th>Name</th>
|
||||
<th>Description</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr class="odd">
|
||||
<td><a href="#axolotl.core.trainers.mixins.sequence_parallel.SequenceParallelMixin">SequenceParallelMixin</a></td>
|
||||
<td>Mixin class for sequence parallelism support in trainers.</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<section id="axolotl.core.trainers.mixins.sequence_parallel.SequenceParallelMixin" class="level3">
|
||||
<h3 class="anchored" data-anchor-id="axolotl.core.trainers.mixins.sequence_parallel.SequenceParallelMixin">SequenceParallelMixin</h3>
|
||||
<div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>core.trainers.mixins.sequence_parallel.SequenceParallelMixin()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<p>Mixin class for sequence parallelism support in trainers.</p>
|
||||
<p>This mixin provides functionality for handling sequence parallelism,
|
||||
specifically for creating appropriate data samplers.</p>
|
||||
|
||||
|
||||
</section>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
</main> <!-- /main -->
|
||||
<script id="quarto-html-after-body" type="application/javascript">
|
||||
window.document.addEventListener("DOMContentLoaded", function (event) {
|
||||
const icon = "";
|
||||
const anchorJS = new window.AnchorJS();
|
||||
anchorJS.options = {
|
||||
placement: 'right',
|
||||
icon: icon
|
||||
};
|
||||
anchorJS.add('.anchored');
|
||||
const isCodeAnnotation = (el) => {
|
||||
for (const clz of el.classList) {
|
||||
if (clz.startsWith('code-annotation-')) {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
return false;
|
||||
}
|
||||
const onCopySuccess = function(e) {
|
||||
// button target
|
||||
const button = e.trigger;
|
||||
// don't keep focus
|
||||
button.blur();
|
||||
// flash "checked"
|
||||
button.classList.add('code-copy-button-checked');
|
||||
var currentTitle = button.getAttribute("title");
|
||||
button.setAttribute("title", "Copied!");
|
||||
let tooltip;
|
||||
if (window.bootstrap) {
|
||||
button.setAttribute("data-bs-toggle", "tooltip");
|
||||
button.setAttribute("data-bs-placement", "left");
|
||||
button.setAttribute("data-bs-title", "Copied!");
|
||||
tooltip = new bootstrap.Tooltip(button,
|
||||
{ trigger: "manual",
|
||||
customClass: "code-copy-button-tooltip",
|
||||
offset: [0, -8]});
|
||||
tooltip.show();
|
||||
}
|
||||
setTimeout(function() {
|
||||
if (tooltip) {
|
||||
tooltip.hide();
|
||||
button.removeAttribute("data-bs-title");
|
||||
button.removeAttribute("data-bs-toggle");
|
||||
button.removeAttribute("data-bs-placement");
|
||||
}
|
||||
button.setAttribute("title", currentTitle);
|
||||
button.classList.remove('code-copy-button-checked');
|
||||
}, 1000);
|
||||
// clear code selection
|
||||
e.clearSelection();
|
||||
}
|
||||
const getTextToCopy = function(trigger) {
|
||||
const codeEl = trigger.previousElementSibling.cloneNode(true);
|
||||
for (const childEl of codeEl.children) {
|
||||
if (isCodeAnnotation(childEl)) {
|
||||
childEl.remove();
|
||||
}
|
||||
}
|
||||
return codeEl.innerText;
|
||||
}
|
||||
const clipboard = new window.ClipboardJS('.code-copy-button:not([data-in-quarto-modal])', {
|
||||
text: getTextToCopy
|
||||
});
|
||||
clipboard.on('success', onCopySuccess);
|
||||
if (window.document.getElementById('quarto-embedded-source-code-modal')) {
|
||||
const clipboardModal = new window.ClipboardJS('.code-copy-button[data-in-quarto-modal]', {
|
||||
text: getTextToCopy,
|
||||
container: window.document.getElementById('quarto-embedded-source-code-modal')
|
||||
});
|
||||
clipboardModal.on('success', onCopySuccess);
|
||||
}
|
||||
var localhostRegex = new RegExp(/^(?:http|https):\/\/localhost\:?[0-9]*\//);
|
||||
var mailtoRegex = new RegExp(/^mailto:/);
|
||||
var filterRegex = new RegExp("https:\/\/docs\.axolotl\.ai");
|
||||
var isInternal = (href) => {
|
||||
return filterRegex.test(href) || localhostRegex.test(href) || mailtoRegex.test(href);
|
||||
}
|
||||
// Inspect non-navigation links and adorn them if external
|
||||
var links = window.document.querySelectorAll('a[href]:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item):not(.quarto-navigation-tool):not(.about-link)');
|
||||
for (var i=0; i<links.length; i++) {
|
||||
const link = links[i];
|
||||
if (!isInternal(link.href)) {
|
||||
// undo the damage that might have been done by quarto-nav.js in the case of
|
||||
// links that we want to consider external
|
||||
if (link.dataset.originalHref !== undefined) {
|
||||
link.href = link.dataset.originalHref;
|
||||
}
|
||||
}
|
||||
}
|
||||
function tippyHover(el, contentFn, onTriggerFn, onUntriggerFn) {
|
||||
const config = {
|
||||
allowHTML: true,
|
||||
maxWidth: 500,
|
||||
delay: 100,
|
||||
arrow: false,
|
||||
appendTo: function(el) {
|
||||
return el.parentElement;
|
||||
},
|
||||
interactive: true,
|
||||
interactiveBorder: 10,
|
||||
theme: 'quarto',
|
||||
placement: 'bottom-start',
|
||||
};
|
||||
if (contentFn) {
|
||||
config.content = contentFn;
|
||||
}
|
||||
if (onTriggerFn) {
|
||||
config.onTrigger = onTriggerFn;
|
||||
}
|
||||
if (onUntriggerFn) {
|
||||
config.onUntrigger = onUntriggerFn;
|
||||
}
|
||||
window.tippy(el, config);
|
||||
}
|
||||
const noterefs = window.document.querySelectorAll('a[role="doc-noteref"]');
|
||||
for (var i=0; i<noterefs.length; i++) {
|
||||
const ref = noterefs[i];
|
||||
tippyHover(ref, function() {
|
||||
// use id or data attribute instead here
|
||||
let href = ref.getAttribute('data-footnote-href') || ref.getAttribute('href');
|
||||
try { href = new URL(href).hash; } catch {}
|
||||
const id = href.replace(/^#\/?/, "");
|
||||
const note = window.document.getElementById(id);
|
||||
if (note) {
|
||||
return note.innerHTML;
|
||||
} else {
|
||||
return "";
|
||||
}
|
||||
});
|
||||
}
|
||||
const xrefs = window.document.querySelectorAll('a.quarto-xref');
|
||||
const processXRef = (id, note) => {
|
||||
// Strip column container classes
|
||||
const stripColumnClz = (el) => {
|
||||
el.classList.remove("page-full", "page-columns");
|
||||
if (el.children) {
|
||||
for (const child of el.children) {
|
||||
stripColumnClz(child);
|
||||
}
|
||||
}
|
||||
}
|
||||
stripColumnClz(note)
|
||||
if (id === null || id.startsWith('sec-')) {
|
||||
// Special case sections, only their first couple elements
|
||||
const container = document.createElement("div");
|
||||
if (note.children && note.children.length > 2) {
|
||||
container.appendChild(note.children[0].cloneNode(true));
|
||||
for (let i = 1; i < note.children.length; i++) {
|
||||
const child = note.children[i];
|
||||
if (child.tagName === "P" && child.innerText === "") {
|
||||
continue;
|
||||
} else {
|
||||
container.appendChild(child.cloneNode(true));
|
||||
break;
|
||||
}
|
||||
}
|
||||
if (window.Quarto?.typesetMath) {
|
||||
window.Quarto.typesetMath(container);
|
||||
}
|
||||
return container.innerHTML
|
||||
} else {
|
||||
if (window.Quarto?.typesetMath) {
|
||||
window.Quarto.typesetMath(note);
|
||||
}
|
||||
return note.innerHTML;
|
||||
}
|
||||
} else {
|
||||
// Remove any anchor links if they are present
|
||||
const anchorLink = note.querySelector('a.anchorjs-link');
|
||||
if (anchorLink) {
|
||||
anchorLink.remove();
|
||||
}
|
||||
if (window.Quarto?.typesetMath) {
|
||||
window.Quarto.typesetMath(note);
|
||||
}
|
||||
if (note.classList.contains("callout")) {
|
||||
return note.outerHTML;
|
||||
} else {
|
||||
return note.innerHTML;
|
||||
}
|
||||
}
|
||||
}
|
||||
for (var i=0; i<xrefs.length; i++) {
|
||||
const xref = xrefs[i];
|
||||
tippyHover(xref, undefined, function(instance) {
|
||||
instance.disable();
|
||||
let url = xref.getAttribute('href');
|
||||
let hash = undefined;
|
||||
if (url.startsWith('#')) {
|
||||
hash = url;
|
||||
} else {
|
||||
try { hash = new URL(url).hash; } catch {}
|
||||
}
|
||||
if (hash) {
|
||||
const id = hash.replace(/^#\/?/, "");
|
||||
const note = window.document.getElementById(id);
|
||||
if (note !== null) {
|
||||
try {
|
||||
const html = processXRef(id, note.cloneNode(true));
|
||||
instance.setContent(html);
|
||||
} finally {
|
||||
instance.enable();
|
||||
instance.show();
|
||||
}
|
||||
} else {
|
||||
// See if we can fetch this
|
||||
fetch(url.split('#')[0])
|
||||
.then(res => res.text())
|
||||
.then(html => {
|
||||
const parser = new DOMParser();
|
||||
const htmlDoc = parser.parseFromString(html, "text/html");
|
||||
const note = htmlDoc.getElementById(id);
|
||||
if (note !== null) {
|
||||
const html = processXRef(id, note);
|
||||
instance.setContent(html);
|
||||
}
|
||||
}).finally(() => {
|
||||
instance.enable();
|
||||
instance.show();
|
||||
});
|
||||
}
|
||||
} else {
|
||||
// See if we can fetch a full url (with no hash to target)
|
||||
// This is a special case and we should probably do some content thinning / targeting
|
||||
fetch(url)
|
||||
.then(res => res.text())
|
||||
.then(html => {
|
||||
const parser = new DOMParser();
|
||||
const htmlDoc = parser.parseFromString(html, "text/html");
|
||||
const note = htmlDoc.querySelector('main.content');
|
||||
if (note !== null) {
|
||||
// This should only happen for chapter cross references
|
||||
// (since there is no id in the URL)
|
||||
// remove the first header
|
||||
if (note.children.length > 0 && note.children[0].tagName === "HEADER") {
|
||||
note.children[0].remove();
|
||||
}
|
||||
const html = processXRef(null, note);
|
||||
instance.setContent(html);
|
||||
}
|
||||
}).finally(() => {
|
||||
instance.enable();
|
||||
instance.show();
|
||||
});
|
||||
}
|
||||
}, function(instance) {
|
||||
});
|
||||
}
|
||||
let selectedAnnoteEl;
|
||||
const selectorForAnnotation = ( cell, annotation) => {
|
||||
let cellAttr = 'data-code-cell="' + cell + '"';
|
||||
let lineAttr = 'data-code-annotation="' + annotation + '"';
|
||||
const selector = 'span[' + cellAttr + '][' + lineAttr + ']';
|
||||
return selector;
|
||||
}
|
||||
const selectCodeLines = (annoteEl) => {
|
||||
const doc = window.document;
|
||||
const targetCell = annoteEl.getAttribute("data-target-cell");
|
||||
const targetAnnotation = annoteEl.getAttribute("data-target-annotation");
|
||||
const annoteSpan = window.document.querySelector(selectorForAnnotation(targetCell, targetAnnotation));
|
||||
const lines = annoteSpan.getAttribute("data-code-lines").split(",");
|
||||
const lineIds = lines.map((line) => {
|
||||
return targetCell + "-" + line;
|
||||
})
|
||||
let top = null;
|
||||
let height = null;
|
||||
let parent = null;
|
||||
if (lineIds.length > 0) {
|
||||
//compute the position of the single el (top and bottom and make a div)
|
||||
const el = window.document.getElementById(lineIds[0]);
|
||||
top = el.offsetTop;
|
||||
height = el.offsetHeight;
|
||||
parent = el.parentElement.parentElement;
|
||||
if (lineIds.length > 1) {
|
||||
const lastEl = window.document.getElementById(lineIds[lineIds.length - 1]);
|
||||
const bottom = lastEl.offsetTop + lastEl.offsetHeight;
|
||||
height = bottom - top;
|
||||
}
|
||||
if (top !== null && height !== null && parent !== null) {
|
||||
// cook up a div (if necessary) and position it
|
||||
let div = window.document.getElementById("code-annotation-line-highlight");
|
||||
if (div === null) {
|
||||
div = window.document.createElement("div");
|
||||
div.setAttribute("id", "code-annotation-line-highlight");
|
||||
div.style.position = 'absolute';
|
||||
parent.appendChild(div);
|
||||
}
|
||||
div.style.top = top - 2 + "px";
|
||||
div.style.height = height + 4 + "px";
|
||||
div.style.left = 0;
|
||||
let gutterDiv = window.document.getElementById("code-annotation-line-highlight-gutter");
|
||||
if (gutterDiv === null) {
|
||||
gutterDiv = window.document.createElement("div");
|
||||
gutterDiv.setAttribute("id", "code-annotation-line-highlight-gutter");
|
||||
gutterDiv.style.position = 'absolute';
|
||||
const codeCell = window.document.getElementById(targetCell);
|
||||
const gutter = codeCell.querySelector('.code-annotation-gutter');
|
||||
gutter.appendChild(gutterDiv);
|
||||
}
|
||||
gutterDiv.style.top = top - 2 + "px";
|
||||
gutterDiv.style.height = height + 4 + "px";
|
||||
}
|
||||
selectedAnnoteEl = annoteEl;
|
||||
}
|
||||
};
|
||||
const unselectCodeLines = () => {
|
||||
const elementsIds = ["code-annotation-line-highlight", "code-annotation-line-highlight-gutter"];
|
||||
elementsIds.forEach((elId) => {
|
||||
const div = window.document.getElementById(elId);
|
||||
if (div) {
|
||||
div.remove();
|
||||
}
|
||||
});
|
||||
selectedAnnoteEl = undefined;
|
||||
};
|
||||
// Handle positioning of the toggle
|
||||
window.addEventListener(
|
||||
"resize",
|
||||
throttle(() => {
|
||||
elRect = undefined;
|
||||
if (selectedAnnoteEl) {
|
||||
selectCodeLines(selectedAnnoteEl);
|
||||
}
|
||||
}, 10)
|
||||
);
|
||||
function throttle(fn, ms) {
|
||||
let throttle = false;
|
||||
let timer;
|
||||
return (...args) => {
|
||||
if(!throttle) { // first call gets through
|
||||
fn.apply(this, args);
|
||||
throttle = true;
|
||||
} else { // all the others get throttled
|
||||
if(timer) clearTimeout(timer); // cancel #2
|
||||
timer = setTimeout(() => {
|
||||
fn.apply(this, args);
|
||||
timer = throttle = false;
|
||||
}, ms);
|
||||
}
|
||||
};
|
||||
}
|
||||
// Attach click handler to the DT
|
||||
const annoteDls = window.document.querySelectorAll('dt[data-target-cell]');
|
||||
for (const annoteDlNode of annoteDls) {
|
||||
annoteDlNode.addEventListener('click', (event) => {
|
||||
const clickedEl = event.target;
|
||||
if (clickedEl !== selectedAnnoteEl) {
|
||||
unselectCodeLines();
|
||||
const activeEl = window.document.querySelector('dt[data-target-cell].code-annotation-active');
|
||||
if (activeEl) {
|
||||
activeEl.classList.remove('code-annotation-active');
|
||||
}
|
||||
selectCodeLines(clickedEl);
|
||||
clickedEl.classList.add('code-annotation-active');
|
||||
} else {
|
||||
// Unselect the line
|
||||
unselectCodeLines();
|
||||
clickedEl.classList.remove('code-annotation-active');
|
||||
}
|
||||
});
|
||||
}
|
||||
const findCites = (el) => {
|
||||
const parentEl = el.parentElement;
|
||||
if (parentEl) {
|
||||
const cites = parentEl.dataset.cites;
|
||||
if (cites) {
|
||||
return {
|
||||
el,
|
||||
cites: cites.split(' ')
|
||||
};
|
||||
} else {
|
||||
return findCites(el.parentElement)
|
||||
}
|
||||
} else {
|
||||
return undefined;
|
||||
}
|
||||
};
|
||||
var bibliorefs = window.document.querySelectorAll('a[role="doc-biblioref"]');
|
||||
for (var i=0; i<bibliorefs.length; i++) {
|
||||
const ref = bibliorefs[i];
|
||||
const citeInfo = findCites(ref);
|
||||
if (citeInfo) {
|
||||
tippyHover(citeInfo.el, function() {
|
||||
var popup = window.document.createElement('div');
|
||||
citeInfo.cites.forEach(function(cite) {
|
||||
var citeDiv = window.document.createElement('div');
|
||||
citeDiv.classList.add('hanging-indent');
|
||||
citeDiv.classList.add('csl-entry');
|
||||
var biblioDiv = window.document.getElementById('ref-' + cite);
|
||||
if (biblioDiv) {
|
||||
citeDiv.innerHTML = biblioDiv.innerHTML;
|
||||
}
|
||||
popup.appendChild(citeDiv);
|
||||
});
|
||||
return popup.innerHTML;
|
||||
});
|
||||
}
|
||||
}
|
||||
});
|
||||
</script>
|
||||
</div> <!-- /content -->
|
||||
</body></html>
|
||||
@@ -559,14 +559,12 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
|
||||
<span id="cb1-43"><a href="#cb1-43" aria-hidden="true" tabindex="-1"></a> kd_temperature<span class="op">=</span><span class="fl">1.0</span>,</span>
|
||||
<span id="cb1-44"><a href="#cb1-44" aria-hidden="true" tabindex="-1"></a> kd_zscore_base_temp<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb1-45"><a href="#cb1-45" aria-hidden="true" tabindex="-1"></a> kd_top_k_before_softmax<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb1-46"><a href="#cb1-46" aria-hidden="true" tabindex="-1"></a> sequence_parallel_degree<span class="op">=</span><span class="dv">1</span>,</span>
|
||||
<span id="cb1-47"><a href="#cb1-47" aria-hidden="true" tabindex="-1"></a> ring_attn_func<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb1-48"><a href="#cb1-48" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb1-49"><a href="#cb1-49" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb1-50"><a href="#cb1-50" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb1-51"><a href="#cb1-51" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb1-52"><a href="#cb1-52" aria-hidden="true" tabindex="-1"></a> simpo_gamma<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb1-53"><a href="#cb1-53" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<span id="cb1-46"><a href="#cb1-46" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb1-47"><a href="#cb1-47" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb1-48"><a href="#cb1-48" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb1-49"><a href="#cb1-49" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb1-50"><a href="#cb1-50" aria-hidden="true" tabindex="-1"></a> simpo_gamma<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb1-51"><a href="#cb1-51" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<p>CPO config for CPO training</p>
|
||||
</section>
|
||||
<section id="axolotl.core.training_args.AxolotlKTOConfig" class="level3">
|
||||
@@ -616,13 +614,11 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
|
||||
<span id="cb2-43"><a href="#cb2-43" aria-hidden="true" tabindex="-1"></a> kd_temperature<span class="op">=</span><span class="fl">1.0</span>,</span>
|
||||
<span id="cb2-44"><a href="#cb2-44" aria-hidden="true" tabindex="-1"></a> kd_zscore_base_temp<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb2-45"><a href="#cb2-45" aria-hidden="true" tabindex="-1"></a> kd_top_k_before_softmax<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb2-46"><a href="#cb2-46" aria-hidden="true" tabindex="-1"></a> sequence_parallel_degree<span class="op">=</span><span class="dv">1</span>,</span>
|
||||
<span id="cb2-47"><a href="#cb2-47" aria-hidden="true" tabindex="-1"></a> ring_attn_func<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb2-48"><a href="#cb2-48" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb2-49"><a href="#cb2-49" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb2-50"><a href="#cb2-50" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb2-51"><a href="#cb2-51" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb2-52"><a href="#cb2-52" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<span id="cb2-46"><a href="#cb2-46" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb2-47"><a href="#cb2-47" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb2-48"><a href="#cb2-48" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb2-49"><a href="#cb2-49" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb2-50"><a href="#cb2-50" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<p>KTO config for KTO training</p>
|
||||
</section>
|
||||
<section id="axolotl.core.training_args.AxolotlORPOConfig" class="level3">
|
||||
@@ -672,13 +668,11 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
|
||||
<span id="cb3-43"><a href="#cb3-43" aria-hidden="true" tabindex="-1"></a> kd_temperature<span class="op">=</span><span class="fl">1.0</span>,</span>
|
||||
<span id="cb3-44"><a href="#cb3-44" aria-hidden="true" tabindex="-1"></a> kd_zscore_base_temp<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb3-45"><a href="#cb3-45" aria-hidden="true" tabindex="-1"></a> kd_top_k_before_softmax<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb3-46"><a href="#cb3-46" aria-hidden="true" tabindex="-1"></a> sequence_parallel_degree<span class="op">=</span><span class="dv">1</span>,</span>
|
||||
<span id="cb3-47"><a href="#cb3-47" aria-hidden="true" tabindex="-1"></a> ring_attn_func<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb3-48"><a href="#cb3-48" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb3-49"><a href="#cb3-49" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb3-50"><a href="#cb3-50" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb3-51"><a href="#cb3-51" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb3-52"><a href="#cb3-52" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<span id="cb3-46"><a href="#cb3-46" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb3-47"><a href="#cb3-47" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb3-48"><a href="#cb3-48" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb3-49"><a href="#cb3-49" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb3-50"><a href="#cb3-50" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<p>ORPO config for ORPO training</p>
|
||||
</section>
|
||||
<section id="axolotl.core.training_args.AxolotlPRMConfig" class="level3">
|
||||
@@ -728,13 +722,11 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
|
||||
<span id="cb4-43"><a href="#cb4-43" aria-hidden="true" tabindex="-1"></a> kd_temperature<span class="op">=</span><span class="fl">1.0</span>,</span>
|
||||
<span id="cb4-44"><a href="#cb4-44" aria-hidden="true" tabindex="-1"></a> kd_zscore_base_temp<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb4-45"><a href="#cb4-45" aria-hidden="true" tabindex="-1"></a> kd_top_k_before_softmax<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb4-46"><a href="#cb4-46" aria-hidden="true" tabindex="-1"></a> sequence_parallel_degree<span class="op">=</span><span class="dv">1</span>,</span>
|
||||
<span id="cb4-47"><a href="#cb4-47" aria-hidden="true" tabindex="-1"></a> ring_attn_func<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb4-48"><a href="#cb4-48" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb4-49"><a href="#cb4-49" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb4-50"><a href="#cb4-50" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb4-51"><a href="#cb4-51" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb4-52"><a href="#cb4-52" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<span id="cb4-46"><a href="#cb4-46" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb4-47"><a href="#cb4-47" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb4-48"><a href="#cb4-48" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb4-49"><a href="#cb4-49" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb4-50"><a href="#cb4-50" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<p>PRM config for PRM training</p>
|
||||
</section>
|
||||
<section id="axolotl.core.training_args.AxolotlRewardConfig" class="level3">
|
||||
@@ -784,13 +776,11 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
|
||||
<span id="cb5-43"><a href="#cb5-43" aria-hidden="true" tabindex="-1"></a> kd_temperature<span class="op">=</span><span class="fl">1.0</span>,</span>
|
||||
<span id="cb5-44"><a href="#cb5-44" aria-hidden="true" tabindex="-1"></a> kd_zscore_base_temp<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb5-45"><a href="#cb5-45" aria-hidden="true" tabindex="-1"></a> kd_top_k_before_softmax<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb5-46"><a href="#cb5-46" aria-hidden="true" tabindex="-1"></a> sequence_parallel_degree<span class="op">=</span><span class="dv">1</span>,</span>
|
||||
<span id="cb5-47"><a href="#cb5-47" aria-hidden="true" tabindex="-1"></a> ring_attn_func<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb5-48"><a href="#cb5-48" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb5-49"><a href="#cb5-49" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb5-50"><a href="#cb5-50" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb5-51"><a href="#cb5-51" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb5-52"><a href="#cb5-52" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<span id="cb5-46"><a href="#cb5-46" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb5-47"><a href="#cb5-47" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb5-48"><a href="#cb5-48" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb5-49"><a href="#cb5-49" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb5-50"><a href="#cb5-50" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<p>Reward config for Reward training</p>
|
||||
</section>
|
||||
<section id="axolotl.core.training_args.AxolotlTrainingArguments" class="level3">
|
||||
@@ -840,13 +830,11 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
|
||||
<span id="cb6-43"><a href="#cb6-43" aria-hidden="true" tabindex="-1"></a> kd_temperature<span class="op">=</span><span class="fl">1.0</span>,</span>
|
||||
<span id="cb6-44"><a href="#cb6-44" aria-hidden="true" tabindex="-1"></a> kd_zscore_base_temp<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb6-45"><a href="#cb6-45" aria-hidden="true" tabindex="-1"></a> kd_top_k_before_softmax<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb6-46"><a href="#cb6-46" aria-hidden="true" tabindex="-1"></a> sequence_parallel_degree<span class="op">=</span><span class="dv">1</span>,</span>
|
||||
<span id="cb6-47"><a href="#cb6-47" aria-hidden="true" tabindex="-1"></a> ring_attn_func<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb6-48"><a href="#cb6-48" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb6-49"><a href="#cb6-49" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb6-50"><a href="#cb6-50" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb6-51"><a href="#cb6-51" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb6-52"><a href="#cb6-52" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<span id="cb6-46"><a href="#cb6-46" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb6-47"><a href="#cb6-47" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb6-48"><a href="#cb6-48" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb6-49"><a href="#cb6-49" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb6-50"><a href="#cb6-50" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<p>Training arguments for Causal trainer</p>
|
||||
<p>This code is duplicated due to HF TrainingArguments not setting output_dir with a
|
||||
default value so it can’t be used as a mixin.</p>
|
||||
@@ -898,13 +886,11 @@ default value so it can’t be used as a mixin.</p>
|
||||
<span id="cb7-43"><a href="#cb7-43" aria-hidden="true" tabindex="-1"></a> kd_temperature<span class="op">=</span><span class="fl">1.0</span>,</span>
|
||||
<span id="cb7-44"><a href="#cb7-44" aria-hidden="true" tabindex="-1"></a> kd_zscore_base_temp<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb7-45"><a href="#cb7-45" aria-hidden="true" tabindex="-1"></a> kd_top_k_before_softmax<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb7-46"><a href="#cb7-46" aria-hidden="true" tabindex="-1"></a> sequence_parallel_degree<span class="op">=</span><span class="dv">1</span>,</span>
|
||||
<span id="cb7-47"><a href="#cb7-47" aria-hidden="true" tabindex="-1"></a> ring_attn_func<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb7-48"><a href="#cb7-48" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb7-49"><a href="#cb7-49" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb7-50"><a href="#cb7-50" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb7-51"><a href="#cb7-51" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb7-52"><a href="#cb7-52" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<span id="cb7-46"><a href="#cb7-46" aria-hidden="true" tabindex="-1"></a> adam_beta3<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb7-47"><a href="#cb7-47" aria-hidden="true" tabindex="-1"></a> adam_epsilon2<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb7-48"><a href="#cb7-48" aria-hidden="true" tabindex="-1"></a> image_size<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb7-49"><a href="#cb7-49" aria-hidden="true" tabindex="-1"></a> image_resize_algorithm<span class="op">=</span><span class="va">None</span>,</span>
|
||||
<span id="cb7-50"><a href="#cb7-50" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<p>Mixin class for the Axolotl training args.</p>
|
||||
@@ -629,10 +629,6 @@ ul.task-list li input[type="checkbox"] {
|
||||
<td><a href="../../docs/api/core.trainers.mixins.scheduler.html#axolotl.core.trainers.mixins.scheduler">core.trainers.mixins.scheduler</a></td>
|
||||
<td>Module for Axolotl trainer scheduler mixin</td>
|
||||
</tr>
|
||||
<tr class="even">
|
||||
<td><a href="../../docs/api/core.trainers.mixins.sequence_parallel.html#axolotl.core.trainers.mixins.sequence_parallel">core.trainers.mixins.sequence_parallel</a></td>
|
||||
<td>Module for Axolotl trainer sequence parallelism mixin</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</section>
|
||||
|
||||
@@ -572,15 +572,7 @@ Tip
|
||||
<a href="https://github.com/zhuzilin/ring-flash-attention">ring-flash-attention</a> project. This
|
||||
allows one to split up sequences across GPUs, which is useful in the event that a
|
||||
single sequence causes OOM errors during model training.</p>
|
||||
<p>First, install <code>ring-flash-attn</code>, recommended via <code>pip install axolotl[ring-flash-attn]</code>,
|
||||
or from source with <code>pip install .[ring-flash-attn]</code>.</p>
|
||||
<p>Your Axolotl YAML config should contain the following lines:</p>
|
||||
<div class="sourceCode" id="cb4"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="fu">sequence_parallel_degree</span><span class="kw">:</span><span class="at"> </span><span class="dv">4</span><span class="co"> # Split each sequence into 4 parts, one per GPU</span></span>
|
||||
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="fu">flash_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span><span class="co"> # Required with sequence parallelism</span></span>
|
||||
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Optional; strides across the key dimension. Larger values use more memory but will make training faster.</span></span>
|
||||
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="fu">heads_k_stride</span><span class="kw">:</span><span class="at"> </span><span class="dv">1</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<p>See our <a href="../docs/sequence_parallelism.html">dedicated guide</a> for more details.</p>
|
||||
<p>See our <a href="../docs/sequence_parallelism.html">dedicated guide</a> for more information.</p>
|
||||
<section id="sec-fsdp-qlora" class="level3" data-number="4.1">
|
||||
<h3 data-number="4.1" class="anchored" data-anchor-id="sec-fsdp-qlora"><span class="header-section-number">4.1</span> FSDP + QLoRA</h3>
|
||||
<p>For combining FSDP with QLoRA, see our <a href="../docs/fsdp_qlora.html">dedicated guide</a>.</p>
|
||||
@@ -1080,146 +1072,133 @@ or from source with <code>pip install .[ring-flash-attn]</code>.</p>
|
||||
}
|
||||
});
|
||||
</script><div class="modal fade" id="quarto-embedded-source-code-modal" tabindex="-1" aria-labelledby="quarto-embedded-source-code-modal-label" aria-hidden="true"><div class="modal-dialog modal-dialog-scrollable"><div class="modal-content"><div class="modal-header"><h5 class="modal-title" id="quarto-embedded-source-code-modal-label">Source Code</h5><button class="btn-close" data-bs-dismiss="modal"></button></div><div class="modal-body"><div class="">
|
||||
<div class="sourceCode" id="cb5" data-shortcodes="false"><pre class="sourceCode markdown code-with-copy"><code class="sourceCode markdown"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co">---</span></span>
|
||||
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="an">title:</span><span class="co"> "Multi-GPU"</span></span>
|
||||
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="an">format:</span></span>
|
||||
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a><span class="co"> html:</span></span>
|
||||
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a><span class="co"> toc: true</span></span>
|
||||
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a><span class="co"> toc-depth: 3</span></span>
|
||||
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a><span class="co"> number-sections: true</span></span>
|
||||
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a><span class="co"> code-tools: true</span></span>
|
||||
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a><span class="an">execute:</span></span>
|
||||
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a><span class="co"> enabled: false</span></span>
|
||||
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a><span class="co">---</span></span>
|
||||
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a>This guide covers advanced training configurations for multi-GPU setups using Axolotl.</span>
|
||||
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a><span class="fu">## Overview {#sec-overview}</span></span>
|
||||
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a>Axolotl supports several methods for multi-GPU training:</span>
|
||||
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>DeepSpeed (recommended)</span>
|
||||
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>FSDP (Fully Sharded Data Parallel)</span>
|
||||
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Sequence parallelism</span>
|
||||
<span id="cb5-22"><a href="#cb5-22" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>FSDP + QLoRA</span>
|
||||
<span id="cb5-23"><a href="#cb5-23" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-24"><a href="#cb5-24" aria-hidden="true" tabindex="-1"></a><span class="fu">## DeepSpeed {#sec-deepspeed}</span></span>
|
||||
<span id="cb5-25"><a href="#cb5-25" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-26"><a href="#cb5-26" aria-hidden="true" tabindex="-1"></a>DeepSpeed is the recommended approach for multi-GPU training due to its stability and performance. It provides various optimization levels through ZeRO stages.</span>
|
||||
<span id="cb5-27"><a href="#cb5-27" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-28"><a href="#cb5-28" aria-hidden="true" tabindex="-1"></a><span class="fu">### Configuration {#sec-deepspeed-config}</span></span>
|
||||
<span id="cb5-29"><a href="#cb5-29" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-30"><a href="#cb5-30" aria-hidden="true" tabindex="-1"></a>Add to your YAML config:</span>
|
||||
<span id="cb5-31"><a href="#cb5-31" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-32"><a href="#cb5-32" aria-hidden="true" tabindex="-1"></a><span class="in">```{.yaml}</span></span>
|
||||
<span id="cb5-33"><a href="#cb5-33" aria-hidden="true" tabindex="-1"></a><span class="fu">deepspeed</span><span class="kw">:</span><span class="at"> deepspeed_configs/zero1.json</span></span>
|
||||
<span id="cb5-34"><a href="#cb5-34" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
|
||||
<span id="cb5-35"><a href="#cb5-35" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-36"><a href="#cb5-36" aria-hidden="true" tabindex="-1"></a><span class="fu">### Usage {#sec-deepspeed-usage}</span></span>
|
||||
<span id="cb5-37"><a href="#cb5-37" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-38"><a href="#cb5-38" aria-hidden="true" tabindex="-1"></a><span class="in">```{.bash}</span></span>
|
||||
<span id="cb5-39"><a href="#cb5-39" aria-hidden="true" tabindex="-1"></a><span class="co"># Fetch deepspeed configs (if not already present)</span></span>
|
||||
<span id="cb5-40"><a href="#cb5-40" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> fetch deepspeed_configs</span>
|
||||
<span id="cb5-41"><a href="#cb5-41" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-42"><a href="#cb5-42" aria-hidden="true" tabindex="-1"></a><span class="co"># Pass the DeepSpeed config via the YAML config</span></span>
|
||||
<span id="cb5-43"><a href="#cb5-43" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> train config.yml</span>
|
||||
<span id="cb5-44"><a href="#cb5-44" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-45"><a href="#cb5-45" aria-hidden="true" tabindex="-1"></a><span class="co"># Pass the DeepSpeed config via the CLI</span></span>
|
||||
<span id="cb5-46"><a href="#cb5-46" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> train config.yml <span class="at">--deepspeed</span> deepspeed_configs/zero1.json</span>
|
||||
<span id="cb5-47"><a href="#cb5-47" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
|
||||
<span id="cb5-48"><a href="#cb5-48" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-49"><a href="#cb5-49" aria-hidden="true" tabindex="-1"></a><span class="fu">### ZeRO Stages {#sec-zero-stages}</span></span>
|
||||
<span id="cb5-50"><a href="#cb5-50" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-51"><a href="#cb5-51" aria-hidden="true" tabindex="-1"></a>We provide default configurations for:</span>
|
||||
<span id="cb5-52"><a href="#cb5-52" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-53"><a href="#cb5-53" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 1 (<span class="in">`zero1.json`</span>)</span>
|
||||
<span id="cb5-54"><a href="#cb5-54" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 1 with torch compile (<span class="in">`zero1_torch_compile.json`</span>)</span>
|
||||
<span id="cb5-55"><a href="#cb5-55" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 2 (<span class="in">`zero2.json`</span>)</span>
|
||||
<span id="cb5-56"><a href="#cb5-56" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 3 (<span class="in">`zero3.json`</span>)</span>
|
||||
<span id="cb5-57"><a href="#cb5-57" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 3 with bf16 (<span class="in">`zero3_bf16.json`</span>)</span>
|
||||
<span id="cb5-58"><a href="#cb5-58" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 3 with bf16 and CPU offload params (<span class="in">`zero3_bf16_cpuoffload_params.json`</span>)</span>
|
||||
<span id="cb5-59"><a href="#cb5-59" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 3 with bf16 and CPU offload params and optimizer (<span class="in">`zero3_bf16_cpuoffload_all.json`</span>)</span>
|
||||
<span id="cb5-60"><a href="#cb5-60" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-61"><a href="#cb5-61" aria-hidden="true" tabindex="-1"></a>::: {.callout-tip}</span>
|
||||
<span id="cb5-62"><a href="#cb5-62" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-63"><a href="#cb5-63" aria-hidden="true" tabindex="-1"></a>For best performance, choose the configuration that offloads the least to CPU memory while still fitting in VRAM.</span>
|
||||
<span id="cb5-64"><a href="#cb5-64" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-65"><a href="#cb5-65" aria-hidden="true" tabindex="-1"></a>Try Stage 1 first, then move to Stage 2 and Stage 3 only if the model does not fit.</span>
|
||||
<span id="cb5-66"><a href="#cb5-66" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-67"><a href="#cb5-67" aria-hidden="true" tabindex="-1"></a>:::</span>
|
||||
<span id="cb5-68"><a href="#cb5-68" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-69"><a href="#cb5-69" aria-hidden="true" tabindex="-1"></a><span class="fu">## FSDP {#sec-fsdp}</span></span>
|
||||
<span id="cb5-70"><a href="#cb5-70" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-71"><a href="#cb5-71" aria-hidden="true" tabindex="-1"></a><span class="fu">### Basic FSDP Configuration {#sec-fsdp-config}</span></span>
|
||||
<span id="cb5-72"><a href="#cb5-72" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-73"><a href="#cb5-73" aria-hidden="true" tabindex="-1"></a><span class="in">```{.yaml}</span></span>
|
||||
<span id="cb5-74"><a href="#cb5-74" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp</span><span class="kw">:</span></span>
|
||||
<span id="cb5-75"><a href="#cb5-75" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span>full_shard</span>
|
||||
<span id="cb5-76"><a href="#cb5-76" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span>auto_wrap</span>
|
||||
<span id="cb5-77"><a href="#cb5-77" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp_config</span><span class="kw">:</span></span>
|
||||
<span id="cb5-78"><a href="#cb5-78" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_offload_params</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
|
||||
<span id="cb5-79"><a href="#cb5-79" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_state_dict_type</span><span class="kw">:</span><span class="at"> FULL_STATE_DICT</span></span>
|
||||
<span id="cb5-80"><a href="#cb5-80" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_transformer_layer_cls_to_wrap</span><span class="kw">:</span><span class="at"> LlamaDecoderLayer</span></span>
|
||||
<span id="cb5-81"><a href="#cb5-81" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
|
||||
<span id="cb5-82"><a href="#cb5-82" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-83"><a href="#cb5-83" aria-hidden="true" tabindex="-1"></a><span class="fu">## Sequence parallelism {#sec-sequence-parallelism}</span></span>
|
||||
<span id="cb5-84"><a href="#cb5-84" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-85"><a href="#cb5-85" aria-hidden="true" tabindex="-1"></a>We support sequence parallelism (SP) via the</span>
|
||||
<span id="cb5-86"><a href="#cb5-86" aria-hidden="true" tabindex="-1"></a><span class="co">[</span><span class="ot">ring-flash-attention</span><span class="co">](https://github.com/zhuzilin/ring-flash-attention)</span> project. This</span>
|
||||
<span id="cb5-87"><a href="#cb5-87" aria-hidden="true" tabindex="-1"></a>allows one to split up sequences across GPUs, which is useful in the event that a</span>
|
||||
<span id="cb5-88"><a href="#cb5-88" aria-hidden="true" tabindex="-1"></a>single sequence causes OOM errors during model training.</span>
|
||||
<span id="cb5-89"><a href="#cb5-89" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-90"><a href="#cb5-90" aria-hidden="true" tabindex="-1"></a>First, install <span class="in">`ring-flash-attn`</span>, recommended via <span class="in">`pip install axolotl[ring-flash-attn]`</span>,</span>
|
||||
<span id="cb5-91"><a href="#cb5-91" aria-hidden="true" tabindex="-1"></a>or from source with <span class="in">`pip install .[ring-flash-attn]`</span>.</span>
|
||||
<span id="cb5-92"><a href="#cb5-92" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-93"><a href="#cb5-93" aria-hidden="true" tabindex="-1"></a>Your Axolotl YAML config should contain the following lines:</span>
|
||||
<span id="cb5-94"><a href="#cb5-94" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-95"><a href="#cb5-95" aria-hidden="true" tabindex="-1"></a><span class="in">```{.yaml}</span></span>
|
||||
<span id="cb5-96"><a href="#cb5-96" aria-hidden="true" tabindex="-1"></a><span class="fu">sequence_parallel_degree</span><span class="kw">:</span><span class="at"> </span><span class="dv">4</span><span class="co"> # Split each sequence into 4 parts, one per GPU</span></span>
|
||||
<span id="cb5-97"><a href="#cb5-97" aria-hidden="true" tabindex="-1"></a><span class="fu">flash_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span><span class="co"> # Required with sequence parallelism</span></span>
|
||||
<span id="cb5-98"><a href="#cb5-98" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-99"><a href="#cb5-99" aria-hidden="true" tabindex="-1"></a><span class="co"># Optional; strides across the key dimension. Larger values use more memory but will make training faster.</span></span>
|
||||
<span id="cb5-100"><a href="#cb5-100" aria-hidden="true" tabindex="-1"></a><span class="fu">heads_k_stride</span><span class="kw">:</span><span class="at"> </span><span class="dv">1</span></span>
|
||||
<span id="cb5-101"><a href="#cb5-101" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
|
||||
<span id="cb5-102"><a href="#cb5-102" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-103"><a href="#cb5-103" aria-hidden="true" tabindex="-1"></a>See our <span class="co">[</span><span class="ot">dedicated guide</span><span class="co">](sequence_parallelism.qmd)</span> for more details.</span>
|
||||
<span id="cb5-104"><a href="#cb5-104" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-105"><a href="#cb5-105" aria-hidden="true" tabindex="-1"></a><span class="fu">### FSDP + QLoRA {#sec-fsdp-qlora}</span></span>
|
||||
<span id="cb5-106"><a href="#cb5-106" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-107"><a href="#cb5-107" aria-hidden="true" tabindex="-1"></a>For combining FSDP with QLoRA, see our <span class="co">[</span><span class="ot">dedicated guide</span><span class="co">](fsdp_qlora.qmd)</span>.</span>
|
||||
<span id="cb5-108"><a href="#cb5-108" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-109"><a href="#cb5-109" aria-hidden="true" tabindex="-1"></a><span class="fu">## Performance Optimization {#sec-performance}</span></span>
|
||||
<span id="cb5-110"><a href="#cb5-110" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-111"><a href="#cb5-111" aria-hidden="true" tabindex="-1"></a><span class="fu">### Liger Kernel Integration {#sec-liger}</span></span>
|
||||
<span id="cb5-112"><a href="#cb5-112" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-113"><a href="#cb5-113" aria-hidden="true" tabindex="-1"></a>Please see <span class="co">[</span><span class="ot">docs</span><span class="co">](custom_integrations.qmd#liger)</span> for more info.</span>
|
||||
<span id="cb5-114"><a href="#cb5-114" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-115"><a href="#cb5-115" aria-hidden="true" tabindex="-1"></a><span class="fu">## Troubleshooting {#sec-troubleshooting}</span></span>
|
||||
<span id="cb5-116"><a href="#cb5-116" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-117"><a href="#cb5-117" aria-hidden="true" tabindex="-1"></a><span class="fu">### NCCL Issues {#sec-nccl}</span></span>
|
||||
<span id="cb5-118"><a href="#cb5-118" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-119"><a href="#cb5-119" aria-hidden="true" tabindex="-1"></a>For NCCL-related problems, see our <span class="co">[</span><span class="ot">NCCL troubleshooting guide</span><span class="co">](nccl.qmd)</span>.</span>
|
||||
<span id="cb5-120"><a href="#cb5-120" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-121"><a href="#cb5-121" aria-hidden="true" tabindex="-1"></a><span class="fu">### Common Problems {#sec-common-problems}</span></span>
|
||||
<span id="cb5-122"><a href="#cb5-122" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-123"><a href="#cb5-123" aria-hidden="true" tabindex="-1"></a>::: {.panel-tabset}</span>
|
||||
<span id="cb5-124"><a href="#cb5-124" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-125"><a href="#cb5-125" aria-hidden="true" tabindex="-1"></a><span class="fu">## Memory Issues</span></span>
|
||||
<span id="cb5-126"><a href="#cb5-126" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-127"><a href="#cb5-127" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Reduce <span class="in">`micro_batch_size`</span></span>
|
||||
<span id="cb5-128"><a href="#cb5-128" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Reduce <span class="in">`eval_batch_size`</span></span>
|
||||
<span id="cb5-129"><a href="#cb5-129" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Adjust <span class="in">`gradient_accumulation_steps`</span></span>
|
||||
<span id="cb5-130"><a href="#cb5-130" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Consider using a higher ZeRO stage</span>
|
||||
<span id="cb5-131"><a href="#cb5-131" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-132"><a href="#cb5-132" aria-hidden="true" tabindex="-1"></a><span class="fu">## Training Instability</span></span>
|
||||
<span id="cb5-133"><a href="#cb5-133" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-134"><a href="#cb5-134" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Start with DeepSpeed ZeRO-2</span>
|
||||
<span id="cb5-135"><a href="#cb5-135" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Monitor loss values</span>
|
||||
<span id="cb5-136"><a href="#cb5-136" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Check learning rates</span>
|
||||
<span id="cb5-137"><a href="#cb5-137" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-138"><a href="#cb5-138" aria-hidden="true" tabindex="-1"></a>:::</span>
|
||||
<span id="cb5-139"><a href="#cb5-139" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb5-140"><a href="#cb5-140" aria-hidden="true" tabindex="-1"></a>For more detailed troubleshooting, see our <span class="co">[</span><span class="ot">debugging guide</span><span class="co">](debugging.qmd)</span>.</span></code><button title="Copy to Clipboard" class="code-copy-button" data-in-quarto-modal=""><i class="bi"></i></button></pre></div>
|
||||
<div class="sourceCode" id="cb4" data-shortcodes="false"><pre class="sourceCode markdown code-with-copy"><code class="sourceCode markdown"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co">---</span></span>
|
||||
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="an">title:</span><span class="co"> "Multi-GPU"</span></span>
|
||||
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="an">format:</span></span>
|
||||
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="co"> html:</span></span>
|
||||
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="co"> toc: true</span></span>
|
||||
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="co"> toc-depth: 3</span></span>
|
||||
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a><span class="co"> number-sections: true</span></span>
|
||||
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a><span class="co"> code-tools: true</span></span>
|
||||
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a><span class="an">execute:</span></span>
|
||||
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a><span class="co"> enabled: false</span></span>
|
||||
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a><span class="co">---</span></span>
|
||||
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a>This guide covers advanced training configurations for multi-GPU setups using Axolotl.</span>
|
||||
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a><span class="fu">## Overview {#sec-overview}</span></span>
|
||||
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a>Axolotl supports several methods for multi-GPU training:</span>
|
||||
<span id="cb4-18"><a href="#cb4-18" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-19"><a href="#cb4-19" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>DeepSpeed (recommended)</span>
|
||||
<span id="cb4-20"><a href="#cb4-20" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>FSDP (Fully Sharded Data Parallel)</span>
|
||||
<span id="cb4-21"><a href="#cb4-21" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Sequence parallelism</span>
|
||||
<span id="cb4-22"><a href="#cb4-22" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>FSDP + QLoRA</span>
|
||||
<span id="cb4-23"><a href="#cb4-23" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-24"><a href="#cb4-24" aria-hidden="true" tabindex="-1"></a><span class="fu">## DeepSpeed {#sec-deepspeed}</span></span>
|
||||
<span id="cb4-25"><a href="#cb4-25" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-26"><a href="#cb4-26" aria-hidden="true" tabindex="-1"></a>DeepSpeed is the recommended approach for multi-GPU training due to its stability and performance. It provides various optimization levels through ZeRO stages.</span>
|
||||
<span id="cb4-27"><a href="#cb4-27" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-28"><a href="#cb4-28" aria-hidden="true" tabindex="-1"></a><span class="fu">### Configuration {#sec-deepspeed-config}</span></span>
|
||||
<span id="cb4-29"><a href="#cb4-29" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-30"><a href="#cb4-30" aria-hidden="true" tabindex="-1"></a>Add to your YAML config:</span>
|
||||
<span id="cb4-31"><a href="#cb4-31" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-32"><a href="#cb4-32" aria-hidden="true" tabindex="-1"></a><span class="in">```{.yaml}</span></span>
|
||||
<span id="cb4-33"><a href="#cb4-33" aria-hidden="true" tabindex="-1"></a><span class="fu">deepspeed</span><span class="kw">:</span><span class="at"> deepspeed_configs/zero1.json</span></span>
|
||||
<span id="cb4-34"><a href="#cb4-34" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
|
||||
<span id="cb4-35"><a href="#cb4-35" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-36"><a href="#cb4-36" aria-hidden="true" tabindex="-1"></a><span class="fu">### Usage {#sec-deepspeed-usage}</span></span>
|
||||
<span id="cb4-37"><a href="#cb4-37" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-38"><a href="#cb4-38" aria-hidden="true" tabindex="-1"></a><span class="in">```{.bash}</span></span>
|
||||
<span id="cb4-39"><a href="#cb4-39" aria-hidden="true" tabindex="-1"></a><span class="co"># Fetch deepspeed configs (if not already present)</span></span>
|
||||
<span id="cb4-40"><a href="#cb4-40" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> fetch deepspeed_configs</span>
|
||||
<span id="cb4-41"><a href="#cb4-41" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-42"><a href="#cb4-42" aria-hidden="true" tabindex="-1"></a><span class="co"># Pass the DeepSpeed config via the YAML config</span></span>
|
||||
<span id="cb4-43"><a href="#cb4-43" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> train config.yml</span>
|
||||
<span id="cb4-44"><a href="#cb4-44" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-45"><a href="#cb4-45" aria-hidden="true" tabindex="-1"></a><span class="co"># Pass the DeepSpeed config via the CLI</span></span>
|
||||
<span id="cb4-46"><a href="#cb4-46" aria-hidden="true" tabindex="-1"></a><span class="ex">axolotl</span> train config.yml <span class="at">--deepspeed</span> deepspeed_configs/zero1.json</span>
|
||||
<span id="cb4-47"><a href="#cb4-47" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
|
||||
<span id="cb4-48"><a href="#cb4-48" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-49"><a href="#cb4-49" aria-hidden="true" tabindex="-1"></a><span class="fu">### ZeRO Stages {#sec-zero-stages}</span></span>
|
||||
<span id="cb4-50"><a href="#cb4-50" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-51"><a href="#cb4-51" aria-hidden="true" tabindex="-1"></a>We provide default configurations for:</span>
|
||||
<span id="cb4-52"><a href="#cb4-52" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-53"><a href="#cb4-53" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 1 (<span class="in">`zero1.json`</span>)</span>
|
||||
<span id="cb4-54"><a href="#cb4-54" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 1 with torch compile (<span class="in">`zero1_torch_compile.json`</span>)</span>
|
||||
<span id="cb4-55"><a href="#cb4-55" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 2 (<span class="in">`zero2.json`</span>)</span>
|
||||
<span id="cb4-56"><a href="#cb4-56" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 3 (<span class="in">`zero3.json`</span>)</span>
|
||||
<span id="cb4-57"><a href="#cb4-57" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 3 with bf16 (<span class="in">`zero3_bf16.json`</span>)</span>
|
||||
<span id="cb4-58"><a href="#cb4-58" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 3 with bf16 and CPU offload params (<span class="in">`zero3_bf16_cpuoffload_params.json`</span>)</span>
|
||||
<span id="cb4-59"><a href="#cb4-59" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>ZeRO Stage 3 with bf16 and CPU offload params and optimizer (<span class="in">`zero3_bf16_cpuoffload_all.json`</span>)</span>
|
||||
<span id="cb4-60"><a href="#cb4-60" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-61"><a href="#cb4-61" aria-hidden="true" tabindex="-1"></a>::: {.callout-tip}</span>
|
||||
<span id="cb4-62"><a href="#cb4-62" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-63"><a href="#cb4-63" aria-hidden="true" tabindex="-1"></a>For best performance, choose the configuration that offloads the least to CPU memory while still fitting in VRAM.</span>
|
||||
<span id="cb4-64"><a href="#cb4-64" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-65"><a href="#cb4-65" aria-hidden="true" tabindex="-1"></a>Try Stage 1 first, then move to Stage 2 and Stage 3 only if the model does not fit.</span>
|
||||
<span id="cb4-66"><a href="#cb4-66" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-67"><a href="#cb4-67" aria-hidden="true" tabindex="-1"></a>:::</span>
|
||||
<span id="cb4-68"><a href="#cb4-68" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-69"><a href="#cb4-69" aria-hidden="true" tabindex="-1"></a><span class="fu">## FSDP {#sec-fsdp}</span></span>
|
||||
<span id="cb4-70"><a href="#cb4-70" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-71"><a href="#cb4-71" aria-hidden="true" tabindex="-1"></a><span class="fu">### Basic FSDP Configuration {#sec-fsdp-config}</span></span>
|
||||
<span id="cb4-72"><a href="#cb4-72" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-73"><a href="#cb4-73" aria-hidden="true" tabindex="-1"></a><span class="in">```{.yaml}</span></span>
|
||||
<span id="cb4-74"><a href="#cb4-74" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp</span><span class="kw">:</span></span>
|
||||
<span id="cb4-75"><a href="#cb4-75" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span>full_shard</span>
|
||||
<span id="cb4-76"><a href="#cb4-76" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span>auto_wrap</span>
|
||||
<span id="cb4-77"><a href="#cb4-77" aria-hidden="true" tabindex="-1"></a><span class="fu">fsdp_config</span><span class="kw">:</span></span>
|
||||
<span id="cb4-78"><a href="#cb4-78" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_offload_params</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
|
||||
<span id="cb4-79"><a href="#cb4-79" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_state_dict_type</span><span class="kw">:</span><span class="at"> FULL_STATE_DICT</span></span>
|
||||
<span id="cb4-80"><a href="#cb4-80" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">fsdp_transformer_layer_cls_to_wrap</span><span class="kw">:</span><span class="at"> LlamaDecoderLayer</span></span>
|
||||
<span id="cb4-81"><a href="#cb4-81" aria-hidden="true" tabindex="-1"></a><span class="in">```</span></span>
|
||||
<span id="cb4-82"><a href="#cb4-82" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-83"><a href="#cb4-83" aria-hidden="true" tabindex="-1"></a><span class="fu">## Sequence parallelism {#sec-sequence-parallelism}</span></span>
|
||||
<span id="cb4-84"><a href="#cb4-84" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-85"><a href="#cb4-85" aria-hidden="true" tabindex="-1"></a>We support sequence parallelism (SP) via the</span>
|
||||
<span id="cb4-86"><a href="#cb4-86" aria-hidden="true" tabindex="-1"></a><span class="co">[</span><span class="ot">ring-flash-attention</span><span class="co">](https://github.com/zhuzilin/ring-flash-attention)</span> project. This</span>
|
||||
<span id="cb4-87"><a href="#cb4-87" aria-hidden="true" tabindex="-1"></a>allows one to split up sequences across GPUs, which is useful in the event that a</span>
|
||||
<span id="cb4-88"><a href="#cb4-88" aria-hidden="true" tabindex="-1"></a>single sequence causes OOM errors during model training.</span>
|
||||
<span id="cb4-89"><a href="#cb4-89" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-90"><a href="#cb4-90" aria-hidden="true" tabindex="-1"></a>See our <span class="co">[</span><span class="ot">dedicated guide</span><span class="co">](sequence_parallelism.qmd)</span> for more information.</span>
|
||||
<span id="cb4-91"><a href="#cb4-91" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-92"><a href="#cb4-92" aria-hidden="true" tabindex="-1"></a><span class="fu">### FSDP + QLoRA {#sec-fsdp-qlora}</span></span>
|
||||
<span id="cb4-93"><a href="#cb4-93" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-94"><a href="#cb4-94" aria-hidden="true" tabindex="-1"></a>For combining FSDP with QLoRA, see our <span class="co">[</span><span class="ot">dedicated guide</span><span class="co">](fsdp_qlora.qmd)</span>.</span>
|
||||
<span id="cb4-95"><a href="#cb4-95" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-96"><a href="#cb4-96" aria-hidden="true" tabindex="-1"></a><span class="fu">## Performance Optimization {#sec-performance}</span></span>
|
||||
<span id="cb4-97"><a href="#cb4-97" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-98"><a href="#cb4-98" aria-hidden="true" tabindex="-1"></a><span class="fu">### Liger Kernel Integration {#sec-liger}</span></span>
|
||||
<span id="cb4-99"><a href="#cb4-99" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-100"><a href="#cb4-100" aria-hidden="true" tabindex="-1"></a>Please see <span class="co">[</span><span class="ot">docs</span><span class="co">](custom_integrations.qmd#liger)</span> for more info.</span>
|
||||
<span id="cb4-101"><a href="#cb4-101" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-102"><a href="#cb4-102" aria-hidden="true" tabindex="-1"></a><span class="fu">## Troubleshooting {#sec-troubleshooting}</span></span>
|
||||
<span id="cb4-103"><a href="#cb4-103" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-104"><a href="#cb4-104" aria-hidden="true" tabindex="-1"></a><span class="fu">### NCCL Issues {#sec-nccl}</span></span>
|
||||
<span id="cb4-105"><a href="#cb4-105" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-106"><a href="#cb4-106" aria-hidden="true" tabindex="-1"></a>For NCCL-related problems, see our <span class="co">[</span><span class="ot">NCCL troubleshooting guide</span><span class="co">](nccl.qmd)</span>.</span>
|
||||
<span id="cb4-107"><a href="#cb4-107" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-108"><a href="#cb4-108" aria-hidden="true" tabindex="-1"></a><span class="fu">### Common Problems {#sec-common-problems}</span></span>
|
||||
<span id="cb4-109"><a href="#cb4-109" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-110"><a href="#cb4-110" aria-hidden="true" tabindex="-1"></a>::: {.panel-tabset}</span>
|
||||
<span id="cb4-111"><a href="#cb4-111" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-112"><a href="#cb4-112" aria-hidden="true" tabindex="-1"></a><span class="fu">## Memory Issues</span></span>
|
||||
<span id="cb4-113"><a href="#cb4-113" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-114"><a href="#cb4-114" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Reduce <span class="in">`micro_batch_size`</span></span>
|
||||
<span id="cb4-115"><a href="#cb4-115" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Reduce <span class="in">`eval_batch_size`</span></span>
|
||||
<span id="cb4-116"><a href="#cb4-116" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Adjust <span class="in">`gradient_accumulation_steps`</span></span>
|
||||
<span id="cb4-117"><a href="#cb4-117" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Consider using a higher ZeRO stage</span>
|
||||
<span id="cb4-118"><a href="#cb4-118" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-119"><a href="#cb4-119" aria-hidden="true" tabindex="-1"></a><span class="fu">## Training Instability</span></span>
|
||||
<span id="cb4-120"><a href="#cb4-120" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-121"><a href="#cb4-121" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Start with DeepSpeed ZeRO-2</span>
|
||||
<span id="cb4-122"><a href="#cb4-122" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Monitor loss values</span>
|
||||
<span id="cb4-123"><a href="#cb4-123" aria-hidden="true" tabindex="-1"></a><span class="ss">- </span>Check learning rates</span>
|
||||
<span id="cb4-124"><a href="#cb4-124" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-125"><a href="#cb4-125" aria-hidden="true" tabindex="-1"></a>:::</span>
|
||||
<span id="cb4-126"><a href="#cb4-126" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb4-127"><a href="#cb4-127" aria-hidden="true" tabindex="-1"></a>For more detailed troubleshooting, see our <span class="co">[</span><span class="ot">debugging guide</span><span class="co">](debugging.qmd)</span>.</span></code><button title="Copy to Clipboard" class="code-copy-button" data-in-quarto-modal=""><i class="bi"></i></button></pre></div>
|
||||
</div></div></div></div></div>
|
||||
</div> <!-- /content -->
|
||||
|
||||
|
||||
@@ -520,7 +520,7 @@ through a ring communication pattern.</p>
|
||||
<ol type="1">
|
||||
<li>Each sequence is divided into equal chunks across the GPUs in a sequence parallel group</li>
|
||||
<li>The data collator handles the chunking of input_ids, attention_mask, labels, and position_ids</li>
|
||||
<li>Position IDs are adjusted to maintain proper relative positions, especially for packed sequences</li>
|
||||
<li>Position IDs are adjusted to maintain proper relative positions</li>
|
||||
<li>The trainer uses special ring communication patterns for attention operations</li>
|
||||
</ol>
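<p>The chunking steps above can be sketched in plain Python. This is a minimal illustration, not axolotl's actual data collator: the helper name <code>chunk_for_rank</code> is hypothetical, and it assumes each rank holds one contiguous, equally sized slice of every field.</p>

```python
from typing import Dict, List


def chunk_for_rank(
    batch: Dict[str, List[int]], rank: int, degree: int
) -> Dict[str, List[int]]:
    """Return the slice of every field that one rank in the SP group would hold.

    Hypothetical helper for illustration only; assumes contiguous, equal chunks.
    """
    out: Dict[str, List[int]] = {}
    for key, values in batch.items():
        if len(values) % degree != 0:
            raise ValueError(f"{key}: length {len(values)} is not divisible by {degree}")
        chunk = len(values) // degree
        out[key] = values[rank * chunk : (rank + 1) * chunk]
    return out


# One sequence of 8 tokens, split across a sequence parallel group of 4 GPUs.
batch = {
    "input_ids": list(range(100, 108)),
    "attention_mask": [1] * 8,
    "labels": list(range(100, 108)),
    "position_ids": list(range(8)),
}
shards = [chunk_for_rank(batch, rank, degree=4) for rank in range(4)]
```

<p>Each shard keeps its original <code>position_ids</code> values, so a rank still knows where its tokens sit in the full sequence.</p>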
|
||||
</section>
|
||||
@@ -551,11 +551,13 @@ through a ring communication pattern.</p>
|
||||
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="co">...</span></span>
|
||||
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a><span class="co">sequence_parallel_degree: 4 # Split each sequence into 4 parts, one per GPU</span></span>
|
||||
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="co">flash_attention: true # Required with sequence parallelism</span></span>
|
||||
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Optional; strides across the key dimension. Larger values use more memory but should make training faster.</span></span>
|
||||
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a><span class="co">heads_k_stride: 1</span></span>
|
||||
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a><span class="co">...</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Optional; strides across the key dimension. Larger values use more memory but should make training faster.</span></span>
|
||||
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="co">heads_k_stride: 1</span></span>
|
||||
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Optional; one of "varlen_llama3" or "batch_ring". Defaults to</span></span>
|
||||
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a><span class="co"># "varlen_llama3" when `sample_packing: true`, and "batch_ring" otherwise.</span></span>
|
||||
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a><span class="co">ring_attn_func:</span></span>
|
||||
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a></span>
|
||||
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a><span class="co">...</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<p>This will train the Llama 3 8B model with 8K context length, with each sequence split
|
||||
into 4 subsequences of length 2048 across 4 GPUs.</p>
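<p>The per-GPU subsequence length is the context length divided by <code>sequence_parallel_degree</code>, and the <code>ring_attn_func</code> default follows the rule stated in the config comments. A small sketch of both, assuming the sequence length must divide evenly; <code>resolve_sp_settings</code> is a hypothetical helper, not part of axolotl:</p>

```python
from typing import Optional


def resolve_sp_settings(
    sequence_len: int,
    sequence_parallel_degree: int,
    sample_packing: bool,
    ring_attn_func: Optional[str] = None,
) -> dict:
    """Hypothetical helper: derive per-GPU length and the default ring_attn_func."""
    if sequence_len % sequence_parallel_degree != 0:
        # Assumption for this sketch: sequences split into equal chunks.
        raise ValueError("sequence_len must be divisible by sequence_parallel_degree")
    if ring_attn_func is None:
        # Default described in the config comments.
        ring_attn_func = "varlen_llama3" if sample_packing else "batch_ring"
    return {
        "tokens_per_gpu": sequence_len // sequence_parallel_degree,
        "ring_attn_func": ring_attn_func,
    }


two_way = resolve_sp_settings(8192, 2, sample_packing=False)
four_way = resolve_sp_settings(8192, 4, sample_packing=True)
```

<p>With an 8K context, a degree of 2 gives 4096 tokens per GPU and a degree of 4 gives 2048.</p>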
|
||||
</section>
|
||||
|
||||
2384
search.json
File diff suppressed because one or more lines are too long
1166
sitemap.xml
File diff suppressed because it is too large