Built site for gh-pages
.github/workflows/main.yml (vendored, 3 lines changed)
@@ -8,6 +8,9 @@ on:
- "v*"
workflow_dispatch:
permissions:
contents: read
jobs:
build-axolotl:
if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
docs/attention.html
@@ -756,9 +756,11 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<ul>
<li><a href="#sdp-attention" id="toc-sdp-attention" class="nav-link active" data-scroll-target="#sdp-attention">SDP Attention</a></li>
<li><a href="#flash-attention-2" id="toc-flash-attention-2" class="nav-link" data-scroll-target="#flash-attention-2">Flash Attention 2</a>
<li><a href="#flash-attention" id="toc-flash-attention" class="nav-link" data-scroll-target="#flash-attention">Flash Attention</a>
<ul class="collapse">
<li><a href="#nvidia" id="toc-nvidia" class="nav-link" data-scroll-target="#nvidia">Nvidia</a></li>
<li><a href="#flash-attention-2" id="toc-flash-attention-2" class="nav-link" data-scroll-target="#flash-attention-2">Flash Attention 2</a></li>
<li><a href="#flash-attention-3" id="toc-flash-attention-3" class="nav-link" data-scroll-target="#flash-attention-3">Flash Attention 3</a></li>
<li><a href="#flash-attention-4" id="toc-flash-attention-4" class="nav-link" data-scroll-target="#flash-attention-4">Flash Attention 4</a></li>
<li><a href="#amd" id="toc-amd" class="nav-link" data-scroll-target="#amd">AMD</a></li>
</ul></li>
<li><a href="#flex-attention" id="toc-flex-attention" class="nav-link" data-scroll-target="#flex-attention">Flex Attention</a></li>
@@ -801,15 +803,15 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="fu">sdp_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>For more details: <a href="https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html">PyTorch docs</a></p>
</section>
<section id="flash-attention-2" class="level2">
<h2 class="anchored" data-anchor-id="flash-attention-2">Flash Attention 2</h2>
<p>Uses efficient kernels to compute attention.</p>
<section id="flash-attention" class="level2">
<h2 class="anchored" data-anchor-id="flash-attention">Flash Attention</h2>
<p>Axolotl supports Flash Attention 2, 3, and 4. The best available version is used automatically
based on your installed packages and GPU.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="fu">flash_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>For more details: <a href="https://github.com/Dao-AILab/flash-attention/">Flash Attention</a></p>
<section id="nvidia" class="level3">
<h3 class="anchored" data-anchor-id="nvidia">Nvidia</h3>
<p>Requirements: Ampere, Ada, or Hopper GPUs</p>
<p>Note: For Turing GPUs or lower, please use other attention methods.</p>
<section id="flash-attention-2" class="level3">
<h3 class="anchored" data-anchor-id="flash-attention-2">Flash Attention 2</h3>
<p>Requirements: Ampere, Ada, or Hopper GPUs (Turing or lower not supported)</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="ex">pip</span> install flash-attn <span class="at">--no-build-isolation</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
@@ -821,17 +823,62 @@ Tip
</div>
</div>
<div class="callout-body-container callout-body">
<p>If you get <code>undefined symbol</code> while training, ensure you installed PyTorch prior to Axolotl. Alternatively, try reinstall or downgrade a version.</p>
<p>If you get <code>undefined symbol</code> while training, ensure you installed PyTorch prior to Axolotl.
Alternatively, try reinstalling flash-attn or downgrading it to an earlier version.</p>
</div>
</div>
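<p>A minimal sketch of that reinstall route (assuming the <code>undefined symbol</code> error comes from a flash-attn build compiled against a different PyTorch than the one currently installed):</p>
<div class="sourceCode"><pre class="sourceCode bash"><code># Rebuild flash-attn against the currently installed PyTorch
pip uninstall -y flash-attn
pip install flash-attn --no-build-isolation --no-cache-dir</code></pre></div>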
<section id="flash-attention-3" class="level4">
<h4 class="anchored" data-anchor-id="flash-attention-3">Flash Attention 3</h4>
</section>
<section id="flash-attention-3" class="level3">
<h3 class="anchored" data-anchor-id="flash-attention-3">Flash Attention 3</h3>
<p>Requirements: Hopper GPUs only; CUDA 12.8 (recommended)</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="fu">git</span> clone https://github.com/Dao-AILab/flash-attention.git</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="bu">cd</span> flash-attention/hopper</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="ex">python</span> setup.py install</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
</section>
<section id="flash-attention-4" class="level3">
<h3 class="anchored" data-anchor-id="flash-attention-4">Flash Attention 4</h3>
<p>Requirements: Hopper or Blackwell GPUs</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="ex">pip</span> install flash-attn-4</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>Or from source:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="fu">git</span> clone https://github.com/Dao-AILab/flash-attention.git</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="bu">cd</span> flash-attention/flash_attn/cute</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a><span class="ex">pip</span> install <span class="at">-e</span> .</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a><span class="co"># FA2's flash_attn package includes a cute/ stub that shadows FA4.</span></span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Remove it so Python can find the real FA4 module:</span></span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a><span class="fu">rm</span> <span class="at">-r</span> <span class="va">$(</span><span class="ex">python</span> <span class="at">-c</span> <span class="st">"import flash_attn; print(flash_attn.__path__[0])"</span><span class="va">)</span>/cute</span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>Hopper (SM90) users</strong>: The backward kernel is not yet included in the pip package. To use FA4
for training on Hopper, install from source using the instructions above.</p>
</div>
</div>
<div class="callout callout-style-default callout-warning callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Warning
</div>
</div>
<div class="callout-body-container callout-body">
<p>FA4 only supports head dimensions up to 128 (<code>d ≤ 128</code>). The DeepSeek shape <code>(192, 128)</code> is
also supported but only on Blackwell. Axolotl automatically detects incompatible head dimensions
and falls back to FA2/3.</p>
</div>
</div>
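<p>If you are unsure of a model's head dimension, one way to check it before enabling FA4 is to read it from the Hugging Face config (a hypothetical snippet; the model name is only an example):</p>
<div class="sourceCode"><pre class="sourceCode bash"><code># head dimension must be &lt;= 128 for FA4
python -c "from transformers import AutoConfig; c = AutoConfig.from_pretrained('Qwen/Qwen2.5-7B'); print(getattr(c, 'head_dim', None) or c.hidden_size // c.num_attention_heads)"</code></pre></div>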
<p>For more details: <a href="https://github.com/Dao-AILab/flash-attention/tree/main/flash_attn/cute">flash-attention/flash_attn/cute</a></p>
</section>
<section id="amd" class="level3">
<h3 class="anchored" data-anchor-id="amd">AMD</h3>
@@ -842,10 +889,10 @@ Tip
<section id="flex-attention" class="level2">
<h2 class="anchored" data-anchor-id="flex-attention">Flex Attention</h2>
<p>A flexible PyTorch API for attention used in combination with <code>torch.compile</code>.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="fu">flex_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="co"># recommended</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a><span class="fu">torch_compile</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="fu">flex_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="co"># recommended</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a><span class="fu">torch_compile</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
@@ -864,9 +911,9 @@ Note
<section id="sageattention" class="level2">
<h2 class="anchored" data-anchor-id="sageattention">SageAttention</h2>
<p>Attention kernels with QK Int8 and PV FP16 accumulator.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="fu">sage_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="fu">sage_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<p>Requirements: Ampere, Ada, or Hopper GPUs</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="ex">pip</span> install sageattention==2.2.0 <span class="at">--no-build-isolation</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="ex">pip</span> install sageattention==2.2.0 <span class="at">--no-build-isolation</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="callout callout-style-default callout-warning callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
@@ -897,7 +944,7 @@ Note
</section>
<section id="xformers" class="level2">
<h2 class="anchored" data-anchor-id="xformers">xFormers</h2>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="fu">xformers_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="fu">xformers_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
@@ -929,8 +976,8 @@ Warning
</div>
</div>
<p>Requirements: LLaMA model architecture</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="fu">flash_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="fu">s2_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="fu">flash_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="fu">s2_attention</span><span class="kw">:</span><span class="at"> </span><span class="ch">true</span></span></code></pre></div><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
index.html
@@ -863,7 +863,7 @@ Expand older updates
<li><strong>Multimodal Training</strong>: Fine-tune vision-language models (VLMs) including LLaMA-Vision, Qwen2-VL, Pixtral, LLaVA, SmolVLM2, GLM-4.6V, InternVL 3.5, Gemma 3n, and audio models like Voxtral with image, video, and audio support.</li>
<li><strong>Training Methods</strong>: Full fine-tuning, LoRA, QLoRA, GPTQ, QAT, Preference Tuning (DPO, IPO, KTO, ORPO), RL (GRPO, GDPO), and Reward Modelling (RM) / Process Reward Modelling (PRM).</li>
<li><strong>Easy Configuration</strong>: Re-use a single YAML configuration file across the full fine-tuning pipeline: dataset preprocessing, training, evaluation, quantization, and inference.</li>
<li><strong>Performance Optimizations</strong>: <a href="https://docs.axolotl.ai/docs/multipack.html">Multipacking</a>, <a href="https://github.com/Dao-AILab/flash-attention">Flash Attention</a>, <a href="https://github.com/facebookresearch/xformers">Xformers</a>, <a href="https://pytorch.org/blog/flexattention/">Flex Attention</a>, <a href="https://github.com/thu-ml/SageAttention">SageAttention</a>, <a href="https://github.com/linkedin/Liger-Kernel">Liger Kernel</a>, <a href="https://github.com/apple/ml-cross-entropy/tree/main">Cut Cross Entropy</a>, <a href="https://docs.axolotl.ai/docs/custom_integrations.html#kernels-integration">ScatterMoE</a>, <a href="https://docs.axolotl.ai/docs/sequence_parallelism.html">Sequence Parallelism (SP)</a>, <a href="https://docs.axolotl.ai/docs/lora_optims.html">LoRA optimizations</a>, <a href="https://docs.axolotl.ai/docs/multi-gpu.html">Multi-GPU training (FSDP1, FSDP2, DeepSpeed)</a>, <a href="https://docs.axolotl.ai/docs/multi-node.html">Multi-node training (Torchrun, Ray)</a>, and many more!</li>
<li><strong>Performance Optimizations</strong>: <a href="https://docs.axolotl.ai/docs/multipack.html">Multipacking</a>, <a href="https://docs.axolotl.ai/docs/attention.html#flash-attention">Flash Attention 2/3/4</a>, <a href="https://docs.axolotl.ai/docs/attention.html#xformers">Xformers</a>, <a href="https://docs.axolotl.ai/docs/attention.html#flex-attention">Flex Attention</a>, <a href="https://docs.axolotl.ai/docs/attention.html#sageattention">SageAttention</a>, <a href="https://docs.axolotl.ai/docs/custom_integrations.html#liger-kernels">Liger Kernel</a>, <a href="https://docs.axolotl.ai/docs/custom_integrations.html#cut-cross-entropy">Cut Cross Entropy</a>, <a href="https://docs.axolotl.ai/docs/custom_integrations.html#kernels-integration">ScatterMoE</a>, <a href="https://docs.axolotl.ai/docs/sequence_parallelism.html">Sequence Parallelism (SP)</a>, <a href="https://docs.axolotl.ai/docs/lora_optims.html">LoRA optimizations</a>, <a href="https://docs.axolotl.ai/docs/multi-gpu.html">Multi-GPU training (FSDP1, FSDP2, DeepSpeed)</a>, <a href="https://docs.axolotl.ai/docs/multi-node.html">Multi-node training (Torchrun, Ray)</a>, and many more!</li>
<li><strong>Flexible Dataset Handling</strong>: Load from local, HuggingFace, and cloud (S3, Azure, GCP, OCI) datasets.</li>
<li><strong>Cloud Ready</strong>: We ship <a href="https://hub.docker.com/u/axolotlai">Docker images</a> and also <a href="https://pypi.org/project/axolotl/">PyPI packages</a> for use on cloud platforms and local hardware.</li>
</ul>
search.json (10 lines changed)
@@ -1247,11 +1247,11 @@
]
},
{
"objectID": "docs/attention.html#flash-attention-2",
"href": "docs/attention.html#flash-attention-2",
"objectID": "docs/attention.html#flash-attention",
"href": "docs/attention.html#flash-attention",
"title": "Attention",
"section": "Flash Attention 2",
"text": "Flash Attention 2\nUses efficient kernels to compute attention.\nflash_attention: true\nFor more details: Flash Attention\n\nNvidia\nRequirements: Ampere, Ada, or Hopper GPUs\nNote: For Turing GPUs or lower, please use other attention methods.\npip install flash-attn --no-build-isolation\n\n\n\n\n\n\nTip\n\n\n\nIf you get undefined symbol while training, ensure you installed PyTorch prior to Axolotl. Alternatively, try reinstall or downgrade a version.\n\n\n\nFlash Attention 3\nRequirements: Hopper only and CUDA 12.8 (recommended)\ngit clone https://github.com/Dao-AILab/flash-attention.git\ncd flash-attention/hopper\n\npython setup.py install\n\n\n\nAMD\nRequirements: ROCm 6.0 and above.\nSee Flash Attention AMD docs.",
"section": "Flash Attention",
"text": "Flash Attention\nAxolotl supports Flash Attention 2, 3, and 4. The best available version is used automatically\nbased on your installed packages and GPU.\nflash_attention: true\nFor more details: Flash Attention\n\nFlash Attention 2\nRequirements: Ampere, Ada, or Hopper GPUs (Turing or lower not supported)\npip install flash-attn --no-build-isolation\n\n\n\n\n\n\nTip\n\n\n\nIf you get undefined symbol while training, ensure you installed PyTorch prior to Axolotl.\nAlternatively, try reinstall or downgrade a version.\n\n\n\n\nFlash Attention 3\nRequirements: Hopper only and CUDA 12.8 (recommended)\ngit clone https://github.com/Dao-AILab/flash-attention.git\ncd flash-attention/hopper\n\npython setup.py install\n\n\nFlash Attention 4\nRequirements: Hopper or Blackwell GPUs\npip install flash-attn-4\nOr from source:\ngit clone https://github.com/Dao-AILab/flash-attention.git\ncd flash-attention/flash_attn/cute\n\npip install -e .\n\n# FA2's flash_attn package includes a cute/ stub that shadows FA4.\n# Remove it so Python can find the real FA4 module:\nrm -r $(python -c \"import flash_attn; print(flash_attn.__path__[0])\")/cute\n\n\n\n\n\n\nNote\n\n\n\nHopper (SM90) users: The backward kernel is not yet included in the pip package. To use FA4\nfor training on Hopper, install from source using the instructions above.\n\n\n\n\n\n\n\n\nWarning\n\n\n\nFA4 only supports head dimensions up to 128 (d ≤ 128). The DeepSeek shape (192, 128) is\nalso supported but only on Blackwell. Axolotl automatically detects incompatible head dimensions\nand falls back to FA2/3.\n\n\nFor more details: flash-attention/flash_attn/cute\n\n\nAMD\nRequirements: ROCm 6.0 and above.\nSee Flash Attention AMD docs.",
"crumbs": [
"Core Concepts",
"Attention"
@@ -3109,7 +3109,7 @@
"href": "index.html#overview",
"title": "Axolotl",
"section": "✨ Overview",
"text": "✨ Overview\nAxolotl is a free and open-source tool designed to streamline post-training and fine-tuning for the latest large language models (LLMs).\nFeatures:\n\nMultiple Model Support: Train various models like GPT-OSS, LLaMA, Mistral, Mixtral, Pythia, and many more models available on the Hugging Face Hub.\nMultimodal Training: Fine-tune vision-language models (VLMs) including LLaMA-Vision, Qwen2-VL, Pixtral, LLaVA, SmolVLM2, GLM-4.6V, InternVL 3.5, Gemma 3n, and audio models like Voxtral with image, video, and audio support.\nTraining Methods: Full fine-tuning, LoRA, QLoRA, GPTQ, QAT, Preference Tuning (DPO, IPO, KTO, ORPO), RL (GRPO, GDPO), and Reward Modelling (RM) / Process Reward Modelling (PRM).\nEasy Configuration: Re-use a single YAML configuration file across the full fine-tuning pipeline: dataset preprocessing, training, evaluation, quantization, and inference.\nPerformance Optimizations: Multipacking, Flash Attention, Xformers, Flex Attention, SageAttention, Liger Kernel, Cut Cross Entropy, ScatterMoE, Sequence Parallelism (SP), LoRA optimizations, Multi-GPU training (FSDP1, FSDP2, DeepSpeed), Multi-node training (Torchrun, Ray), and many more!\nFlexible Dataset Handling: Load from local, HuggingFace, and cloud (S3, Azure, GCP, OCI) datasets.\nCloud Ready: We ship Docker images and also PyPI packages for use on cloud platforms and local hardware.",
"text": "✨ Overview\nAxolotl is a free and open-source tool designed to streamline post-training and fine-tuning for the latest large language models (LLMs).\nFeatures:\n\nMultiple Model Support: Train various models like GPT-OSS, LLaMA, Mistral, Mixtral, Pythia, and many more models available on the Hugging Face Hub.\nMultimodal Training: Fine-tune vision-language models (VLMs) including LLaMA-Vision, Qwen2-VL, Pixtral, LLaVA, SmolVLM2, GLM-4.6V, InternVL 3.5, Gemma 3n, and audio models like Voxtral with image, video, and audio support.\nTraining Methods: Full fine-tuning, LoRA, QLoRA, GPTQ, QAT, Preference Tuning (DPO, IPO, KTO, ORPO), RL (GRPO, GDPO), and Reward Modelling (RM) / Process Reward Modelling (PRM).\nEasy Configuration: Re-use a single YAML configuration file across the full fine-tuning pipeline: dataset preprocessing, training, evaluation, quantization, and inference.\nPerformance Optimizations: Multipacking, Flash Attention 2/3/4, Xformers, Flex Attention, SageAttention, Liger Kernel, Cut Cross Entropy, ScatterMoE, Sequence Parallelism (SP), LoRA optimizations, Multi-GPU training (FSDP1, FSDP2, DeepSpeed), Multi-node training (Torchrun, Ray), and many more!\nFlexible Dataset Handling: Load from local, HuggingFace, and cloud (S3, Azure, GCP, OCI) datasets.\nCloud Ready: We ship Docker images and also PyPI packages for use on cloud platforms and local hardware.",
"crumbs": [
"Home"
]
sitemap.xml (474 lines changed): file diff suppressed because it is too large.