Built site for gh-pages

This commit is contained in:
Quarto GHA Workflow Runner
2025-03-21 17:30:33 +00:00
parent 486fc53c93
commit 127f9229b5
171 changed files with 127099 additions and 1001 deletions


@@ -144,7 +144,7 @@ ul.task-list li input[type="checkbox"] {
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../docs/cli.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">CLI Reference</span></a>
<span class="menu-text">Command Line Interface (CLI)</span></a>
</div>
</li>
<li class="sidebar-item">
@@ -152,6 +152,12 @@ ul.task-list li input[type="checkbox"] {
<a href="../docs/config.html" class="sidebar-item-text sidebar-link">
<span class="menu-text">Config Reference</span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="../docs/api" class="sidebar-item-text sidebar-link">
<span class="menu-text">API Reference</span></a>
</div>
</li>
</ul>
</li>
@@ -427,8 +433,12 @@ ul.task-list li input[type="checkbox"] {
<section id="visualization-of-multipack-with-flash-attention" class="level2">
<h2 class="anchored" data-anchor-id="visualization-of-multipack-with-flash-attention">Visualization of Multipack with Flash Attention</h2>
<p>Because Flash Attention simply drops the attention mask, we do not need to
construct a 4d attention mask. We only need to concatenate the sequences into
a single batch and let Flash Attention know where each new sequence begins.</p>
<p>4k context, bsz = 4;
each character represents 256 tokens;
X represents a padding token.</p>
<pre><code> 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
[[ A A A A A A A A A A A ]
B B B B B B ]
@@ -466,11 +476,17 @@ ul.task-list li input[type="checkbox"] {
B C C C C C C C D D D D E E E E
E E E E F F F F F G G G H H H H
I I I J J J J K K K K K L L L X ]]</code></pre>
<p>cu_seqlens:
[[ 0, 11, 17, 24, 28, 36, 41, 44, 48, 51, 55, 60, 64]]</p>
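<p>As an illustration (a minimal sketch, not axolotl's exact code), the cu_seqlens tensor above is just the cumulative sum of the packed sequence lengths with a leading zero; that tensor, together with the flattened q/k/v, is what Flash Attention's variable-length interface consumes. The flash_attn_varlen_func call is shown only as a hedged example of that interface.</p>
<pre><code>import torch

# Per-sequence lengths in the units of the diagram above (1 character = 256 tokens);
# the final length of 4 includes the trailing padding token X.
seq_lens = torch.tensor([11, 6, 7, 4, 8, 5, 3, 4, 3, 4, 5, 4], dtype=torch.int32)

# cu_seqlens is the cumulative sum with a leading 0: entry i marks where
# sequence i starts inside the flattened, concatenated batch.
cu_seqlens = torch.cat([torch.zeros(1, dtype=torch.int32),
                        seq_lens.cumsum(dim=0, dtype=torch.int32)])
print(cu_seqlens.tolist())
# [0, 11, 17, 24, 28, 36, 41, 44, 48, 51, 55, 60, 64]

# With the flash-attn package, the flattened q/k/v and cu_seqlens would be passed
# to flash_attn_varlen_func (illustrative only; see the flash-attn docs for the
# exact signature):
# out = flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens,
#                              max_seqlen_q=int(seq_lens.max()),
#                              max_seqlen_k=int(seq_lens.max()))</code></pre>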
</section>
<section id="multipack-without-flash-attention" class="level2">
<h2 class="anchored" data-anchor-id="multipack-without-flash-attention">Multipack without Flash Attention</h2>
<p>Multipack can still be achieved without Flash Attention, but with lower packing
efficiency: without Flash Attention we cannot join multiple batches into a single
batch because of context length limits. We can use either PyTorch's Scaled
Dot Product Attention implementation or the native PyTorch attention implementation
along with <a href="https://github.com/huggingface/transformers/pull/27539">4d attention masks</a>
to pack sequences together and avoid cross-attention between packed sequences.</p>
<p><img src="./images/4d-mask.png" alt="axolotl" width="800"></p>
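<p>Below is a minimal sketch (an illustration, not axolotl's implementation) using a hypothetical helper packed_causal_mask that builds a block-diagonal 4d mask for one packed row, so tokens can only attend causally within their own packed sequence; PyTorch's scaled_dot_product_attention accepts such a boolean mask via attn_mask.</p>
<pre><code>import torch
import torch.nn.functional as F

def packed_causal_mask(seq_lens, total_len):
    """Boolean mask of shape (1, 1, total_len, total_len); True = attention allowed."""
    # seq_id[i] records which packed sequence position i belongs to; -1 marks padding.
    seq_id = torch.repeat_interleave(torch.arange(len(seq_lens)), torch.tensor(seq_lens))
    seq_id = F.pad(seq_id, (0, total_len - seq_id.numel()), value=-1)
    same_seq = (seq_id[:, None] == seq_id[None, :]) & (seq_id[:, None] >= 0)
    causal = torch.tril(torch.ones(total_len, total_len, dtype=torch.bool))
    # Let every position attend to itself so padding rows are not fully masked out.
    mask = (same_seq & causal) | torch.eye(total_len, dtype=torch.bool)
    return mask[None, None]  # add batch and head dimensions

# Example with illustrative packed-sequence lengths, padded out to 32 positions.
mask = packed_causal_mask([11, 6, 7, 4], total_len=32)

# PyTorch's SDPA accepts the boolean mask directly (True = keep); an equivalent 4d
# mask can be passed to Hugging Face models that support 4d attention masks.
# out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)</code></pre>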