diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 580c4047c..a3a24537c 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -8,6 +8,9 @@ on:
       - "v*"
   workflow_dispatch:

+permissions:
+  contents: read
+
 jobs:
   build-axolotl:
     if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
diff --git a/.nojekyll b/.nojekyll
index cfedc3aba..09ce6e542 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-6e9883a7
\ No newline at end of file
+756ab801
\ No newline at end of file
diff --git a/docs/attention.html b/docs/attention.html
index e3be939e8..0f42d2b5c 100644
--- a/docs/attention.html
+++ b/docs/attention.html
@@ -756,9 +756,11 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
For more details: PyTorch docs
-Uses efficient kernels to compute attention.
+Axolotl supports Flash Attention 2, 3, and 4. The best available version is used automatically
+based on your installed packages and GPU.
For more details: Flash Attention
-Requirements: Ampere, Ada, or Hopper GPUs
-Note: For Turing GPUs or lower, please use other attention methods.
+Requirements: Ampere, Ada, or Hopper GPUs (Turing or lower not supported)
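For reference, a minimal sketch of how the backend is typically selected in an Axolotl YAML config (assuming the standard flash_attention / sdp_attention flags; enable only one backend at a time):

# Attention backend selection in the Axolotl config (illustrative sketch;
# flag names assume the standard flash_attention / sdp_attention options)
flash_attention: true   # uses FA2/3/4, whichever is available for your setup
# sdp_attention: true   # alternative: PyTorch scaled_dot_product_attention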
If you get an undefined symbol error while training, ensure you installed PyTorch prior to Axolotl. Alternatively, try reinstalling or downgrading to an earlier version.
If you get an undefined symbol error while training, ensure you installed PyTorch prior to Axolotl.
+Alternatively, try reinstalling or downgrading to an earlier version.
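As a concrete example, a reinstall along these lines often resolves the issue (a sketch, assuming the undefined symbol comes from a flash-attn wheel built against a different PyTorch than the one installed):

# Reinstall flash-attn against the PyTorch already present in the environment
pip uninstall -y flash-attn
pip install flash-attn --no-build-isolation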
Requirements: Hopper GPUs only; CUDA 12.8 (recommended)
Requirements: Hopper or Blackwell GPUs
+
+Or from source:
+git clone https://github.com/Dao-AILab/flash-attention.git
+cd flash-attention/flash_attn/cute
+
+pip install -e .
+
+# FA2's flash_attn package includes a cute/ stub that shadows FA4.
+# Remove it so Python can find the real FA4 module:
+rm -r $(python -c "import flash_attn; print(flash_attn.__path__[0])")/cute
+Hopper (SM90) users: The backward kernel is not yet included in the pip package. To use FA4
+for training on Hopper, install from source using the instructions above.
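A quick way to confirm the stub is actually gone (this simply re-checks the same path used by the rm command above):

# Should print False once the FA2 cute/ stub has been removed
python -c "import flash_attn, os; print(os.path.exists(os.path.join(flash_attn.__path__[0], 'cute')))"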
+FA4 only supports head dimensions up to 128 (d ≤ 128). The DeepSeek shape (192, 128) is
+also supported but only on Blackwell. Axolotl automatically detects incompatible head dimensions
+and falls back to FA2/3.
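To make the limit concrete, here is a small illustrative sketch (not Axolotl's actual dispatch code) of the head-dimension check described above:

# Illustrative only: mirrors the FA4 head-dimension limits described above,
# not Axolotl's real detection logic.
def fa4_supports_head_dims(head_dim_qk: int, head_dim_v: int, is_blackwell: bool) -> bool:
    if head_dim_qk <= 128 and head_dim_v <= 128:
        return True
    # DeepSeek-style (192, 128) heads are handled, but only on Blackwell
    if (head_dim_qk, head_dim_v) == (192, 128) and is_blackwell:
        return True
    return False

# e.g. a model with hidden_size=4096 and 32 heads has head_dim 128, so FA4 applies;
# larger head dims fall back to FA2/3.
print(fa4_supports_head_dims(4096 // 32, 4096 // 32, is_blackwell=False))  # True
print(fa4_supports_head_dims(192, 128, is_blackwell=False))                # False -> fallback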
For more details: flash-attention/flash_attn/cute
A flexible PyTorch API for attention used in combination with torch.compile.
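For context, a minimal standalone sketch of the underlying PyTorch API (outside Axolotl), assuming a CUDA GPU and PyTorch 2.5 or newer, which ships torch.nn.attention.flex_attention:

import torch
from torch.nn.attention.flex_attention import flex_attention

# score_mod lets you rewrite each attention score; here, a simple relative-position bias
def relative_bias(score, b, h, q_idx, kv_idx):
    return score + (q_idx - kv_idx)

q, k, v = (torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16) for _ in range(3))

# FlexAttention is designed to be compiled into a fused kernel
flex = torch.compile(flex_attention)
out = flex(q, k, v, score_mod=relative_bias)
print(out.shape)  # torch.Size([1, 8, 256, 64])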
Attention kernels with QK Int8 and PV FP16 accumulator.
-
+Requirements: Ampere, Ada, or Hopper GPUs
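For reference, a minimal standalone sketch of the kernels described above, assuming the sageattention package and its sageattn() drop-in API (an assumption; check the SageAttention repo for the exact signature):

# Illustrative sketch, assuming sageattention's sageattn() with (batch, heads, seq, dim) inputs
import torch
from sageattention import sageattn

q, k, v = (torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16) for _ in range(3))
out = sageattn(q, k, v, is_causal=True)  # QK computed in INT8, PV accumulated in FP16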
-
+Requirements: LLaMA model architecture