Built site for gh-pages

2026-03-16 04:21:26 +00:00
parent a049510950
commit de3e742dbb
6 changed files with 315 additions and 265 deletions
--- a/search.json
+++ b/search.json
@@ -1247,11 +1247,11 @@
    ]
  },
  {
-    "objectID": "docs/attention.html#flash-attention-2",
-    "href": "docs/attention.html#flash-attention-2",
+    "objectID": "docs/attention.html#flash-attention",
+    "href": "docs/attention.html#flash-attention",
    "title": "Attention",
-    "section": "Flash Attention 2",
-    "text": "Flash Attention 2\nUses efficient kernels to compute attention.\nflash_attention: true\nFor more details: Flash Attention\n\nNvidia\nRequirements: Ampere, Ada, or Hopper GPUs\nNote: For Turing GPUs or lower, please use other attention methods.\npip install flash-attn --no-build-isolation\n\n\n\n\n\n\nTip\n\n\n\nIf you get undefined symbol while training, ensure you installed PyTorch prior to Axolotl. Alternatively, try reinstall or downgrade a version.\n\n\n\nFlash Attention 3\nRequirements: Hopper only and CUDA 12.8 (recommended)\ngit clone https://github.com/Dao-AILab/flash-attention.git\ncd flash-attention/hopper\n\npython setup.py install\n\n\n\nAMD\nRequirements: ROCm 6.0 and above.\nSee Flash Attention AMD docs.",
+    "section": "Flash Attention",
+    "text": "Flash Attention\nAxolotl supports Flash Attention 2, 3, and 4. The best available version is used automatically\nbased on your installed packages and GPU.\nflash_attention: true\nFor more details: Flash Attention\n\nFlash Attention 2\nRequirements: Ampere, Ada, or Hopper GPUs (Turing and older GPUs are not supported)\npip install flash-attn --no-build-isolation\n\n\n\n\n\n\nTip\n\n\n\nIf you get an undefined symbol error while training, ensure you installed PyTorch before Axolotl.\nAlternatively, try reinstalling flash-attn or downgrading to an earlier version.\n\n\n\n\nFlash Attention 3\nRequirements: Hopper only and CUDA 12.8 (recommended)\ngit clone https://github.com/Dao-AILab/flash-attention.git\ncd flash-attention/hopper\n\npython setup.py install\n\n\nFlash Attention 4\nRequirements: Hopper or Blackwell GPUs\npip install flash-attn-4\nOr from source:\ngit clone https://github.com/Dao-AILab/flash-attention.git\ncd flash-attention/flash_attn/cute\n\npip install -e .\n\n# FA2's flash_attn package includes a cute/ stub that shadows FA4.\n# Remove it so Python can find the real FA4 module:\nrm -r $(python -c \"import flash_attn; print(flash_attn.__path__[0])\")/cute\n\n\n\n\n\n\nNote\n\n\n\nHopper (SM90) users: The backward kernel is not yet included in the pip package. To use FA4\nfor training on Hopper, install from source using the instructions above.\n\n\n\n\n\n\n\n\nWarning\n\n\n\nFA4 only supports head dimensions up to 128 (d ≤ 128). The DeepSeek shape (192, 128) is\nalso supported but only on Blackwell. Axolotl automatically detects incompatible head dimensions\nand falls back to FA2/3.\n\n\nFor more details: flash-attention/flash_attn/cute\n\n\nAMD\nRequirements: ROCm 6.0 and above.\nSee Flash Attention AMD docs.",
    "crumbs": [
      "Core Concepts",
      "Attention"
@@ -3109,7 +3109,7 @@
    "href": "index.html#overview",
    "title": "Axolotl",
    "section": "✨ Overview",
-    "text": "✨ Overview\nAxolotl is a free and open-source tool designed to streamline post-training and fine-tuning for the latest large language models (LLMs).\nFeatures:\n\nMultiple Model Support: Train various models like GPT-OSS, LLaMA, Mistral, Mixtral, Pythia, and many more models available on the Hugging Face Hub.\nMultimodal Training: Fine-tune vision-language models (VLMs) including LLaMA-Vision, Qwen2-VL, Pixtral, LLaVA, SmolVLM2, GLM-4.6V, InternVL 3.5, Gemma 3n, and audio models like Voxtral with image, video, and audio support.\nTraining Methods: Full fine-tuning, LoRA, QLoRA, GPTQ, QAT, Preference Tuning (DPO, IPO, KTO, ORPO), RL (GRPO, GDPO), and Reward Modelling (RM) / Process Reward Modelling (PRM).\nEasy Configuration: Re-use a single YAML configuration file across the full fine-tuning pipeline: dataset preprocessing, training, evaluation, quantization, and inference.\nPerformance Optimizations: Multipacking, Flash Attention, Xformers, Flex Attention, SageAttention, Liger Kernel, Cut Cross Entropy, ScatterMoE, Sequence Parallelism (SP), LoRA optimizations, Multi-GPU training (FSDP1, FSDP2, DeepSpeed), Multi-node training (Torchrun, Ray), and many more!\nFlexible Dataset Handling: Load from local, HuggingFace, and cloud (S3, Azure, GCP, OCI) datasets.\nCloud Ready: We ship Docker images and also PyPI packages for use on cloud platforms and local hardware.",
+    "text": "✨ Overview\nAxolotl is a free and open-source tool designed to streamline post-training and fine-tuning for the latest large language models (LLMs).\nFeatures:\n\nMultiple Model Support: Train various models like GPT-OSS, LLaMA, Mistral, Mixtral, Pythia, and many more models available on the Hugging Face Hub.\nMultimodal Training: Fine-tune vision-language models (VLMs) including LLaMA-Vision, Qwen2-VL, Pixtral, LLaVA, SmolVLM2, GLM-4.6V, InternVL 3.5, Gemma 3n, and audio models like Voxtral with image, video, and audio support.\nTraining Methods: Full fine-tuning, LoRA, QLoRA, GPTQ, QAT, Preference Tuning (DPO, IPO, KTO, ORPO), RL (GRPO, GDPO), and Reward Modelling (RM) / Process Reward Modelling (PRM).\nEasy Configuration: Re-use a single YAML configuration file across the full fine-tuning pipeline: dataset preprocessing, training, evaluation, quantization, and inference.\nPerformance Optimizations: Multipacking, Flash Attention 2/3/4, Xformers, Flex Attention, SageAttention, Liger Kernel, Cut Cross Entropy, ScatterMoE, Sequence Parallelism (SP), LoRA optimizations, Multi-GPU training (FSDP1, FSDP2, DeepSpeed), Multi-node training (Torchrun, Ray), and many more!\nFlexible Dataset Handling: Load from local, HuggingFace, and cloud (S3, Azure, GCP, OCI) datasets.\nCloud Ready: We ship Docker images and also PyPI packages for use on cloud platforms and local hardware.",
    "crumbs": [
      "Home"
    ]