diff --git a/.nojekyll b/.nojekyll index 1418c6c6c..c75f5ffe3 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -899be981 \ No newline at end of file +e37ff14d \ No newline at end of file diff --git a/docs/custom_integrations.html b/docs/custom_integrations.html index b2583bd48..3ee0da140 100644 --- a/docs/custom_integrations.html +++ b/docs/custom_integrations.html @@ -619,7 +619,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true}); -
pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@8a1a0ec"
+
pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@5eff953"

Usage

@@ -663,6 +663,9 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
  • mistral3
  • mixtral
  • mllama
+• olmo
+• olmo2
+• olmo3
  • phi
  • phi3
  • phi4_multimodal
diff --git a/docs/multi-gpu.html b/docs/multi-gpu.html index e91a27b87..439961687 100644 --- a/docs/multi-gpu.html +++ b/docs/multi-gpu.html @@ -511,30 +511,28 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});

    On this page

    @@ -562,25 +560,30 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});

    This guide covers advanced training configurations for multi-GPU setups using Axolotl.

-1 Overview
-Axolotl supports several methods for multi-GPU training:
+Overview
+When training on multiple GPUs, Axolotl supports 3 sharding/parallelism strategies. Additionally, you can layer specific optimization features on top of that strategy.

+You generally cannot combine these strategies; they are mutually exclusive.
+1. DeepSpeed: Powerful optimization library, supports ZeRO stages 1-3.
+2. FSDP (Fully Sharded Data Parallel): PyTorch’s native sharding implementation (Recommended).
+3. DDP (Distributed Data Parallel): PyTorch’s native parallelism implementation (Default if neither of the above are selected).
+These features can often be combined with the strategies above:
-• DeepSpeed (recommended)
-• FSDP (Fully Sharded Data Parallel)
-• Sequence parallelism
-• FSDP + QLoRA
+• Sequence Parallelism: Splits long sequences across GPUs (Compatible with DDP, DeepSpeed, and FSDP).
+• FSDP + QLoRA: Combines 4-bit quantization with FSDP (Specific to FSDP).
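In config terms, the strategy choice above comes down to a single top-level key in the Axolotl YAML file. As a rough sketch only (the option values are illustrative, not a complete training config):

```{.yaml}
# Choose exactly one strategy; these keys are mutually exclusive.

# Option A -- DeepSpeed:
# deepspeed: deepspeed_configs/zero1.json

# Option B -- FSDP (version 2 recommended):
# fsdp_version: 2
# fsdp_config:
#   reshard_after_forward: true

# Option C -- DDP: set neither of the keys above; DDP is the
# default when launching on multiple GPUs.
```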
-2 DeepSpeed
-2.1 Configuration
+DeepSpeed
+Configuration

Add to your YAML config:

    deepspeed: deepspeed_configs/zero1.json
-2.2 Usage
+Usage

    # Fetch deepspeed configs (if not already present)
     axolotl fetch deepspeed_configs
     
    @@ -590,8 +593,8 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
     # Passing arg via cli
     axolotl train config.yml --deepspeed deepspeed_configs/zero1.json
-2.3 ZeRO Stages
+ZeRO Stages

    We provide default configurations for:

    • ZeRO Stage 1 (zero1.json)
@@ -618,8 +621,9 @@ Tip
-3 Fully Sharded Data Parallel (FSDP)
+Fully Sharded Data Parallel (FSDP)
+FSDP allows you to shard model parameters, gradients, and optimizer states across data parallel workers.

    @@ -633,12 +637,16 @@ Note

    FSDP2 is recommended for new users. FSDP1 is deprecated and will be removed in an upcoming release of Axolotl.
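For orientation, the FSDP2 configuration shape being recommended here is the same one shown in the migration example later on this page; a minimal version looks like:

```{.yaml}
fsdp_version: 2
fsdp_config:
  offload_params: false
  cpu_ram_efficient_loading: true
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: Qwen3DecoderLayer  # swap in your model's decoder layer class
  state_dict_type: FULL_STATE_DICT
  reshard_after_forward: true
```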

-3.1 Migrating from FSDP1 to FSDP2
+FSDP + QLoRA
+For combining FSDP with QLoRA, see our dedicated guide.
+Migrating from FSDP1 to FSDP2

    To migrate your config from FSDP1 to FSDP2, you must use the fsdp_version top-level config field to specify the FSDP version, and also follow the config field mapping below to update field names.

-3.1.1 Config mapping
+Config mapping

    @@ -706,8 +714,8 @@ if you were using the following FSDP1 config:

reshard_after_forward: true
-3.2 FSDP1 (deprecated)
+FSDP1 (deprecated)

    @@ -730,33 +738,29 @@ Note fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
-4 Sequence parallelism
+Sequence parallelism

    We support sequence parallelism (SP) via the ring-flash-attention project. This allows one to split up sequences across GPUs, which is useful in the event that a single sequence causes OOM errors during model training.

    See our dedicated guide for more information.
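As a sketch, enabling SP is typically a small addition to the YAML config. The field name below is an assumption based on Axolotl's SP integration, so defer to the dedicated guide for the authoritative configuration:

```{.yaml}
# Assumed field name -- check the sequence parallelism guide.
# The degree must evenly divide the number of GPUs; each group of
# this many GPUs shares the work for one long sequence.
sequence_parallel_degree: 2
flash_attention: true  # SP builds on ring-flash-attention
```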

-4.1 FSDP + QLoRA
-For combining FSDP with QLoRA, see our dedicated guide.
-5 Performance Optimization
-5.1 Liger Kernel Integration
+Performance Optimization
+Liger Kernel Integration

    Please see docs for more info.

-6 Troubleshooting
-6.1 NCCL Issues
+Troubleshooting
+NCCL Issues

    For NCCL-related problems, see our NCCL troubleshooting guide.

-6.2 Common Problems
+Common Problems
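The memory-related settings listed under Common Problems (micro_batch_size, eval_batch_size, gradient_accumulation_steps) interact multiplicatively: in data-parallel training, the effective batch size is micro_batch_size times gradient_accumulation_steps times the number of GPUs. A sketch with illustrative numbers:

```{.yaml}
# effective batch = micro_batch_size x gradient_accumulation_steps x n_gpus
# e.g. 1 x 8 x 4 GPUs = 32. If you hit OOM, lower micro_batch_size and
# raise gradient_accumulation_steps to keep the effective batch size
# constant while cutting per-GPU activation memory.
micro_batch_size: 1
gradient_accumulation_steps: 8
eval_batch_size: 1
```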

    @@ -1243,7 +1247,7 @@ single sequence causes OOM errors during model training.

    html: toc: true toc-depth: 3 - number-sections: true + # number-sections: true code-tools: true execute: enabled: false @@ -1253,173 +1257,181 @@ single sequence causes OOM errors during model training.

    ## Overview {#sec-overview} -Axolotl supports several methods for multi-GPU training: +When training on multiple GPUs, Axolotl supports 3 sharding/parallelism strategies. Additionally, you can layer specific optimization features on top of that strategy. -- DeepSpeed (recommended) -- FSDP (Fully Sharded Data Parallel) -- Sequence parallelism -- FSDP + QLoRA - -## DeepSpeed {#sec-deepspeed} - -### Configuration {#sec-deepspeed-config} - -Add to your YAML config: +You generally cannot combine these strategies; they are mutually exclusive. + +1. **DeepSpeed**: Powerful optimization library, supports ZeRO stages 1-3. +2. **FSDP (Fully Sharded Data Parallel)**: PyTorch's native sharding implementation (Recommended). +3. **DDP (Distributed Data Parallel)**: PyTorch's native parallelism implementation (Default if neither of the above are selected). + +These features can often be combined with the strategies above: + +* **Sequence Parallelism**: Splits long sequences across GPUs (Compatible with DDP, DeepSpeed, and FSDP). +* **FSDP + QLoRA**: Combines 4-bit quantization with FSDP (Specific to FSDP). 
-```{.yaml} -deepspeed: deepspeed_configs/zero1.json -``` -### Usage {#sec-deepspeed-usage} - -```{.bash} -# Fetch deepspeed configs (if not already present) -axolotl fetch deepspeed_configs - -# Passing arg via config -axolotl train config.yml - -# Passing arg via cli -axolotl train config.yml --deepspeed deepspeed_configs/zero1.json -``` - -### ZeRO Stages {#sec-zero-stages} +## DeepSpeed {#sec-deepspeed} + +### Configuration {#sec-deepspeed-config} + +Add to your YAML config: + +```{.yaml} +deepspeed: deepspeed_configs/zero1.json +``` +### Usage {#sec-deepspeed-usage} + +```{.bash} +# Fetch deepspeed configs (if not already present) +axolotl fetch deepspeed_configs + +# Passing arg via config +axolotl train config.yml -We provide default configurations for: - -- ZeRO Stage 1 (`zero1.json`) -- ZeRO Stage 1 with torch compile (`zero1_torch_compile.json`) -- ZeRO Stage 2 (`zero2.json`) -- ZeRO Stage 3 (`zero3.json`) -- ZeRO Stage 3 with bf16 (`zero3_bf16.json`) -- ZeRO Stage 3 with bf16 and CPU offload params(`zero3_bf16_cpuoffload_params.json`) -- ZeRO Stage 3 with bf16 and CPU offload params and optimizer (`zero3_bf16_cpuoffload_all.json`) - -::: {.callout-tip} - -Choose the configuration that offloads the least amount to memory while still being able to fit on VRAM for best performance. - -Start from Stage 1 -> Stage 2 -> Stage 3. 
+# Passing arg via cli +axolotl train config.yml --deepspeed deepspeed_configs/zero1.json +``` + +### ZeRO Stages {#sec-zero-stages} + +We provide default configurations for: + +- ZeRO Stage 1 (`zero1.json`) +- ZeRO Stage 1 with torch compile (`zero1_torch_compile.json`) +- ZeRO Stage 2 (`zero2.json`) +- ZeRO Stage 3 (`zero3.json`) +- ZeRO Stage 3 with bf16 (`zero3_bf16.json`) +- ZeRO Stage 3 with bf16 and CPU offload params(`zero3_bf16_cpuoffload_params.json`) +- ZeRO Stage 3 with bf16 and CPU offload params and optimizer (`zero3_bf16_cpuoffload_all.json`) -::: +::: {.callout-tip} -## Fully Sharded Data Parallel (FSDP) {#sec-fsdp} +Choose the configuration that offloads the least amount to memory while still being able to fit on VRAM for best performance. -::: {.callout-note} +Start from Stage 1 -> Stage 2 -> Stage 3. -FSDP2 is recommended for new users. FSDP1 is deprecated and will be removed in an upcoming release of Axolotl. +::: -::: +## Fully Sharded Data Parallel (FSDP) {#sec-fsdp} -### Migrating from FSDP1 to FSDP2 {#sec-migrate-fsdp1-fsdp2} +FSDP allows you to shard model parameters, gradients, and optimizer states across data parallel workers. -To migrate your config from FSDP1 to FSDP2, you must use the `fsdp_version` top-level config field to specify the FSDP version, and -also follow the config field mapping below to update field names. - -#### Config mapping - -FSDP1 | FSDP2 --------- | -------- -fsdp_sharding_strategy | reshard_after_forward -fsdp_backward_prefetch_policy | **REMOVED** -fsdp_backward_prefetch | **REMOVED** -fsdp_forward_prefetch | **REMOVED** -fsdp_sync_module_states | **REMOVED** -fsdp_cpu_ram_efficient_loading | cpu_ram_efficient_loading -fsdp_state_dict_type | state_dict_type -fsdp_use_orig_params | **REMOVED** -fsdp_activation_checkpointing | activation_checkpointing +::: {.callout-note} + +FSDP2 is recommended for new users. FSDP1 is deprecated and will be removed in an upcoming release of Axolotl. 
+ +::: + +### FSDP + QLoRA {#sec-fsdp-qlora} + +For combining FSDP with QLoRA, see our [dedicated guide](fsdp_qlora.qmd). + +### Migrating from FSDP1 to FSDP2 {#sec-migrate-fsdp1-fsdp2} + +To migrate your config from FSDP1 to FSDP2, you must use the `fsdp_version` top-level config field to specify the FSDP version, and +also follow the config field mapping below to update field names. + +#### Config mapping -For more details, please see the migration guide in the [torchtitan repo](https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md). In Axolotl, -if you were using the following FSDP1 config: - -```{.yaml} -fsdp_version: 1 -fsdp_config: - fsdp_offload_params: false - fsdp_cpu_ram_efficient_loading: true - fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP - fsdp_transformer_layer_cls_to_wrap: Qwen3DecoderLayer - fsdp_state_dict_type: FULL_STATE_DICT - fsdp_sharding_strategy: FULL_SHARD -``` - -You can migrate to the following FSDP2 config: - -```{.yaml} -fsdp_version: 2 -fsdp_config: - offload_params: false - cpu_ram_efficient_loading: true - auto_wrap_policy: TRANSFORMER_BASED_WRAP - transformer_layer_cls_to_wrap: Qwen3DecoderLayer - state_dict_type: FULL_STATE_DICT - reshard_after_forward: true -``` - -### FSDP1 (deprecated) {#sec-fsdp-config} - -::: {.callout-note} - -Using `fsdp` to configure FSDP is deprecated and will be removed in an upcoming release of Axolotl. Please use `fsdp_config` as above instead. 
- -::: - -```{.yaml} -fsdp: - - full_shard - - auto_wrap -fsdp_config: - fsdp_offload_params: true - fsdp_state_dict_type: FULL_STATE_DICT - fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer -``` +FSDP1 | FSDP2 +-------- | -------- +fsdp_sharding_strategy | reshard_after_forward +fsdp_backward_prefetch_policy | **REMOVED** +fsdp_backward_prefetch | **REMOVED** +fsdp_forward_prefetch | **REMOVED** +fsdp_sync_module_states | **REMOVED** +fsdp_cpu_ram_efficient_loading | cpu_ram_efficient_loading +fsdp_state_dict_type | state_dict_type +fsdp_use_orig_params | **REMOVED** +fsdp_activation_checkpointing | activation_checkpointing + +For more details, please see the migration guide in the [torchtitan repo](https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md). In Axolotl, +if you were using the following FSDP1 config: + +```{.yaml} +fsdp_version: 1 +fsdp_config: + fsdp_offload_params: false + fsdp_cpu_ram_efficient_loading: true + fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP + fsdp_transformer_layer_cls_to_wrap: Qwen3DecoderLayer + fsdp_state_dict_type: FULL_STATE_DICT + fsdp_sharding_strategy: FULL_SHARD +``` + +You can migrate to the following FSDP2 config: + +```{.yaml} +fsdp_version: 2 +fsdp_config: + offload_params: false + cpu_ram_efficient_loading: true + auto_wrap_policy: TRANSFORMER_BASED_WRAP + transformer_layer_cls_to_wrap: Qwen3DecoderLayer + state_dict_type: FULL_STATE_DICT + reshard_after_forward: true +``` + +### FSDP1 (deprecated) {#sec-fsdp-config} + +::: {.callout-note} + +Using `fsdp` to configure FSDP is deprecated and will be removed in an upcoming release of Axolotl. Please use `fsdp_config` as above instead. - -## Sequence parallelism {#sec-sequence-parallelism} - -We support sequence parallelism (SP) via the -[ring-flash-attention](https://github.com/zhuzilin/ring-flash-attention) project. This -allows one to split up sequences across GPUs, which is useful in the event that a -single sequence causes OOM errors during model training. 
- -See our [dedicated guide](sequence_parallelism.qmd) for more information. - -### FSDP + QLoRA {#sec-fsdp-qlora} +::: + +```{.yaml} +fsdp: + - full_shard + - auto_wrap +fsdp_config: + fsdp_offload_params: true + fsdp_state_dict_type: FULL_STATE_DICT + fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer +``` -For combining FSDP with QLoRA, see our [dedicated guide](fsdp_qlora.qmd). - -## Performance Optimization {#sec-performance} - -### Liger Kernel Integration {#sec-liger} - -Please see [docs](custom_integrations.qmd#liger) for more info. + +## Sequence parallelism {#sec-sequence-parallelism} + +We support sequence parallelism (SP) via the +[ring-flash-attention](https://github.com/zhuzilin/ring-flash-attention) project. This +allows one to split up sequences across GPUs, which is useful in the event that a +single sequence causes OOM errors during model training. -## Troubleshooting {#sec-troubleshooting} +See our [dedicated guide](sequence_parallelism.qmd) for more information. -### NCCL Issues {#sec-nccl} +## Performance Optimization {#sec-performance} -For NCCL-related problems, see our [NCCL troubleshooting guide](nccl.qmd). +### Liger Kernel Integration {#sec-liger} -### Common Problems {#sec-common-problems} +Please see [docs](custom_integrations.qmd#liger) for more info. -::: {.panel-tabset} +## Troubleshooting {#sec-troubleshooting} -## Memory Issues +### NCCL Issues {#sec-nccl} -- Reduce `micro_batch_size` -- Reduce `eval_batch_size` -- Adjust `gradient_accumulation_steps` -- Consider using a higher ZeRO stage - -## Training Instability - -- Start with DeepSpeed ZeRO-2 -- Monitor loss values -- Check learning rates - -::: +For NCCL-related problems, see our [NCCL troubleshooting guide](nccl.qmd). 
+ +### Common Problems {#sec-common-problems} + +::: {.panel-tabset} + +## Memory Issues + +- Reduce `micro_batch_size` +- Reduce `eval_batch_size` +- Adjust `gradient_accumulation_steps` +- Consider using a higher ZeRO stage -For more detailed troubleshooting, see our [debugging guide](debugging.qmd).
    +## Training Instability + +- Start with DeepSpeed ZeRO-2 +- Monitor loss values +- Check learning rates + +::: + +For more detailed troubleshooting, see our [debugging guide](debugging.qmd). diff --git a/examples/colab-notebooks/colab-axolotl-example.html b/examples/colab-notebooks/colab-axolotl-example.html index b21d2946b..7210cdbd9 100644 --- a/examples/colab-notebooks/colab-axolotl-example.html +++ b/examples/colab-notebooks/colab-axolotl-example.html @@ -567,7 +567,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
    %%capture
     # This step can take ~5-10 minutes to install dependencies
     !pip install --no-build-isolation axolotl[flash-attn]>=0.9.1
    -!pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@8a1a0ec"
    +!pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@5eff953"

    Demo: Talk Like a Pirate

    diff --git a/index.html b/index.html index 72886705f..8a4344dce 100644 --- a/index.html +++ b/index.html @@ -564,6 +564,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});

    🎉 Latest Updates

+• 2025/11: Axolotl now includes support for Olmo3.
    • 2025/10: New model support has been added in Axolotl for: Qwen3 Next, Qwen2.5-vl, Qwen3-vl, Qwen3, Qwen3MoE, Granite 4, HunYuan, Magistral 2509, Apertus, and Seed-OSS.
    • 2025/09: Axolotl now has text diffusion training. Read more here.
    • 2025/08: QAT has been updated to include NVFP4 support. See PR.
    • diff --git a/search.json b/search.json index 596794572..715685639 100644 --- a/search.json +++ b/search.json @@ -501,8 +501,8 @@ "objectID": "docs/multi-gpu.html#sec-overview", "href": "docs/multi-gpu.html#sec-overview", "title": "Multi-GPU", - "section": "1 Overview", - "text": "1 Overview\nAxolotl supports several methods for multi-GPU training:\n\nDeepSpeed (recommended)\nFSDP (Fully Sharded Data Parallel)\nSequence parallelism\nFSDP + QLoRA", + "section": "Overview", + "text": "Overview\nWhen training on multiple GPUs, Axolotl supports 3 sharding/parallelism strategies. Additionally, you can layer specific optimization features on top of that strategy.\nYou generally cannot combine these strategies; they are mutually exclusive.\n\nDeepSpeed: Powerful optimization library, supports ZeRO stages 1-3.\nFSDP (Fully Sharded Data Parallel): PyTorch’s native sharding implementation (Recommended).\nDDP (Distributed Data Parallel): PyTorch’s native parallelism implementation (Default if neither of the above are selected).\n\nThese features can often be combined with the strategies above:\n\nSequence Parallelism: Splits long sequences across GPUs (Compatible with DDP, DeepSpeed, and FSDP).\nFSDP + QLoRA: Combines 4-bit quantization with FSDP (Specific to FSDP).", "crumbs": [ "Deployments", "Multi-GPU" @@ -512,8 +512,8 @@ "objectID": "docs/multi-gpu.html#sec-deepspeed", "href": "docs/multi-gpu.html#sec-deepspeed", "title": "Multi-GPU", - "section": "2 DeepSpeed", - "text": "2 DeepSpeed\n\n2.1 Configuration\nAdd to your YAML config:\ndeepspeed: deepspeed_configs/zero1.json\n\n\n2.2 Usage\n# Fetch deepspeed configs (if not already present)\naxolotl fetch deepspeed_configs\n\n# Passing arg via config\naxolotl train config.yml\n\n# Passing arg via cli\naxolotl train config.yml --deepspeed deepspeed_configs/zero1.json\n\n\n2.3 ZeRO Stages\nWe provide default configurations for:\n\nZeRO Stage 1 (zero1.json)\nZeRO Stage 1 with torch compile (zero1_torch_compile.json)\nZeRO 
Stage 2 (zero2.json)\nZeRO Stage 3 (zero3.json)\nZeRO Stage 3 with bf16 (zero3_bf16.json)\nZeRO Stage 3 with bf16 and CPU offload params(zero3_bf16_cpuoffload_params.json)\nZeRO Stage 3 with bf16 and CPU offload params and optimizer (zero3_bf16_cpuoffload_all.json)\n\n\n\n\n\n\n\nTip\n\n\n\nChoose the configuration that offloads the least amount to memory while still being able to fit on VRAM for best performance.\nStart from Stage 1 -> Stage 2 -> Stage 3.", + "section": "DeepSpeed", + "text": "DeepSpeed\n\nConfiguration\nAdd to your YAML config:\ndeepspeed: deepspeed_configs/zero1.json\n\n\nUsage\n# Fetch deepspeed configs (if not already present)\naxolotl fetch deepspeed_configs\n\n# Passing arg via config\naxolotl train config.yml\n\n# Passing arg via cli\naxolotl train config.yml --deepspeed deepspeed_configs/zero1.json\n\n\nZeRO Stages\nWe provide default configurations for:\n\nZeRO Stage 1 (zero1.json)\nZeRO Stage 1 with torch compile (zero1_torch_compile.json)\nZeRO Stage 2 (zero2.json)\nZeRO Stage 3 (zero3.json)\nZeRO Stage 3 with bf16 (zero3_bf16.json)\nZeRO Stage 3 with bf16 and CPU offload params(zero3_bf16_cpuoffload_params.json)\nZeRO Stage 3 with bf16 and CPU offload params and optimizer (zero3_bf16_cpuoffload_all.json)\n\n\n\n\n\n\n\nTip\n\n\n\nChoose the configuration that offloads the least amount to memory while still being able to fit on VRAM for best performance.\nStart from Stage 1 -> Stage 2 -> Stage 3.", "crumbs": [ "Deployments", "Multi-GPU" @@ -523,8 +523,8 @@ "objectID": "docs/multi-gpu.html#sec-fsdp", "href": "docs/multi-gpu.html#sec-fsdp", "title": "Multi-GPU", - "section": "3 Fully Sharded Data Parallel (FSDP)", - "text": "3 Fully Sharded Data Parallel (FSDP)\n\n\n\n\n\n\nNote\n\n\n\nFSDP2 is recommended for new users. 
FSDP1 is deprecated and will be removed in an upcoming release of Axolotl.\n\n\n\n3.1 Migrating from FSDP1 to FSDP2\nTo migrate your config from FSDP1 to FSDP2, you must use the fsdp_version top-level config field to specify the FSDP version, and\nalso follow the config field mapping below to update field names.\n\n3.1.1 Config mapping\n\n\n\nFSDP1\nFSDP2\n\n\n\n\nfsdp_sharding_strategy\nreshard_after_forward\n\n\nfsdp_backward_prefetch_policy\nREMOVED\n\n\nfsdp_backward_prefetch\nREMOVED\n\n\nfsdp_forward_prefetch\nREMOVED\n\n\nfsdp_sync_module_states\nREMOVED\n\n\nfsdp_cpu_ram_efficient_loading\ncpu_ram_efficient_loading\n\n\nfsdp_state_dict_type\nstate_dict_type\n\n\nfsdp_use_orig_params\nREMOVED\n\n\nfsdp_activation_checkpointing\nactivation_checkpointing\n\n\n\nFor more details, please see the migration guide in the torchtitan repo. In Axolotl,\nif you were using the following FSDP1 config:\nfsdp_version: 1\nfsdp_config:\n fsdp_offload_params: false\n fsdp_cpu_ram_efficient_loading: true\n fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP\n fsdp_transformer_layer_cls_to_wrap: Qwen3DecoderLayer\n fsdp_state_dict_type: FULL_STATE_DICT\n fsdp_sharding_strategy: FULL_SHARD\nYou can migrate to the following FSDP2 config:\nfsdp_version: 2\nfsdp_config:\n offload_params: false\n cpu_ram_efficient_loading: true\n auto_wrap_policy: TRANSFORMER_BASED_WRAP\n transformer_layer_cls_to_wrap: Qwen3DecoderLayer\n state_dict_type: FULL_STATE_DICT\n reshard_after_forward: true\n\n\n\n3.2 FSDP1 (deprecated)\n\n\n\n\n\n\nNote\n\n\n\nUsing fsdp to configure FSDP is deprecated and will be removed in an upcoming release of Axolotl. 
Please use fsdp_config as above instead.\n\n\nfsdp:\n - full_shard\n - auto_wrap\nfsdp_config:\n fsdp_offload_params: true\n fsdp_state_dict_type: FULL_STATE_DICT\n fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer", + "section": "Fully Sharded Data Parallel (FSDP)", + "text": "Fully Sharded Data Parallel (FSDP)\nFSDP allows you to shard model parameters, gradients, and optimizer states across data parallel workers.\n\n\n\n\n\n\nNote\n\n\n\nFSDP2 is recommended for new users. FSDP1 is deprecated and will be removed in an upcoming release of Axolotl.\n\n\n\nFSDP + QLoRA\nFor combining FSDP with QLoRA, see our dedicated guide.\n\n\nMigrating from FSDP1 to FSDP2\nTo migrate your config from FSDP1 to FSDP2, you must use the fsdp_version top-level config field to specify the FSDP version, and\nalso follow the config field mapping below to update field names.\n\nConfig mapping\n\n\n\nFSDP1\nFSDP2\n\n\n\n\nfsdp_sharding_strategy\nreshard_after_forward\n\n\nfsdp_backward_prefetch_policy\nREMOVED\n\n\nfsdp_backward_prefetch\nREMOVED\n\n\nfsdp_forward_prefetch\nREMOVED\n\n\nfsdp_sync_module_states\nREMOVED\n\n\nfsdp_cpu_ram_efficient_loading\ncpu_ram_efficient_loading\n\n\nfsdp_state_dict_type\nstate_dict_type\n\n\nfsdp_use_orig_params\nREMOVED\n\n\nfsdp_activation_checkpointing\nactivation_checkpointing\n\n\n\nFor more details, please see the migration guide in the torchtitan repo. 
In Axolotl,\nif you were using the following FSDP1 config:\nfsdp_version: 1\nfsdp_config:\n fsdp_offload_params: false\n fsdp_cpu_ram_efficient_loading: true\n fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP\n fsdp_transformer_layer_cls_to_wrap: Qwen3DecoderLayer\n fsdp_state_dict_type: FULL_STATE_DICT\n fsdp_sharding_strategy: FULL_SHARD\nYou can migrate to the following FSDP2 config:\nfsdp_version: 2\nfsdp_config:\n offload_params: false\n cpu_ram_efficient_loading: true\n auto_wrap_policy: TRANSFORMER_BASED_WRAP\n transformer_layer_cls_to_wrap: Qwen3DecoderLayer\n state_dict_type: FULL_STATE_DICT\n reshard_after_forward: true\n\n\n\nFSDP1 (deprecated)\n\n\n\n\n\n\nNote\n\n\n\nUsing fsdp to configure FSDP is deprecated and will be removed in an upcoming release of Axolotl. Please use fsdp_config as above instead.\n\n\nfsdp:\n - full_shard\n - auto_wrap\nfsdp_config:\n fsdp_offload_params: true\n fsdp_state_dict_type: FULL_STATE_DICT\n fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer", "crumbs": [ "Deployments", "Multi-GPU" @@ -534,8 +534,8 @@ "objectID": "docs/multi-gpu.html#sec-sequence-parallelism", "href": "docs/multi-gpu.html#sec-sequence-parallelism", "title": "Multi-GPU", - "section": "4 Sequence parallelism", - "text": "4 Sequence parallelism\nWe support sequence parallelism (SP) via the\nring-flash-attention project. This\nallows one to split up sequences across GPUs, which is useful in the event that a\nsingle sequence causes OOM errors during model training.\nSee our dedicated guide for more information.\n\n4.1 FSDP + QLoRA\nFor combining FSDP with QLoRA, see our dedicated guide.", + "section": "Sequence parallelism", + "text": "Sequence parallelism\nWe support sequence parallelism (SP) via the\nring-flash-attention project. 
This\nallows one to split up sequences across GPUs, which is useful in the event that a\nsingle sequence causes OOM errors during model training.\nSee our dedicated guide for more information.", "crumbs": [ "Deployments", "Multi-GPU" @@ -545,8 +545,8 @@ "objectID": "docs/multi-gpu.html#sec-performance", "href": "docs/multi-gpu.html#sec-performance", "title": "Multi-GPU", - "section": "5 Performance Optimization", - "text": "5 Performance Optimization\n\n5.1 Liger Kernel Integration\nPlease see docs for more info.", + "section": "Performance Optimization", + "text": "Performance Optimization\n\nLiger Kernel Integration\nPlease see docs for more info.", "crumbs": [ "Deployments", "Multi-GPU" @@ -556,8 +556,8 @@ "objectID": "docs/multi-gpu.html#sec-troubleshooting", "href": "docs/multi-gpu.html#sec-troubleshooting", "title": "Multi-GPU", - "section": "6 Troubleshooting", - "text": "6 Troubleshooting\n\n6.1 NCCL Issues\nFor NCCL-related problems, see our NCCL troubleshooting guide.\n\n\n6.2 Common Problems\n\nMemory IssuesTraining Instability\n\n\n\nReduce micro_batch_size\nReduce eval_batch_size\nAdjust gradient_accumulation_steps\nConsider using a higher ZeRO stage\n\n\n\n\nStart with DeepSpeed ZeRO-2\nMonitor loss values\nCheck learning rates\n\n\n\n\nFor more detailed troubleshooting, see our debugging guide.", + "section": "Troubleshooting", + "text": "Troubleshooting\n\nNCCL Issues\nFor NCCL-related problems, see our NCCL troubleshooting guide.\n\n\nCommon Problems\n\nMemory IssuesTraining Instability\n\n\n\nReduce micro_batch_size\nReduce eval_batch_size\nAdjust gradient_accumulation_steps\nConsider using a higher ZeRO stage\n\n\n\n\nStart with DeepSpeed ZeRO-2\nMonitor loss values\nCheck learning rates\n\n\n\n\nFor more detailed troubleshooting, see our debugging guide.", "crumbs": [ "Deployments", "Multi-GPU" @@ -1910,7 +1910,7 @@ "href": "docs/custom_integrations.html#cut-cross-entropy", "title": "Custom Integrations", "section": "Cut Cross Entropy", - 
"text": "Cut Cross Entropy\nCut Cross Entropy (CCE) reduces VRAM usage through optimization on the cross-entropy operation during loss calculation.\nSee https://github.com/apple/ml-cross-entropy\n\nRequirements\n\nPyTorch 2.4.0 or higher\n\n\n\nInstallation\nRun the following command to install cut_cross_entropy[transformers] if you don’t have it already.\n\nIf you are in dev environment\n\npython scripts/cutcrossentropy_install.py | sh\n\nIf you are installing from pip\n\npip3 uninstall -y cut-cross-entropy && pip3 install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@8a1a0ec\"\n\n\nUsage\nplugins:\n - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin\n\n\nSupported Models\n\napertus\narcee\ncohere\ncohere2\ndeepseek_v3\ngemma\ngemma2\ngemma3\ngemma3_text\ngemma3n\ngemma3n_text\nglm\nglm4\nglm4_moe\nglm4v\nglm4v_moe\ngpt_oss\ngranite\ngranitemoe\ngranitemoeshared\ngranitemoehybrid\nhunyuan_v1_dense\nhunyuan_v1_moe\nlfm2\nlfm2_moe\nlfm2_vl\nllama\nllama4\nllama4_text\nllava\nmistral\nmistral3\nmixtral\nmllama\nphi\nphi3\nphi4_multimodal\nqwen2\nqwen2_vl\nqwen2_moe\nqwen2_5_vl\nqwen3\nqwen3_moe\nqwen3_vl\nqwen3_vl_moe\nqwen3_next\nsmollm3\nseed_oss\nvoxtral\n\n\n\nCitation\n@article{wijmans2024cut,\n author = {Erik Wijmans and\n Brody Huval and\n Alexander Hertzberg and\n Vladlen Koltun and\n Philipp Kr\\\"ahenb\\\"uhl},\n title = {Cut Your Losses in Large-Vocabulary Language Models},\n journal = {arXiv},\n year = {2024},\n url = {https://arxiv.org/abs/2411.09009},\n}\nPlease see reference here", + "text": "Cut Cross Entropy\nCut Cross Entropy (CCE) reduces VRAM usage through optimization on the cross-entropy operation during loss calculation.\nSee https://github.com/apple/ml-cross-entropy\n\nRequirements\n\nPyTorch 2.4.0 or higher\n\n\n\nInstallation\nRun the following command to install cut_cross_entropy[transformers] if you don’t have it already.\n\nIf you are in dev environment\n\npython 
scripts/cutcrossentropy_install.py | sh\n\nIf you are installing from pip\n\npip3 uninstall -y cut-cross-entropy && pip3 install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@5eff953\"\n\n\nUsage\nplugins:\n - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin\n\n\nSupported Models\n\napertus\narcee\ncohere\ncohere2\ndeepseek_v3\ngemma\ngemma2\ngemma3\ngemma3_text\ngemma3n\ngemma3n_text\nglm\nglm4\nglm4_moe\nglm4v\nglm4v_moe\ngpt_oss\ngranite\ngranitemoe\ngranitemoeshared\ngranitemoehybrid\nhunyuan_v1_dense\nhunyuan_v1_moe\nlfm2\nlfm2_moe\nlfm2_vl\nllama\nllama4\nllama4_text\nllava\nmistral\nmistral3\nmixtral\nmllama\nolmo\nolmo2\nolmo3\nphi\nphi3\nphi4_multimodal\nqwen2\nqwen2_vl\nqwen2_moe\nqwen2_5_vl\nqwen3\nqwen3_moe\nqwen3_vl\nqwen3_vl_moe\nqwen3_next\nsmollm3\nseed_oss\nvoxtral\n\n\n\nCitation\n@article{wijmans2024cut,\n author = {Erik Wijmans and\n Brody Huval and\n Alexander Hertzberg and\n Vladlen Koltun and\n Philipp Kr\\\"ahenb\\\"uhl},\n title = {Cut Your Losses in Large-Vocabulary Language Models},\n journal = {arXiv},\n year = {2024},\n url = {https://arxiv.org/abs/2411.09009},\n}\nPlease see reference here", "crumbs": [ "Advanced Features", "Custom Integrations" @@ -2030,7 +2030,7 @@ "href": "index.html#latest-updates", "title": "Axolotl", "section": "🎉 Latest Updates", - "text": "🎉 Latest Updates\n\n2025/10: New model support has been added in Axolotl for: Qwen3 Next, Qwen2.5-vl, Qwen3-vl, Qwen3, Qwen3MoE, Granite 4, HunYuan, Magistral 2509, Apertus, and Seed-OSS.\n2025/09: Axolotl now has text diffusion training. Read more here.\n2025/08: QAT has been updated to include NVFP4 support. See PR.\n2025/07:\n\nND Parallelism support has been added into Axolotl. Compose Context Parallelism (CP), Tensor Parallelism (TP), and Fully Sharded Data Parallelism (FSDP) within a single node and across multiple nodes. 
Check out the blog post for more info.\nAxolotl adds more models: GPT-OSS, Gemma 3n, Liquid Foundation Model 2 (LFM2), and Arcee Foundation Models (AFM).\nFP8 finetuning with fp8 gather op is now possible in Axolotl via torchao. Get started here!\nVoxtral, Magistral 1.1, and Devstral with mistral-common tokenizer support has been integrated in Axolotl!\nTiledMLP support for single-GPU to multi-GPU training with DDP, DeepSpeed and FSDP support has been added to support Arctic Long Sequence Training. (ALST). See examples for using ALST with Axolotl!\n\n2025/05: Quantization Aware Training (QAT) support has been added to Axolotl. Explore the docs to learn more!\n\n\n\nExpand older updates\n\n\n2025/03: Axolotl has implemented Sequence Parallelism (SP) support. Read the blog and docs to learn how to scale your context length when fine-tuning.\n2025/06: Magistral with mistral-common tokenizer support has been added to Axolotl. See examples to start training your own Magistral models with Axolotl!\n2025/04: Llama 4 support has been added in Axolotl. See examples to start training your own Llama 4 models with Axolotl’s linearized version!\n2025/03: (Beta) Fine-tuning Multimodal models is now supported in Axolotl. Check out the docs to fine-tune your own!\n2025/02: Axolotl has added LoRA optimizations to reduce memory usage and improve training speed for LoRA and QLoRA in single GPU and multi-GPU training (DDP and DeepSpeed). Jump into the docs to give it a try.\n2025/02: Axolotl has added GRPO support. Dive into our blog and GRPO example and have some fun!\n2025/01: Axolotl has added Reward Modelling / Process Reward Modelling fine-tuning support. See docs.", + "text": "🎉 Latest Updates\n\n2025/11: Axolotl now includes support for Olmo3.\n2025/10: New model support has been added in Axolotl for: Qwen3 Next, Qwen2.5-vl, Qwen3-vl, Qwen3, Qwen3MoE, Granite 4, HunYuan, Magistral 2509, Apertus, and Seed-OSS.\n2025/09: Axolotl now has text diffusion training. 
Read more here.\n2025/08: QAT has been updated to include NVFP4 support. See PR.\n2025/07:\n\nND Parallelism support has been added into Axolotl. Compose Context Parallelism (CP), Tensor Parallelism (TP), and Fully Sharded Data Parallelism (FSDP) within a single node and across multiple nodes. Check out the blog post for more info.\nAxolotl adds more models: GPT-OSS, Gemma 3n, Liquid Foundation Model 2 (LFM2), and Arcee Foundation Models (AFM).\nFP8 finetuning with fp8 gather op is now possible in Axolotl via torchao. Get started here!\nVoxtral, Magistral 1.1, and Devstral with mistral-common tokenizer support have been integrated in Axolotl!\nTiledMLP support for single-GPU to multi-GPU training with DDP, DeepSpeed and FSDP support has been added to support Arctic Long Sequence Training (ALST). See examples for using ALST with Axolotl!\n\n2025/05: Quantization Aware Training (QAT) support has been added to Axolotl. Explore the docs to learn more!\n\n\n\nExpand older updates\n\n\n2025/03: Axolotl has implemented Sequence Parallelism (SP) support. Read the blog and docs to learn how to scale your context length when fine-tuning.\n2025/06: Magistral with mistral-common tokenizer support has been added to Axolotl. See examples to start training your own Magistral models with Axolotl!\n2025/04: Llama 4 support has been added in Axolotl. See examples to start training your own Llama 4 models with Axolotl’s linearized version!\n2025/03: (Beta) Fine-tuning Multimodal models is now supported in Axolotl. Check out the docs to fine-tune your own!\n2025/02: Axolotl has added LoRA optimizations to reduce memory usage and improve training speed for LoRA and QLoRA in single GPU and multi-GPU training (DDP and DeepSpeed). Jump into the docs to give it a try.\n2025/02: Axolotl has added GRPO support. Dive into our blog and GRPO example and have some fun!\n2025/01: Axolotl has added Reward Modelling / Process Reward Modelling fine-tuning support. 
See docs.", "crumbs": [ "Home" ] diff --git a/sitemap.xml b/sitemap.xml index 3676d3a95..a0fc74af7 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,802 +2,802 @@ https://docs.axolotl.ai/src/axolotl/integrations/cut_cross_entropy/ACKNOWLEDGEMENTS.html - 2025-11-20T14:26:32.771Z + 2025-11-24T03:21:38.598Z https://docs.axolotl.ai/docs/mac.html - 2025-11-20T14:26:32.747Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/cli.html - 2025-11-20T14:26:32.744Z + 2025-11-24T03:21:38.571Z https://docs.axolotl.ai/docs/mixed_precision.html - 2025-11-20T14:26:32.747Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/installation.html - 2025-11-20T14:26:32.747Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/dataset_loading.html - 2025-11-20T14:26:32.744Z + 2025-11-24T03:21:38.571Z https://docs.axolotl.ai/docs/sequence_parallelism.html - 2025-11-20T14:26:32.748Z + 2025-11-24T03:21:38.576Z https://docs.axolotl.ai/docs/optimizations.html - 2025-11-20T14:26:32.748Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/gradient_checkpointing.html - 2025-11-20T14:26:32.744Z + 2025-11-24T03:21:38.572Z https://docs.axolotl.ai/docs/docker.html - 2025-11-20T14:26:32.744Z + 2025-11-24T03:21:38.572Z https://docs.axolotl.ai/docs/input_output.html - 2025-11-20T14:26:32.747Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/multi-gpu.html - 2025-11-20T14:26:32.747Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/dataset-formats/template_free.html - 2025-11-20T14:26:32.744Z + 2025-11-24T03:21:38.571Z https://docs.axolotl.ai/docs/dataset-formats/tokenized.html - 2025-11-20T14:26:32.744Z + 2025-11-24T03:21:38.571Z https://docs.axolotl.ai/docs/dataset-formats/pretraining.html - 2025-11-20T14:26:32.744Z + 2025-11-24T03:21:38.571Z https://docs.axolotl.ai/docs/dataset-formats/stepwise_supervised.html - 2025-11-20T14:26:32.744Z + 2025-11-24T03:21:38.571Z https://docs.axolotl.ai/docs/rlhf.html - 2025-11-20T14:26:32.748Z + 2025-11-24T03:21:38.575Z 
https://docs.axolotl.ai/docs/multi-node.html - 2025-11-20T14:26:32.747Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/batch_vs_grad.html - 2025-11-20T14:26:32.743Z + 2025-11-24T03:21:38.571Z https://docs.axolotl.ai/docs/nd_parallelism.html - 2025-11-20T14:26:32.748Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/fsdp_qlora.html - 2025-11-20T14:26:32.744Z + 2025-11-24T03:21:38.572Z https://docs.axolotl.ai/docs/quantize.html - 2025-11-20T14:26:32.748Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/reward_modelling.html - 2025-11-20T14:26:32.748Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/api/core.chat.format.llama3x.html - 2025-11-20T14:33:28.923Z + 2025-11-24T03:25:24.600Z https://docs.axolotl.ai/docs/api/monkeypatch.unsloth_.html - 2025-11-20T14:33:29.835Z + 2025-11-24T03:25:25.521Z https://docs.axolotl.ai/docs/api/integrations.kd.trainer.html - 2025-11-20T14:33:30.350Z + 2025-11-24T03:25:26.029Z https://docs.axolotl.ai/docs/api/integrations.grokfast.optimizer.html - 2025-11-20T14:33:30.341Z + 2025-11-24T03:25:26.019Z https://docs.axolotl.ai/docs/api/core.chat.format.chatml.html - 2025-11-20T14:33:28.922Z + 2025-11-24T03:25:24.598Z https://docs.axolotl.ai/docs/api/utils.callbacks.perplexity.html - 2025-11-20T14:33:30.479Z + 2025-11-24T03:25:26.158Z https://docs.axolotl.ai/docs/api/utils.callbacks.profiler.html - 2025-11-20T14:33:30.484Z + 2025-11-24T03:25:26.162Z https://docs.axolotl.ai/docs/api/cli.preprocess.html - 2025-11-20T14:33:29.119Z + 2025-11-24T03:25:24.795Z https://docs.axolotl.ai/docs/api/cli.utils.load.html - 2025-11-20T14:33:29.174Z + 2025-11-24T03:25:24.853Z https://docs.axolotl.ai/docs/api/cli.inference.html - 2025-11-20T14:33:29.083Z + 2025-11-24T03:25:24.761Z https://docs.axolotl.ai/docs/api/cli.args.html - 2025-11-20T14:33:29.026Z + 2025-11-24T03:25:24.704Z https://docs.axolotl.ai/docs/api/prompt_tokenizers.html - 2025-11-20T14:33:28.844Z + 2025-11-24T03:25:24.522Z 
https://docs.axolotl.ai/docs/api/cli.vllm_serve.html - 2025-11-20T14:33:29.133Z + 2025-11-24T03:25:24.811Z https://docs.axolotl.ai/docs/api/monkeypatch.data.batch_dataset_fetcher.html - 2025-11-20T14:33:29.846Z + 2025-11-24T03:25:25.532Z https://docs.axolotl.ai/docs/api/loaders.constants.html - 2025-11-20T14:33:29.323Z + 2025-11-24T03:25:25.000Z https://docs.axolotl.ai/docs/api/prompt_strategies.input_output.html - 2025-11-20T14:33:29.485Z + 2025-11-24T03:25:25.165Z https://docs.axolotl.ai/docs/api/core.trainers.trl.html - 2025-11-20T14:33:29.233Z + 2025-11-24T03:25:24.910Z https://docs.axolotl.ai/docs/api/models.mamba.modeling_mamba.html - 2025-11-20T14:33:30.386Z + 2025-11-24T03:25:26.065Z https://docs.axolotl.ai/docs/api/monkeypatch.stablelm_attn_hijack_flash.html - 2025-11-20T14:33:29.821Z + 2025-11-24T03:25:25.507Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_chat.html - 2025-11-20T14:33:29.429Z + 2025-11-24T03:25:25.108Z https://docs.axolotl.ai/docs/api/prompt_strategies.metharme.html - 2025-11-20T14:33:29.499Z + 2025-11-24T03:25:25.179Z https://docs.axolotl.ai/docs/api/utils.freeze.html - 2025-11-20T14:33:29.921Z + 2025-11-24T03:25:25.607Z https://docs.axolotl.ai/docs/api/logging_config.html - 2025-11-20T14:33:28.856Z + 2025-11-24T03:25:24.534Z https://docs.axolotl.ai/docs/api/monkeypatch.trainer_fsdp_optim.html - 2025-11-20T14:33:29.825Z + 2025-11-24T03:25:25.511Z https://docs.axolotl.ai/docs/api/cli.art.html - 2025-11-20T14:33:29.030Z + 2025-11-24T03:25:24.708Z https://docs.axolotl.ai/docs/api/utils.collators.mamba.html - 2025-11-20T14:33:30.416Z + 2025-11-24T03:25:26.094Z https://docs.axolotl.ai/docs/api/cli.utils.train.html - 2025-11-20T14:33:29.196Z + 2025-11-24T03:25:24.874Z https://docs.axolotl.ai/docs/api/integrations.base.html - 2025-11-20T14:33:30.335Z + 2025-11-24T03:25:26.014Z https://docs.axolotl.ai/docs/api/core.trainers.dpo.trainer.html - 2025-11-20T14:33:29.248Z + 2025-11-24T03:25:24.925Z 
https://docs.axolotl.ai/docs/api/cli.checks.html - 2025-11-20T14:33:29.038Z + 2025-11-24T03:25:24.716Z https://docs.axolotl.ai/docs/api/common.architectures.html - 2025-11-20T14:33:30.364Z + 2025-11-24T03:25:26.043Z https://docs.axolotl.ai/docs/api/utils.bench.html - 2025-11-20T14:33:29.911Z + 2025-11-24T03:25:25.597Z https://docs.axolotl.ai/docs/api/utils.data.streaming.html - 2025-11-20T14:33:30.019Z + 2025-11-24T03:25:25.704Z https://docs.axolotl.ai/docs/api/utils.chat_templates.html - 2025-11-20T14:33:29.894Z + 2025-11-24T03:25:25.580Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_w_system.html - 2025-11-20T14:33:29.445Z + 2025-11-24T03:25:25.124Z https://docs.axolotl.ai/docs/api/core.trainers.utils.html - 2025-11-20T14:33:29.279Z + 2025-11-24T03:25:24.956Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.zephyr.html - 2025-11-20T14:33:29.552Z + 2025-11-24T03:25:25.233Z https://docs.axolotl.ai/docs/api/cli.evaluate.html - 2025-11-20T14:33:29.002Z + 2025-11-24T03:25:24.680Z https://docs.axolotl.ai/docs/api/monkeypatch.relora.html - 2025-11-20T14:33:29.764Z + 2025-11-24T03:25:25.450Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chatml.html - 2025-11-20T14:33:29.550Z + 2025-11-24T03:25:25.231Z https://docs.axolotl.ai/docs/api/cli.utils.fetch.html - 2025-11-20T14:33:29.167Z + 2025-11-24T03:25:24.846Z https://docs.axolotl.ai/docs/api/utils.schemas.datasets.html - 2025-11-20T14:33:30.107Z + 2025-11-24T03:25:25.791Z https://docs.axolotl.ai/docs/api/common.datasets.html - 2025-11-20T14:33:30.385Z + 2025-11-24T03:25:26.064Z https://docs.axolotl.ai/docs/api/utils.callbacks.mlflow_.html - 2025-11-20T14:33:30.490Z + 2025-11-24T03:25:26.169Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.chatml.html - 2025-11-20T14:33:29.576Z + 2025-11-24T03:25:25.256Z https://docs.axolotl.ai/docs/api/utils.schemas.model.html - 2025-11-20T14:33:30.076Z + 2025-11-24T03:25:25.760Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_xformers.html 
- 2025-11-20T14:33:29.756Z + 2025-11-24T03:25:25.441Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_patch_multipack.html - 2025-11-20T14:33:29.814Z + 2025-11-24T03:25:25.500Z https://docs.axolotl.ai/docs/api/core.trainers.base.html - 2025-11-20T14:33:29.214Z + 2025-11-24T03:25:24.892Z https://docs.axolotl.ai/docs/api/cli.utils.args.html - 2025-11-20T14:33:29.161Z + 2025-11-24T03:25:24.839Z https://docs.axolotl.ai/docs/api/utils.schemas.trl.html - 2025-11-20T14:33:30.122Z + 2025-11-24T03:25:25.805Z https://docs.axolotl.ai/docs/api/core.builders.base.html - 2025-11-20T14:33:28.864Z + 2025-11-24T03:25:24.541Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_flash.html - 2025-11-20T14:33:29.754Z + 2025-11-24T03:25:25.439Z https://docs.axolotl.ai/docs/api/cli.cloud.modal_.html - 2025-11-20T14:33:29.145Z + 2025-11-24T03:25:24.823Z https://docs.axolotl.ai/docs/api/utils.optimizers.adopt.html - 2025-11-20T14:33:30.017Z + 2025-11-24T03:25:25.702Z https://docs.axolotl.ai/docs/api/integrations.spectrum.args.html - 2025-11-20T14:33:30.363Z + 2025-11-24T03:25:26.041Z https://docs.axolotl.ai/docs/api/cli.main.html - 2025-11-20T14:33:28.982Z + 2025-11-24T03:25:24.660Z https://docs.axolotl.ai/docs/api/kernels.lora.html - 2025-11-20T14:33:29.712Z + 2025-11-24T03:25:25.394Z https://docs.axolotl.ai/docs/api/utils.model_shard_quant.html - 2025-11-20T14:33:29.907Z + 2025-11-24T03:25:25.593Z https://docs.axolotl.ai/docs/api/utils.distributed.html - 2025-11-20T14:33:30.000Z + 2025-11-24T03:25:25.686Z https://docs.axolotl.ai/docs/api/loaders.model.html - 2025-11-20T14:33:29.291Z + 2025-11-24T03:25:24.967Z https://docs.axolotl.ai/docs/api/monkeypatch.mistral_attn_hijack_flash.html - 2025-11-20T14:33:29.758Z + 2025-11-24T03:25:25.443Z https://docs.axolotl.ai/docs/api/evaluate.html - 2025-11-20T14:33:28.769Z + 2025-11-24T03:25:24.446Z https://docs.axolotl.ai/docs/api/cli.delinearize_llama4.html - 2025-11-20T14:33:29.066Z + 2025-11-24T03:25:24.744Z 
https://docs.axolotl.ai/docs/api/utils.trainer.html - 2025-11-20T14:33:29.941Z + 2025-11-24T03:25:25.628Z https://docs.axolotl.ai/docs/api/cli.quantize.html - 2025-11-20T14:33:29.125Z + 2025-11-24T03:25:24.801Z https://docs.axolotl.ai/docs/api/common.const.html - 2025-11-20T14:33:30.366Z + 2025-11-24T03:25:26.045Z https://docs.axolotl.ai/docs/api/prompt_strategies.llama2_chat.html - 2025-11-20T14:33:29.471Z + 2025-11-24T03:25:25.150Z https://docs.axolotl.ai/docs/api/utils.collators.mm_chat.html - 2025-11-20T14:33:30.421Z + 2025-11-24T03:25:26.100Z https://docs.axolotl.ai/docs/api/core.datasets.chat.html - 2025-11-20T14:33:28.931Z + 2025-11-24T03:25:24.608Z https://docs.axolotl.ai/docs/api/prompt_strategies.bradley_terry.llama3.html - 2025-11-20T14:33:29.608Z + 2025-11-24T03:25:25.288Z https://docs.axolotl.ai/docs/api/core.trainers.mamba.html - 2025-11-20T14:33:29.239Z + 2025-11-24T03:25:24.917Z https://docs.axolotl.ai/docs/api/core.builders.causal.html - 2025-11-20T14:33:28.869Z + 2025-11-24T03:25:24.547Z https://docs.axolotl.ai/docs/api/kernels.geglu.html - 2025-11-20T14:33:29.724Z + 2025-11-24T03:25:25.407Z https://docs.axolotl.ai/docs/api/utils.schemas.utils.html - 2025-11-20T14:33:30.164Z + 2025-11-24T03:25:25.848Z https://docs.axolotl.ai/docs/custom_integrations.html - 2025-11-20T14:26:32.744Z + 2025-11-24T03:21:38.571Z https://docs.axolotl.ai/index.html - 2025-11-20T14:26:32.766Z + 2025-11-24T03:21:38.593Z https://docs.axolotl.ai/examples/colab-notebooks/colab-axolotl-example.html - 2025-11-20T14:26:32.753Z + 2025-11-24T03:21:38.580Z https://docs.axolotl.ai/FAQS.html - 2025-11-20T14:26:32.742Z + 2025-11-24T03:21:38.569Z https://docs.axolotl.ai/docs/inference.html - 2025-11-20T14:26:32.747Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/api/core.datasets.transforms.chat_builder.html - 2025-11-20T14:33:28.941Z + 2025-11-24T03:25:24.618Z https://docs.axolotl.ai/docs/api/train.html - 2025-11-20T14:33:28.756Z + 2025-11-24T03:25:24.433Z 
https://docs.axolotl.ai/docs/api/utils.schemas.training.html - 2025-11-20T14:33:30.084Z + 2025-11-24T03:25:25.769Z https://docs.axolotl.ai/docs/api/kernels.quantize.html - 2025-11-20T14:33:29.746Z + 2025-11-24T03:25:25.429Z https://docs.axolotl.ai/docs/api/utils.dict.html - 2025-11-20T14:33:30.007Z + 2025-11-24T03:25:25.692Z https://docs.axolotl.ai/docs/api/utils.quantization.html - 2025-11-20T14:33:30.050Z + 2025-11-24T03:25:25.736Z https://docs.axolotl.ai/docs/api/utils.collators.batching.html - 2025-11-20T14:33:30.411Z + 2025-11-24T03:25:26.090Z https://docs.axolotl.ai/docs/api/monkeypatch.multipack.html - 2025-11-20T14:33:29.760Z + 2025-11-24T03:25:25.445Z https://docs.axolotl.ai/docs/api/cli.config.html - 2025-11-20T14:33:29.060Z + 2025-11-24T03:25:24.738Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.optimizer.html - 2025-11-20T14:33:29.330Z + 2025-11-24T03:25:25.007Z https://docs.axolotl.ai/docs/api/utils.lora.html - 2025-11-20T14:33:29.900Z + 2025-11-24T03:25:25.586Z https://docs.axolotl.ai/docs/api/core.trainers.grpo.sampler.html - 2025-11-20T14:33:29.277Z + 2025-11-24T03:25:24.954Z https://docs.axolotl.ai/docs/api/prompt_strategies.orcamini.html - 2025-11-20T14:33:29.503Z + 2025-11-24T03:25:25.183Z https://docs.axolotl.ai/docs/api/monkeypatch.mixtral.html - 2025-11-20T14:33:29.848Z + 2025-11-24T03:25:25.534Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.llama3.html - 2025-11-20T14:33:29.537Z + 2025-11-24T03:25:25.218Z https://docs.axolotl.ai/docs/api/index.html - 2025-11-20T14:33:28.677Z + 2025-11-24T03:25:24.353Z https://docs.axolotl.ai/docs/api/utils.ctx_managers.sequence_parallel.html - 2025-11-20T14:33:29.371Z + 2025-11-24T03:25:25.049Z https://docs.axolotl.ai/docs/api/cli.utils.html - 2025-11-20T14:33:29.147Z + 2025-11-24T03:25:24.825Z https://docs.axolotl.ai/docs/api/cli.merge_lora.html - 2025-11-20T14:33:29.094Z + 2025-11-24T03:25:24.771Z https://docs.axolotl.ai/docs/api/kernels.utils.html - 2025-11-20T14:33:29.748Z + 
2025-11-24T03:25:25.431Z https://docs.axolotl.ai/docs/api/prompt_strategies.base.html - 2025-11-20T14:33:29.373Z + 2025-11-24T03:25:25.051Z https://docs.axolotl.ai/docs/api/utils.callbacks.comet_.html - 2025-11-20T14:33:30.494Z + 2025-11-24T03:25:26.173Z https://docs.axolotl.ai/docs/api/utils.schemas.multimodal.html - 2025-11-20T14:33:30.128Z + 2025-11-24T03:25:25.812Z https://docs.axolotl.ai/docs/api/prompt_strategies.chat_template.html - 2025-11-20T14:33:29.413Z + 2025-11-24T03:25:25.091Z https://docs.axolotl.ai/docs/api/utils.callbacks.qat.html - 2025-11-20T14:33:30.502Z + 2025-11-24T03:25:26.181Z https://docs.axolotl.ai/docs/api/utils.samplers.multipack.html - 2025-11-20T14:33:30.471Z + 2025-11-24T03:25:26.150Z https://docs.axolotl.ai/docs/api/prompt_strategies.orpo.chat_template.html - 2025-11-20T14:33:29.603Z + 2025-11-24T03:25:25.283Z https://docs.axolotl.ai/docs/api/core.chat.format.shared.html - 2025-11-20T14:33:28.925Z + 2025-11-24T03:25:24.602Z https://docs.axolotl.ai/docs/api/utils.collators.core.html - 2025-11-20T14:33:30.388Z + 2025-11-24T03:25:26.067Z https://docs.axolotl.ai/docs/api/integrations.lm_eval.args.html - 2025-11-20T14:33:30.358Z + 2025-11-24T03:25:26.037Z https://docs.axolotl.ai/docs/api/core.trainers.grpo.trainer.html - 2025-11-20T14:33:29.262Z + 2025-11-24T03:25:24.939Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chat_template.html - 2025-11-20T14:33:29.525Z + 2025-11-24T03:25:25.205Z https://docs.axolotl.ai/docs/api/monkeypatch.btlm_attn_hijack_flash.html - 2025-11-20T14:33:29.812Z + 2025-11-24T03:25:25.498Z https://docs.axolotl.ai/docs/api/utils.schemas.enums.html - 2025-11-20T14:33:30.157Z + 2025-11-24T03:25:25.841Z https://docs.axolotl.ai/docs/api/utils.schemas.config.html - 2025-11-20T14:33:30.067Z + 2025-11-24T03:25:25.752Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.passthrough.html - 2025-11-20T14:33:29.556Z + 2025-11-24T03:25:25.237Z https://docs.axolotl.ai/docs/api/convert.html - 2025-11-20T14:33:28.793Z 
+ 2025-11-24T03:25:24.470Z https://docs.axolotl.ai/docs/api/utils.schemas.integrations.html - 2025-11-20T14:33:30.146Z + 2025-11-24T03:25:25.830Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.llama3.html - 2025-11-20T14:33:29.566Z + 2025-11-24T03:25:25.247Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.user_defined.html - 2025-11-20T14:33:29.554Z + 2025-11-24T03:25:25.235Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.scheduler.html - 2025-11-20T14:33:29.342Z + 2025-11-24T03:25:25.020Z https://docs.axolotl.ai/docs/api/core.chat.messages.html - 2025-11-20T14:33:28.920Z + 2025-11-24T03:25:24.596Z https://docs.axolotl.ai/docs/api/prompt_strategies.user_defined.html - 2025-11-20T14:33:29.455Z + 2025-11-24T03:25:25.134Z https://docs.axolotl.ai/docs/api/prompt_strategies.messages.chat.html - 2025-11-20T14:33:29.517Z + 2025-11-24T03:25:25.197Z https://docs.axolotl.ai/docs/api/monkeypatch.lora_kernels.html - 2025-11-20T14:33:29.801Z + 2025-11-24T03:25:25.486Z https://docs.axolotl.ai/docs/api/prompt_strategies.stepwise_supervised.html - 2025-11-20T14:33:29.490Z + 2025-11-24T03:25:25.170Z https://docs.axolotl.ai/docs/api/prompt_strategies.completion.html - 2025-11-20T14:33:29.478Z + 2025-11-24T03:25:25.158Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.rng_state_loader.html - 2025-11-20T14:33:29.334Z + 2025-11-24T03:25:25.011Z https://docs.axolotl.ai/docs/api/cli.train.html - 2025-11-20T14:33:28.992Z + 2025-11-24T03:25:24.670Z https://docs.axolotl.ai/docs/api/cli.merge_sharded_fsdp_weights.html - 2025-11-20T14:33:29.109Z + 2025-11-24T03:25:24.785Z https://docs.axolotl.ai/docs/api/loaders.adapter.html - 2025-11-20T14:33:29.309Z + 2025-11-24T03:25:24.986Z https://docs.axolotl.ai/docs/api/core.training_args.html - 2025-11-20T14:33:28.890Z + 2025-11-24T03:25:24.568Z https://docs.axolotl.ai/docs/api/utils.callbacks.lisa.html - 2025-11-20T14:33:30.485Z + 2025-11-24T03:25:26.164Z https://docs.axolotl.ai/docs/api/loaders.processor.html - 
2025-11-20T14:33:29.303Z + 2025-11-24T03:25:24.979Z https://docs.axolotl.ai/docs/api/core.builders.rl.html - 2025-11-20T14:33:28.875Z + 2025-11-24T03:25:24.552Z https://docs.axolotl.ai/docs/api/monkeypatch.utils.html - 2025-11-20T14:33:29.811Z + 2025-11-24T03:25:25.496Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.user_defined.html - 2025-11-20T14:33:29.578Z + 2025-11-24T03:25:25.258Z https://docs.axolotl.ai/docs/api/integrations.cut_cross_entropy.args.html - 2025-11-20T14:33:30.340Z + 2025-11-24T03:25:26.018Z https://docs.axolotl.ai/docs/api/kernels.swiglu.html - 2025-11-20T14:33:29.737Z + 2025-11-24T03:25:25.420Z https://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_disk.html - 2025-11-20T14:33:29.884Z + 2025-11-24T03:25:25.570Z https://docs.axolotl.ai/docs/api/cli.cloud.base.html - 2025-11-20T14:33:29.137Z + 2025-11-24T03:25:24.815Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_instruct.html - 2025-11-20T14:33:29.431Z + 2025-11-24T03:25:25.109Z https://docs.axolotl.ai/docs/api/prompt_strategies.pygmalion.html - 2025-11-20T14:33:29.511Z + 2025-11-24T03:25:25.191Z https://docs.axolotl.ai/docs/api/utils.schemas.peft.html - 2025-11-20T14:33:30.117Z + 2025-11-24T03:25:25.801Z https://docs.axolotl.ai/docs/api/integrations.liger.args.html - 2025-11-20T14:33:30.354Z + 2025-11-24T03:25:26.033Z https://docs.axolotl.ai/docs/api/loaders.patch_manager.html - 2025-11-20T14:33:29.322Z + 2025-11-24T03:25:24.998Z https://docs.axolotl.ai/docs/api/monkeypatch.transformers_fa_utils.html - 2025-11-20T14:33:29.833Z + 2025-11-24T03:25:25.519Z https://docs.axolotl.ai/docs/api/utils.data.sft.html - 2025-11-20T14:33:30.026Z + 2025-11-24T03:25:25.711Z https://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_cpu.html - 2025-11-20T14:33:29.852Z + 2025-11-24T03:25:25.538Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_expand_mask.html - 2025-11-20T14:33:29.766Z + 2025-11-24T03:25:25.452Z 
https://docs.axolotl.ai/docs/api/loaders.tokenizer.html - 2025-11-20T14:33:29.301Z + 2025-11-24T03:25:24.977Z https://docs.axolotl.ai/docs/api/utils.tokenization.html - 2025-11-20T14:33:29.892Z + 2025-11-24T03:25:25.578Z https://docs.axolotl.ai/docs/api/datasets.html - 2025-11-20T14:33:28.777Z + 2025-11-24T03:25:24.453Z https://docs.axolotl.ai/docs/api/cli.utils.sweeps.html - 2025-11-20T14:33:29.181Z + 2025-11-24T03:25:24.860Z https://docs.axolotl.ai/docs/api/utils.schedulers.html - 2025-11-20T14:33:29.975Z + 2025-11-24T03:25:25.661Z https://docs.axolotl.ai/docs/optimizers.html - 2025-11-20T14:26:32.748Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/torchao.html - 2025-11-20T14:26:32.748Z + 2025-11-24T03:21:38.576Z https://docs.axolotl.ai/docs/dataset_preprocessing.html - 2025-11-20T14:26:32.744Z + 2025-11-24T03:21:38.572Z https://docs.axolotl.ai/docs/faq.html - 2025-11-20T14:26:32.744Z + 2025-11-24T03:21:38.572Z https://docs.axolotl.ai/docs/ray-integration.html - 2025-11-20T14:26:32.748Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/multimodal.html - 2025-11-20T14:26:32.748Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/config-reference.html - 2025-11-20T14:33:45.537Z + 2025-11-24T03:25:41.038Z https://docs.axolotl.ai/docs/dataset-formats/index.html - 2025-11-20T14:26:32.744Z + 2025-11-24T03:21:38.571Z https://docs.axolotl.ai/docs/dataset-formats/inst_tune.html - 2025-11-20T14:26:32.744Z + 2025-11-24T03:21:38.571Z https://docs.axolotl.ai/docs/dataset-formats/conversation.html - 2025-11-20T14:26:32.744Z + 2025-11-24T03:21:38.571Z https://docs.axolotl.ai/docs/debugging.html - 2025-11-20T14:26:32.744Z + 2025-11-24T03:21:38.572Z https://docs.axolotl.ai/docs/amd_hpc.html - 2025-11-20T14:26:32.743Z + 2025-11-24T03:21:38.571Z https://docs.axolotl.ai/docs/lora_optims.html - 2025-11-20T14:26:32.747Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/streaming.html - 2025-11-20T14:26:32.748Z + 2025-11-24T03:21:38.576Z 
https://docs.axolotl.ai/docs/multipack.html - 2025-11-20T14:26:32.748Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/qat.html - 2025-11-20T14:26:32.748Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/lr_groups.html - 2025-11-20T14:26:32.747Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/getting-started.html - 2025-11-20T14:26:32.744Z + 2025-11-24T03:21:38.572Z https://docs.axolotl.ai/docs/nccl.html - 2025-11-20T14:26:32.748Z + 2025-11-24T03:21:38.575Z https://docs.axolotl.ai/docs/telemetry.html - 2025-11-20T14:26:32.748Z + 2025-11-24T03:21:38.576Z https://docs.axolotl.ai/docs/unsloth.html - 2025-11-20T14:26:32.748Z + 2025-11-24T03:21:38.576Z https://docs.axolotl.ai/src/axolotl/integrations/LICENSE.html - 2025-11-20T14:26:32.771Z + 2025-11-24T03:21:38.598Z