diff --git a/.nojekyll b/.nojekyll
index e1b1bdeec..783134bf9 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-4f6c81ae
\ No newline at end of file
+1b4fe270
\ No newline at end of file
diff --git a/docs/multi-gpu.html b/docs/multi-gpu.html
index 0f7eb5bf7..9a16a0e20 100644
--- a/docs/multi-gpu.html
+++ b/docs/multi-gpu.html
@@ -677,6 +677,10 @@ also follow the config field mapping below to update field names.
fsdp_use_orig_params
REMOVED
+
+
fsdp_activation_checkpointing
+
activation_checkpointing
+
For more details, please see the migration guide in the torchtitan repo. In Axolotl,
@@ -1321,98 +1325,99 @@ single sequence causes OOM errors during model training.
fsdp_cpu_ram_efficient_loading | cpu_ram_efficient_loading
fsdp_state_dict_type | state_dict_type
fsdp_use_orig_params | **REMOVED**
-
-For more details, please see the migration guide in the [torchtitan repo](https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md). In Axolotl,
-if you were using the following FSDP1 config:
-
-```{.yaml}
-fsdp_version: 1
-fsdp_config:
-  fsdp_offload_params: false
-  fsdp_cpu_ram_efficient_loading: true
-  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
-  fsdp_transformer_layer_cls_to_wrap: Qwen3DecoderLayer
-  fsdp_state_dict_type: FULL_STATE_DICT
-  fsdp_sharding_strategy: FULL_SHARD
-```
-
-You can migrate to the following FSDP2 config:
-
-```{.yaml}
-fsdp_version: 2
-fsdp_config:
-  offload_params: false
-  cpu_ram_efficient_loading: true
-  auto_wrap_policy: TRANSFORMER_BASED_WRAP
-  transformer_layer_cls_to_wrap: Qwen3DecoderLayer
-  state_dict_type: FULL_STATE_DICT
-  reshard_after_forward: true
-```
-
-### FSDP1 (deprecated) {#sec-fsdp-config}
-
-::: {.callout-note}
-
-Using `fsdp` to configure FSDP is deprecated and will be removed in an upcoming release of Axolotl. Please use `fsdp_config` as above instead.
-
-:::
-
-```{.yaml}
-fsdp:
-  - full_shard
-  - auto_wrap
-fsdp_config:
-  fsdp_offload_params: true
-  fsdp_state_dict_type: FULL_STATE_DICT
-  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
-```
-
+fsdp_activation_checkpointing | activation_checkpointing
+
+For more details, please see the migration guide in the [torchtitan repo](https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md). In Axolotl,
+if you were using the following FSDP1 config:
+
+```{.yaml}
+fsdp_version: 1
+fsdp_config:
+  fsdp_offload_params: false
+  fsdp_cpu_ram_efficient_loading: true
+  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
+  fsdp_transformer_layer_cls_to_wrap: Qwen3DecoderLayer
+  fsdp_state_dict_type: FULL_STATE_DICT
+  fsdp_sharding_strategy: FULL_SHARD
+```
+
+You can migrate to the following FSDP2 config:
+
+```{.yaml}
+fsdp_version: 2
+fsdp_config:
+  offload_params: false
+  cpu_ram_efficient_loading: true
+  auto_wrap_policy: TRANSFORMER_BASED_WRAP
+  transformer_layer_cls_to_wrap: Qwen3DecoderLayer
+  state_dict_type: FULL_STATE_DICT
+  reshard_after_forward: true
+```
+
+### FSDP1 (deprecated) {#sec-fsdp-config}
+
+::: {.callout-note}
+
+Using `fsdp` to configure FSDP is deprecated and will be removed in an upcoming release of Axolotl. Please use `fsdp_config` as above instead.
+
+:::
+
+```{.yaml}
+fsdp:
+  - full_shard
+  - auto_wrap
+fsdp_config:
+  fsdp_offload_params: true
+  fsdp_state_dict_type: FULL_STATE_DICT
+  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
+```
-## Sequence parallelism {#sec-sequence-parallelism}
-
-We support sequence parallelism (SP) via the
-[ring-flash-attention](https://github.com/zhuzilin/ring-flash-attention) project. This
-allows one to split up sequences across GPUs, which is useful in the event that a
-single sequence causes OOM errors during model training.
-
-See our [dedicated guide](sequence_parallelism.qmd) for more information.
-
-### FSDP + QLoRA {#sec-fsdp-qlora}
-
-For combining FSDP with QLoRA, see our [dedicated guide](fsdp_qlora.qmd).
-
-## Performance Optimization {#sec-performance}
-
-### Liger Kernel Integration {#sec-liger}
-
-Please see [docs](custom_integrations.qmd#liger) for more info.
-
-## Troubleshooting {#sec-troubleshooting}
-
-### NCCL Issues {#sec-nccl}
-
-For NCCL-related problems, see our [NCCL troubleshooting guide](nccl.qmd).
-
-### Common Problems {#sec-common-problems}
-
-::: {.panel-tabset}
-
-## Memory Issues
-
-- Reduce `micro_batch_size`
-- Reduce `eval_batch_size`
-- Adjust `gradient_accumulation_steps`
-- Consider using a higher ZeRO stage
-
-## Training Instability
-
-- Start with DeepSpeed ZeRO-2
-- Monitor loss values
-- Check learning rates
-
-:::
-
-For more detailed troubleshooting, see our [debugging guide](debugging.qmd).
+
+## Sequence parallelism {#sec-sequence-parallelism}
+
+We support sequence parallelism (SP) via the
+[ring-flash-attention](https://github.com/zhuzilin/ring-flash-attention) project. This
+allows one to split up sequences across GPUs, which is useful in the event that a
+single sequence causes OOM errors during model training.
+
+See our [dedicated guide](sequence_parallelism.qmd) for more information.
+
+### FSDP + QLoRA {#sec-fsdp-qlora}
+
+For combining FSDP with QLoRA, see our [dedicated guide](fsdp_qlora.qmd).
+
+## Performance Optimization {#sec-performance}
+
+### Liger Kernel Integration {#sec-liger}
+
+Please see [docs](custom_integrations.qmd#liger) for more info.
+
+## Troubleshooting {#sec-troubleshooting}
+
+### NCCL Issues {#sec-nccl}
+
+For NCCL-related problems, see our [NCCL troubleshooting guide](nccl.qmd).
+
+### Common Problems {#sec-common-problems}
+
+::: {.panel-tabset}
+
+## Memory Issues
+
+- Reduce `micro_batch_size`
+- Reduce `eval_batch_size`
+- Adjust `gradient_accumulation_steps`
+- Consider using a higher ZeRO stage
+
+## Training Instability
+
+- Start with DeepSpeed ZeRO-2
+- Monitor loss values
+- Check learning rates
+
+:::
+
+For more detailed troubleshooting, see our [debugging guide](debugging.qmd).
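The FSDP1 to FSDP2 config mapping in this diff is mechanical enough to script. The following is a minimal sketch of a hypothetical helper, `migrate_fsdp1_to_fsdp2` (not part of Axolotl), that works on an already-parsed config dict. It encodes the renamed and removed keys from the mapping table. Two points are assumptions rather than documented rules. First, keys not listed in the table simply drop their `fsdp_` prefix, which is inferred from the Qwen3 example. Second, `SHARD_GRAD_OP` maps to `reshard_after_forward: false`, which follows PyTorch FSDP2 semantics. Any other sharding strategy raises an error instead of being guessed.

```python
from typing import Any

# Hypothetical helper, not an Axolotl API. Key lists mirror the mapping table.
RENAMED = {
    "fsdp_cpu_ram_efficient_loading": "cpu_ram_efficient_loading",
    "fsdp_state_dict_type": "state_dict_type",
    "fsdp_activation_checkpointing": "activation_checkpointing",
}
REMOVED = {
    "fsdp_backward_prefetch_policy",
    "fsdp_backward_prefetch",
    "fsdp_forward_prefetch",
    "fsdp_sync_module_states",
    "fsdp_use_orig_params",
}
# Assumption: FULL_SHARD reshards after forward; SHARD_GRAD_OP keeps params gathered.
SHARDING_TO_RESHARD = {"FULL_SHARD": True, "SHARD_GRAD_OP": False}


def migrate_fsdp1_to_fsdp2(cfg: dict[str, Any]) -> dict[str, Any]:
    """Return a new config dict using fsdp_version 2 and FSDP2-style fsdp_config keys."""
    if cfg.get("fsdp_version") != 1:
        raise ValueError("expected a config with fsdp_version: 1")
    new_fsdp: dict[str, Any] = {}
    for key, value in cfg.get("fsdp_config", {}).items():
        if key in REMOVED:
            continue  # no FSDP2 equivalent, so the key is dropped
        if key == "fsdp_sharding_strategy":
            if value not in SHARDING_TO_RESHARD:
                raise ValueError(f"no known FSDP2 mapping for {value!r}")
            new_fsdp["reshard_after_forward"] = SHARDING_TO_RESHARD[value]
        elif key in RENAMED:
            new_fsdp[RENAMED[key]] = value
        else:
            # Assumption (from the example): other keys just lose the "fsdp_" prefix.
            new_fsdp[key.removeprefix("fsdp_")] = value
    # Copy every unrelated top-level field unchanged.
    migrated = {k: v for k, v in cfg.items() if k not in ("fsdp_version", "fsdp_config")}
    migrated["fsdp_version"] = 2
    migrated["fsdp_config"] = new_fsdp
    return migrated
```

When given the FSDP1 Qwen3 config from this diff (parsed into a dict), the helper should return the FSDP2 config shown next to it. Always review the result before training: the sketch does not cover the deprecated `fsdp:` list form.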
diff --git a/search.json b/search.json
index 002f8999c..529717fca 100644
--- a/search.json
+++ b/search.json
@@ -3757,7 +3757,7 @@
"href": "docs/multi-gpu.html#sec-fsdp",
"title": "Multi-GPU",
"section": "3 Fully Sharded Data Parallel (FSDP)",
- "text": "3 Fully Sharded Data Parallel (FSDP)\n\n\n\n\n\n\nNote\n\n\n\nFSDP2 is recommended for new users. FSDP1 is deprecated and will be removed in an upcoming release of Axolotl.\n\n\n\n3.1 Migrating from FSDP1 to FSDP2\nTo migrate your config from FSDP1 to FSDP2, you must use the fsdp_version top-level config field to specify the FSDP version, and\nalso follow the config field mapping below to update field names.\n\n3.1.1 Config mapping\n\n\n\nFSDP1\nFSDP2\n\n\n\n\nfsdp_sharding_strategy\nreshard_after_forward\n\n\nfsdp_backward_prefetch_policy\nREMOVED\n\n\nfsdp_backward_prefetch\nREMOVED\n\n\nfsdp_forward_prefetch\nREMOVED\n\n\nfsdp_sync_module_states\nREMOVED\n\n\nfsdp_cpu_ram_efficient_loading\ncpu_ram_efficient_loading\n\n\nfsdp_state_dict_type\nstate_dict_type\n\n\nfsdp_use_orig_params\nREMOVED\n\n\n\nFor more details, please see the migration guide in the torchtitan repo. In Axolotl,\nif you were using the following FSDP1 config:\nfsdp_version: 1\nfsdp_config:\n fsdp_offload_params: false\n fsdp_cpu_ram_efficient_loading: true\n fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP\n fsdp_transformer_layer_cls_to_wrap: Qwen3DecoderLayer\n fsdp_state_dict_type: FULL_STATE_DICT\n fsdp_sharding_strategy: FULL_SHARD\nYou can migrate to the following FSDP2 config:\nfsdp_version: 2\nfsdp_config:\n offload_params: false\n cpu_ram_efficient_loading: true\n auto_wrap_policy: TRANSFORMER_BASED_WRAP\n transformer_layer_cls_to_wrap: Qwen3DecoderLayer\n state_dict_type: FULL_STATE_DICT\n reshard_after_forward: true\n\n\n\n3.2 FSDP1 (deprecated)\n\n\n\n\n\n\nNote\n\n\n\nUsing fsdp to configure FSDP is deprecated and will be removed in an upcoming release of Axolotl. Please use fsdp_config as above instead.\n\n\nfsdp:\n - full_shard\n - auto_wrap\nfsdp_config:\n fsdp_offload_params: true\n fsdp_state_dict_type: FULL_STATE_DICT\n fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer",
+ "text": "3 Fully Sharded Data Parallel (FSDP)\n\n\n\n\n\n\nNote\n\n\n\nFSDP2 is recommended for new users. FSDP1 is deprecated and will be removed in an upcoming release of Axolotl.\n\n\n\n3.1 Migrating from FSDP1 to FSDP2\nTo migrate your config from FSDP1 to FSDP2, you must use the fsdp_version top-level config field to specify the FSDP version, and\nalso follow the config field mapping below to update field names.\n\n3.1.1 Config mapping\n\n\n\nFSDP1\nFSDP2\n\n\n\n\nfsdp_sharding_strategy\nreshard_after_forward\n\n\nfsdp_backward_prefetch_policy\nREMOVED\n\n\nfsdp_backward_prefetch\nREMOVED\n\n\nfsdp_forward_prefetch\nREMOVED\n\n\nfsdp_sync_module_states\nREMOVED\n\n\nfsdp_cpu_ram_efficient_loading\ncpu_ram_efficient_loading\n\n\nfsdp_state_dict_type\nstate_dict_type\n\n\nfsdp_use_orig_params\nREMOVED\n\n\nfsdp_activation_checkpointing\nactivation_checkpointing\n\n\n\nFor more details, please see the migration guide in the torchtitan repo. In Axolotl,\nif you were using the following FSDP1 config:\nfsdp_version: 1\nfsdp_config:\n fsdp_offload_params: false\n fsdp_cpu_ram_efficient_loading: true\n fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP\n fsdp_transformer_layer_cls_to_wrap: Qwen3DecoderLayer\n fsdp_state_dict_type: FULL_STATE_DICT\n fsdp_sharding_strategy: FULL_SHARD\nYou can migrate to the following FSDP2 config:\nfsdp_version: 2\nfsdp_config:\n offload_params: false\n cpu_ram_efficient_loading: true\n auto_wrap_policy: TRANSFORMER_BASED_WRAP\n transformer_layer_cls_to_wrap: Qwen3DecoderLayer\n state_dict_type: FULL_STATE_DICT\n reshard_after_forward: true\n\n\n\n3.2 FSDP1 (deprecated)\n\n\n\n\n\n\nNote\n\n\n\nUsing fsdp to configure FSDP is deprecated and will be removed in an upcoming release of Axolotl. Please use fsdp_config as above instead.\n\n\nfsdp:\n - full_shard\n - auto_wrap\nfsdp_config:\n fsdp_offload_params: true\n fsdp_state_dict_type: FULL_STATE_DICT\n fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer",
"crumbs": [
"Deployments",
"Multi-GPU"
diff --git a/sitemap.xml b/sitemap.xml
index d416dd5fa..f5521aff7 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,798 +2,798 @@
https://docs.axolotl.ai/src/axolotl/integrations/cut_cross_entropy/ACKNOWLEDGEMENTS.html
- 2025-10-09T18:18:54.738Z
+ 2025-10-10T12:57:10.844Zhttps://docs.axolotl.ai/docs/mac.html
- 2025-10-09T18:18:54.715Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/cli.html
- 2025-10-09T18:18:54.711Z
+ 2025-10-10T12:57:10.818Zhttps://docs.axolotl.ai/docs/nccl.html
- 2025-10-09T18:18:54.715Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/getting-started.html
- 2025-10-09T18:18:54.712Z
+ 2025-10-10T12:57:10.819Zhttps://docs.axolotl.ai/docs/lr_groups.html
- 2025-10-09T18:18:54.715Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/qat.html
- 2025-10-09T18:18:54.716Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/multipack.html
- 2025-10-09T18:18:54.715Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/streaming.html
- 2025-10-09T18:18:54.716Z
+ 2025-10-10T12:57:10.823Zhttps://docs.axolotl.ai/docs/lora_optims.html
- 2025-10-09T18:18:54.715Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/amd_hpc.html
- 2025-10-09T18:18:54.711Z
+ 2025-10-10T12:57:10.818Zhttps://docs.axolotl.ai/docs/debugging.html
- 2025-10-09T18:18:54.712Z
+ 2025-10-10T12:57:10.819Zhttps://docs.axolotl.ai/docs/dataset-formats/conversation.html
- 2025-10-09T18:18:54.711Z
+ 2025-10-10T12:57:10.818Zhttps://docs.axolotl.ai/docs/dataset-formats/inst_tune.html
- 2025-10-09T18:18:54.712Z
+ 2025-10-10T12:57:10.818Zhttps://docs.axolotl.ai/docs/dataset-formats/index.html
- 2025-10-09T18:18:54.712Z
+ 2025-10-10T12:57:10.818Zhttps://docs.axolotl.ai/docs/config-reference.html
- 2025-10-09T18:22:17.616Z
+ 2025-10-10T13:00:50.711Zhttps://docs.axolotl.ai/docs/multimodal.html
- 2025-10-09T18:18:54.715Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/ray-integration.html
- 2025-10-09T18:18:54.716Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/faq.html
- 2025-10-09T18:18:54.712Z
+ 2025-10-10T12:57:10.819Zhttps://docs.axolotl.ai/docs/dataset_preprocessing.html
- 2025-10-09T18:18:54.712Z
+ 2025-10-10T12:57:10.819Zhttps://docs.axolotl.ai/docs/torchao.html
- 2025-10-09T18:18:54.716Z
+ 2025-10-10T12:57:10.823Zhttps://docs.axolotl.ai/docs/optimizers.html
- 2025-10-09T18:18:54.716Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/api/utils.schedulers.html
- 2025-10-09T18:22:02.135Z
+ 2025-10-10T13:00:34.956Zhttps://docs.axolotl.ai/docs/api/cli.utils.sweeps.html
- 2025-10-09T18:22:01.340Z
+ 2025-10-10T13:00:34.170Zhttps://docs.axolotl.ai/docs/api/datasets.html
- 2025-10-09T18:22:00.935Z
+ 2025-10-10T13:00:33.767Zhttps://docs.axolotl.ai/docs/api/utils.tokenization.html
- 2025-10-09T18:22:02.052Z
+ 2025-10-10T13:00:34.873Zhttps://docs.axolotl.ai/docs/api/loaders.tokenizer.html
- 2025-10-09T18:22:01.458Z
+ 2025-10-10T13:00:34.287Zhttps://docs.axolotl.ai/docs/api/monkeypatch.llama_expand_mask.html
- 2025-10-09T18:22:01.925Z
+ 2025-10-10T13:00:34.747Zhttps://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_cpu.html
- 2025-10-09T18:22:02.011Z
+ 2025-10-10T13:00:34.833Zhttps://docs.axolotl.ai/docs/api/utils.data.sft.html
- 2025-10-09T18:22:02.186Z
+ 2025-10-10T13:00:35.007Zhttps://docs.axolotl.ai/docs/api/monkeypatch.transformers_fa_utils.html
- 2025-10-09T18:22:01.992Z
+ 2025-10-10T13:00:34.814Zhttps://docs.axolotl.ai/docs/api/loaders.patch_manager.html
- 2025-10-09T18:22:01.479Z
+ 2025-10-10T13:00:34.308Zhttps://docs.axolotl.ai/docs/api/integrations.liger.args.html
- 2025-10-09T18:22:02.506Z
+ 2025-10-10T13:00:35.323Zhttps://docs.axolotl.ai/docs/api/utils.schemas.peft.html
- 2025-10-09T18:22:02.275Z
+ 2025-10-10T13:00:35.095Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.pygmalion.html
- 2025-10-09T18:22:01.673Z
+ 2025-10-10T13:00:34.499Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_instruct.html
- 2025-10-09T18:22:01.591Z
+ 2025-10-10T13:00:34.418Zhttps://docs.axolotl.ai/docs/api/cli.cloud.base.html
- 2025-10-09T18:22:01.296Z
+ 2025-10-10T13:00:34.127Zhttps://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_disk.html
- 2025-10-09T18:22:02.043Z
+ 2025-10-10T13:00:34.865Zhttps://docs.axolotl.ai/docs/api/kernels.swiglu.html
- 2025-10-09T18:22:01.895Z
+ 2025-10-10T13:00:34.718Zhttps://docs.axolotl.ai/docs/api/integrations.cut_cross_entropy.args.html
- 2025-10-09T18:22:02.491Z
+ 2025-10-10T13:00:35.308Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.kto.user_defined.html
- 2025-10-09T18:22:01.733Z
+ 2025-10-10T13:00:34.559Zhttps://docs.axolotl.ai/docs/api/monkeypatch.utils.html
- 2025-10-09T18:22:01.969Z
+ 2025-10-10T13:00:34.792Zhttps://docs.axolotl.ai/docs/api/core.builders.rl.html
- 2025-10-09T18:22:01.033Z
+ 2025-10-10T13:00:33.865Zhttps://docs.axolotl.ai/docs/api/loaders.processor.html
- 2025-10-09T18:22:01.460Z
+ 2025-10-10T13:00:34.289Zhttps://docs.axolotl.ai/docs/api/utils.callbacks.lisa.html
- 2025-10-09T18:22:02.637Z
+ 2025-10-10T13:00:35.452Zhttps://docs.axolotl.ai/docs/api/core.training_args.html
- 2025-10-09T18:22:01.049Z
+ 2025-10-10T13:00:33.881Zhttps://docs.axolotl.ai/docs/api/loaders.adapter.html
- 2025-10-09T18:22:01.467Z
+ 2025-10-10T13:00:34.296Zhttps://docs.axolotl.ai/docs/api/cli.merge_sharded_fsdp_weights.html
- 2025-10-09T18:22:01.267Z
+ 2025-10-10T13:00:34.098Zhttps://docs.axolotl.ai/docs/api/cli.train.html
- 2025-10-09T18:22:01.151Z
+ 2025-10-10T13:00:33.982Zhttps://docs.axolotl.ai/docs/api/core.trainers.mixins.rng_state_loader.html
- 2025-10-09T18:22:01.492Z
+ 2025-10-10T13:00:34.321Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.completion.html
- 2025-10-09T18:22:01.638Z
+ 2025-10-10T13:00:34.465Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.stepwise_supervised.html
- 2025-10-09T18:22:01.651Z
+ 2025-10-10T13:00:34.478Zhttps://docs.axolotl.ai/docs/api/monkeypatch.lora_kernels.html
- 2025-10-09T18:22:01.960Z
+ 2025-10-10T13:00:34.782Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.messages.chat.html
- 2025-10-09T18:22:01.678Z
+ 2025-10-10T13:00:34.504Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.user_defined.html
- 2025-10-09T18:22:01.615Z
+ 2025-10-10T13:00:34.442Zhttps://docs.axolotl.ai/docs/api/core.chat.messages.html
- 2025-10-09T18:22:01.078Z
+ 2025-10-10T13:00:33.909Zhttps://docs.axolotl.ai/docs/api/core.trainers.mixins.scheduler.html
- 2025-10-09T18:22:01.500Z
+ 2025-10-10T13:00:34.329Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.dpo.user_defined.html
- 2025-10-09T18:22:01.709Z
+ 2025-10-10T13:00:34.535Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.kto.llama3.html
- 2025-10-09T18:22:01.721Z
+ 2025-10-10T13:00:34.547Zhttps://docs.axolotl.ai/docs/api/utils.schemas.integrations.html
- 2025-10-09T18:22:02.301Z
+ 2025-10-10T13:00:35.121Zhttps://docs.axolotl.ai/docs/api/convert.html
- 2025-10-09T18:22:00.952Z
+ 2025-10-10T13:00:33.784Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.dpo.passthrough.html
- 2025-10-09T18:22:01.711Z
+ 2025-10-10T13:00:34.537Zhttps://docs.axolotl.ai/docs/api/utils.schemas.config.html
- 2025-10-09T18:22:02.225Z
+ 2025-10-10T13:00:35.046Zhttps://docs.axolotl.ai/docs/api/utils.schemas.enums.html
- 2025-10-09T18:22:02.311Z
+ 2025-10-10T13:00:35.132Zhttps://docs.axolotl.ai/docs/api/monkeypatch.btlm_attn_hijack_flash.html
- 2025-10-09T18:22:01.971Z
+ 2025-10-10T13:00:34.793Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chat_template.html
- 2025-10-09T18:22:01.680Z
+ 2025-10-10T13:00:34.506Zhttps://docs.axolotl.ai/docs/api/core.trainers.grpo.trainer.html
- 2025-10-09T18:22:01.419Z
+ 2025-10-10T13:00:34.248Zhttps://docs.axolotl.ai/docs/api/integrations.lm_eval.args.html
- 2025-10-09T18:22:02.510Z
+ 2025-10-10T13:00:35.327Zhttps://docs.axolotl.ai/docs/api/utils.collators.core.html
- 2025-10-09T18:22:02.539Z
+ 2025-10-10T13:00:35.356Zhttps://docs.axolotl.ai/docs/api/core.chat.format.shared.html
- 2025-10-09T18:22:01.083Z
+ 2025-10-10T13:00:33.915Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.orpo.chat_template.html
- 2025-10-09T18:22:01.759Z
+ 2025-10-10T13:00:34.584Zhttps://docs.axolotl.ai/docs/api/utils.samplers.multipack.html
- 2025-10-09T18:22:02.622Z
+ 2025-10-10T13:00:35.438Zhttps://docs.axolotl.ai/docs/api/utils.callbacks.qat.html
- 2025-10-09T18:22:02.654Z
+ 2025-10-10T13:00:35.469Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.chat_template.html
- 2025-10-09T18:22:01.572Z
+ 2025-10-10T13:00:34.400Zhttps://docs.axolotl.ai/docs/api/utils.schemas.multimodal.html
- 2025-10-09T18:22:02.286Z
+ 2025-10-10T13:00:35.106Zhttps://docs.axolotl.ai/docs/api/utils.callbacks.comet_.html
- 2025-10-09T18:22:02.646Z
+ 2025-10-10T13:00:35.461Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.base.html
- 2025-10-09T18:22:01.531Z
+ 2025-10-10T13:00:34.360Zhttps://docs.axolotl.ai/docs/api/kernels.utils.html
- 2025-10-09T18:22:01.906Z
+ 2025-10-10T13:00:34.729Zhttps://docs.axolotl.ai/docs/api/cli.merge_lora.html
- 2025-10-09T18:22:01.253Z
+ 2025-10-10T13:00:34.084Zhttps://docs.axolotl.ai/docs/api/cli.utils.html
- 2025-10-09T18:22:01.306Z
+ 2025-10-10T13:00:34.136Zhttps://docs.axolotl.ai/docs/api/utils.ctx_managers.sequence_parallel.html
- 2025-10-09T18:22:01.529Z
+ 2025-10-10T13:00:34.358Zhttps://docs.axolotl.ai/docs/api/index.html
- 2025-10-09T18:22:00.837Z
+ 2025-10-10T13:00:33.670Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.dpo.llama3.html
- 2025-10-09T18:22:01.693Z
+ 2025-10-10T13:00:34.519Zhttps://docs.axolotl.ai/docs/api/monkeypatch.mixtral.html
- 2025-10-09T18:22:02.007Z
+ 2025-10-10T13:00:34.829Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.orcamini.html
- 2025-10-09T18:22:01.664Z
+ 2025-10-10T13:00:34.491Zhttps://docs.axolotl.ai/docs/api/core.trainers.grpo.sampler.html
- 2025-10-09T18:22:01.433Z
+ 2025-10-10T13:00:34.263Zhttps://docs.axolotl.ai/docs/api/utils.lora.html
- 2025-10-09T18:22:02.060Z
+ 2025-10-10T13:00:34.881Zhttps://docs.axolotl.ai/docs/api/core.trainers.mixins.optimizer.html
- 2025-10-09T18:22:01.488Z
+ 2025-10-10T13:00:34.317Zhttps://docs.axolotl.ai/docs/api/cli.config.html
- 2025-10-09T18:22:01.219Z
+ 2025-10-10T13:00:34.051Zhttps://docs.axolotl.ai/docs/api/monkeypatch.multipack.html
- 2025-10-09T18:22:01.919Z
+ 2025-10-10T13:00:34.741Zhttps://docs.axolotl.ai/docs/api/utils.collators.batching.html
- 2025-10-09T18:22:02.563Z
+ 2025-10-10T13:00:35.379Zhttps://docs.axolotl.ai/docs/api/utils.quantization.html
- 2025-10-09T18:22:02.210Z
+ 2025-10-10T13:00:35.030Zhttps://docs.axolotl.ai/docs/api/utils.dict.html
- 2025-10-09T18:22:02.167Z
+ 2025-10-10T13:00:34.988Zhttps://docs.axolotl.ai/docs/api/kernels.quantize.html
- 2025-10-09T18:22:01.904Z
+ 2025-10-10T13:00:34.727Zhttps://docs.axolotl.ai/docs/api/utils.schemas.training.html
- 2025-10-09T18:22:02.242Z
+ 2025-10-10T13:00:35.063Zhttps://docs.axolotl.ai/docs/api/train.html
- 2025-10-09T18:22:00.915Z
+ 2025-10-10T13:00:33.747Zhttps://docs.axolotl.ai/docs/api/core.datasets.transforms.chat_builder.html
- 2025-10-09T18:22:01.099Z
+ 2025-10-10T13:00:33.930Zhttps://docs.axolotl.ai/docs/inference.html
- 2025-10-09T18:18:54.715Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/FAQS.html
- 2025-10-09T18:18:54.710Z
+ 2025-10-10T12:57:10.816Zhttps://docs.axolotl.ai/examples/colab-notebooks/colab-axolotl-example.html
- 2025-10-09T18:18:54.720Z
+ 2025-10-10T12:57:10.827Zhttps://docs.axolotl.ai/index.html
- 2025-10-09T18:18:54.733Z
+ 2025-10-10T12:57:10.839Zhttps://docs.axolotl.ai/docs/custom_integrations.html
- 2025-10-09T18:18:54.711Z
+ 2025-10-10T12:57:10.818Zhttps://docs.axolotl.ai/docs/api/utils.schemas.utils.html
- 2025-10-09T18:22:02.318Z
+ 2025-10-10T13:00:35.138Zhttps://docs.axolotl.ai/docs/api/kernels.geglu.html
- 2025-10-09T18:22:01.883Z
+ 2025-10-10T13:00:34.706Zhttps://docs.axolotl.ai/docs/api/core.builders.causal.html
- 2025-10-09T18:22:01.028Z
+ 2025-10-10T13:00:33.860Zhttps://docs.axolotl.ai/docs/api/core.trainers.mamba.html
- 2025-10-09T18:22:01.397Z
+ 2025-10-10T13:00:34.227Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.bradley_terry.llama3.html
- 2025-10-09T18:22:01.764Z
+ 2025-10-10T13:00:34.589Zhttps://docs.axolotl.ai/docs/api/core.datasets.chat.html
- 2025-10-09T18:22:01.089Z
+ 2025-10-10T13:00:33.921Zhttps://docs.axolotl.ai/docs/api/utils.collators.mm_chat.html
- 2025-10-09T18:22:02.573Z
+ 2025-10-10T13:00:35.389Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.llama2_chat.html
- 2025-10-09T18:22:01.631Z
+ 2025-10-10T13:00:34.458Zhttps://docs.axolotl.ai/docs/api/common.const.html
- 2025-10-09T18:22:02.518Z
+ 2025-10-10T13:00:35.334Zhttps://docs.axolotl.ai/docs/api/cli.quantize.html
- 2025-10-09T18:22:01.283Z
+ 2025-10-10T13:00:34.114Zhttps://docs.axolotl.ai/docs/api/utils.trainer.html
- 2025-10-09T18:22:02.102Z
+ 2025-10-10T13:00:34.923Zhttps://docs.axolotl.ai/docs/api/cli.delinearize_llama4.html
- 2025-10-09T18:22:01.225Z
+ 2025-10-10T13:00:34.057Zhttps://docs.axolotl.ai/docs/api/evaluate.html
- 2025-10-09T18:22:00.928Z
+ 2025-10-10T13:00:33.760Zhttps://docs.axolotl.ai/docs/api/monkeypatch.mistral_attn_hijack_flash.html
- 2025-10-09T18:22:01.917Z
+ 2025-10-10T13:00:34.739Zhttps://docs.axolotl.ai/docs/api/loaders.model.html
- 2025-10-09T18:22:01.448Z
+ 2025-10-10T13:00:34.277Zhttps://docs.axolotl.ai/docs/api/utils.distributed.html
- 2025-10-09T18:22:02.160Z
+ 2025-10-10T13:00:34.981Zhttps://docs.axolotl.ai/docs/api/utils.model_shard_quant.html
- 2025-10-09T18:22:02.067Z
+ 2025-10-10T13:00:34.888Zhttps://docs.axolotl.ai/docs/api/kernels.lora.html
- 2025-10-09T18:22:01.870Z
+ 2025-10-10T13:00:34.693Zhttps://docs.axolotl.ai/docs/api/cli.main.html
- 2025-10-09T18:22:01.141Z
+ 2025-10-10T13:00:33.972Zhttps://docs.axolotl.ai/docs/api/integrations.spectrum.args.html
- 2025-10-09T18:22:02.514Z
+ 2025-10-10T13:00:35.331Zhttps://docs.axolotl.ai/docs/api/utils.optimizers.adopt.html
- 2025-10-09T18:22:02.176Z
+ 2025-10-10T13:00:34.997Zhttps://docs.axolotl.ai/docs/api/cli.cloud.modal_.html
- 2025-10-09T18:22:01.304Z
+ 2025-10-10T13:00:34.134Zhttps://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_flash.html
- 2025-10-09T18:22:01.913Z
+ 2025-10-10T13:00:34.736Zhttps://docs.axolotl.ai/docs/api/core.builders.base.html
- 2025-10-09T18:22:01.022Z
+ 2025-10-10T13:00:33.854Zhttps://docs.axolotl.ai/docs/api/utils.schemas.trl.html
- 2025-10-09T18:22:02.279Z
+ 2025-10-10T13:00:35.099Zhttps://docs.axolotl.ai/docs/api/cli.utils.args.html
- 2025-10-09T18:22:01.320Z
+ 2025-10-10T13:00:34.150Zhttps://docs.axolotl.ai/docs/api/core.trainers.base.html
- 2025-10-09T18:22:01.372Z
+ 2025-10-10T13:00:34.202Zhttps://docs.axolotl.ai/docs/api/monkeypatch.llama_patch_multipack.html
- 2025-10-09T18:22:01.973Z
+ 2025-10-10T13:00:34.795Zhttps://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_xformers.html
- 2025-10-09T18:22:01.915Z
+ 2025-10-10T13:00:34.738Zhttps://docs.axolotl.ai/docs/api/utils.schemas.model.html
- 2025-10-09T18:22:02.234Z
+ 2025-10-10T13:00:35.054Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.kto.chatml.html
- 2025-10-09T18:22:01.732Z
+ 2025-10-10T13:00:34.557Zhttps://docs.axolotl.ai/docs/api/utils.callbacks.mlflow_.html
- 2025-10-09T18:22:02.641Z
+ 2025-10-10T13:00:35.457Zhttps://docs.axolotl.ai/docs/api/common.datasets.html
- 2025-10-09T18:22:02.536Z
+ 2025-10-10T13:00:35.353Zhttps://docs.axolotl.ai/docs/api/utils.schemas.datasets.html
- 2025-10-09T18:22:02.264Z
+ 2025-10-10T13:00:35.085Zhttps://docs.axolotl.ai/docs/api/cli.utils.fetch.html
- 2025-10-09T18:22:01.326Z
+ 2025-10-10T13:00:34.157Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chatml.html
- 2025-10-09T18:22:01.706Z
+ 2025-10-10T13:00:34.532Zhttps://docs.axolotl.ai/docs/api/monkeypatch.relora.html
- 2025-10-09T18:22:01.923Z
+ 2025-10-10T13:00:34.746Zhttps://docs.axolotl.ai/docs/api/cli.evaluate.html
- 2025-10-09T18:22:01.162Z
+ 2025-10-10T13:00:33.992Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.dpo.zephyr.html
- 2025-10-09T18:22:01.707Z
+ 2025-10-10T13:00:34.534Zhttps://docs.axolotl.ai/docs/api/core.trainers.utils.html
- 2025-10-09T18:22:01.435Z
+ 2025-10-10T13:00:34.265Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_w_system.html
- 2025-10-09T18:22:01.605Z
+ 2025-10-10T13:00:34.433Zhttps://docs.axolotl.ai/docs/api/utils.chat_templates.html
- 2025-10-09T18:22:02.054Z
+ 2025-10-10T13:00:34.875Zhttps://docs.axolotl.ai/docs/api/utils.data.streaming.html
- 2025-10-09T18:22:02.178Z
+ 2025-10-10T13:00:34.999Zhttps://docs.axolotl.ai/docs/api/utils.bench.html
- 2025-10-09T18:22:02.071Z
+ 2025-10-10T13:00:34.893Zhttps://docs.axolotl.ai/docs/api/common.architectures.html
- 2025-10-09T18:22:02.516Z
+ 2025-10-10T13:00:35.333Zhttps://docs.axolotl.ai/docs/api/cli.checks.html
- 2025-10-09T18:22:01.198Z
+ 2025-10-10T13:00:34.029Zhttps://docs.axolotl.ai/docs/api/core.trainers.dpo.trainer.html
- 2025-10-09T18:22:01.405Z
+ 2025-10-10T13:00:34.235Zhttps://docs.axolotl.ai/docs/api/integrations.base.html
- 2025-10-09T18:22:02.487Z
+ 2025-10-10T13:00:35.304Zhttps://docs.axolotl.ai/docs/api/cli.utils.train.html
- 2025-10-09T18:22:01.355Z
+ 2025-10-10T13:00:34.185Zhttps://docs.axolotl.ai/docs/api/utils.collators.mamba.html
- 2025-10-09T18:22:02.567Z
+ 2025-10-10T13:00:35.383Zhttps://docs.axolotl.ai/docs/api/cli.art.html
- 2025-10-09T18:22:01.190Z
+ 2025-10-10T13:00:34.021Zhttps://docs.axolotl.ai/docs/api/monkeypatch.trainer_fsdp_optim.html
- 2025-10-09T18:22:01.984Z
+ 2025-10-10T13:00:34.806Zhttps://docs.axolotl.ai/docs/api/logging_config.html
- 2025-10-09T18:22:01.015Z
+ 2025-10-10T13:00:33.846Zhttps://docs.axolotl.ai/docs/api/utils.freeze.html
- 2025-10-09T18:22:02.081Z
+ 2025-10-10T13:00:34.902Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.metharme.html
- 2025-10-09T18:22:01.660Z
+ 2025-10-10T13:00:34.486Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_chat.html
- 2025-10-09T18:22:01.589Z
+ 2025-10-10T13:00:34.416Zhttps://docs.axolotl.ai/docs/api/monkeypatch.stablelm_attn_hijack_flash.html
- 2025-10-09T18:22:01.980Z
+ 2025-10-10T13:00:34.802Zhttps://docs.axolotl.ai/docs/api/models.mamba.modeling_mamba.html
- 2025-10-09T18:22:02.537Z
+ 2025-10-10T13:00:35.354Zhttps://docs.axolotl.ai/docs/api/core.trainers.trl.html
- 2025-10-09T18:22:01.390Z
+ 2025-10-10T13:00:34.220Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.input_output.html
- 2025-10-09T18:22:01.646Z
+ 2025-10-10T13:00:34.472Zhttps://docs.axolotl.ai/docs/api/loaders.constants.html
- 2025-10-09T18:22:01.481Z
+ 2025-10-10T13:00:34.310Zhttps://docs.axolotl.ai/docs/api/monkeypatch.data.batch_dataset_fetcher.html
- 2025-10-09T18:22:02.005Z
+ 2025-10-10T13:00:34.827Zhttps://docs.axolotl.ai/docs/api/cli.vllm_serve.html
- 2025-10-09T18:22:01.292Z
+ 2025-10-10T13:00:34.123Zhttps://docs.axolotl.ai/docs/api/prompt_tokenizers.html
- 2025-10-09T18:22:01.003Z
+ 2025-10-10T13:00:33.835Zhttps://docs.axolotl.ai/docs/api/cli.args.html
- 2025-10-09T18:22:01.186Z
+ 2025-10-10T13:00:34.017Zhttps://docs.axolotl.ai/docs/api/cli.inference.html
- 2025-10-09T18:22:01.243Z
+ 2025-10-10T13:00:34.074Zhttps://docs.axolotl.ai/docs/api/cli.utils.load.html
- 2025-10-09T18:22:01.333Z
+ 2025-10-10T13:00:34.163Zhttps://docs.axolotl.ai/docs/api/cli.preprocess.html
- 2025-10-09T18:22:01.277Z
+ 2025-10-10T13:00:34.108Zhttps://docs.axolotl.ai/docs/api/utils.callbacks.profiler.html
- 2025-10-09T18:22:02.635Z
+ 2025-10-10T13:00:35.451Zhttps://docs.axolotl.ai/docs/api/utils.callbacks.perplexity.html
- 2025-10-09T18:22:02.630Z
+ 2025-10-10T13:00:35.446Zhttps://docs.axolotl.ai/docs/api/core.chat.format.chatml.html
- 2025-10-09T18:22:01.079Z
+ 2025-10-10T13:00:33.911Zhttps://docs.axolotl.ai/docs/api/integrations.grokfast.optimizer.html
- 2025-10-09T18:22:02.492Z
+ 2025-10-10T13:00:35.309Zhttps://docs.axolotl.ai/docs/api/integrations.kd.trainer.html
- 2025-10-09T18:22:02.502Z
+ 2025-10-10T13:00:35.319Zhttps://docs.axolotl.ai/docs/api/monkeypatch.unsloth_.html
- 2025-10-09T18:22:01.994Z
+ 2025-10-10T13:00:34.816Zhttps://docs.axolotl.ai/docs/api/core.chat.format.llama3x.html
- 2025-10-09T18:22:01.081Z
+ 2025-10-10T13:00:33.913Zhttps://docs.axolotl.ai/docs/reward_modelling.html
- 2025-10-09T18:18:54.716Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/quantize.html
- 2025-10-09T18:18:54.716Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/fsdp_qlora.html
- 2025-10-09T18:18:54.712Z
+ 2025-10-10T12:57:10.819Zhttps://docs.axolotl.ai/docs/nd_parallelism.html
- 2025-10-09T18:18:54.715Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/batch_vs_grad.html
- 2025-10-09T18:18:54.711Z
+ 2025-10-10T12:57:10.818Zhttps://docs.axolotl.ai/docs/multi-node.html
- 2025-10-09T18:18:54.715Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/rlhf.html
- 2025-10-09T18:18:54.716Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/dataset-formats/stepwise_supervised.html
- 2025-10-09T18:18:54.712Z
+ 2025-10-10T12:57:10.818Zhttps://docs.axolotl.ai/docs/dataset-formats/pretraining.html
- 2025-10-09T18:18:54.712Z
+ 2025-10-10T12:57:10.818Zhttps://docs.axolotl.ai/docs/dataset-formats/tokenized.html
- 2025-10-09T18:18:54.712Z
+ 2025-10-10T12:57:10.819Zhttps://docs.axolotl.ai/docs/dataset-formats/template_free.html
- 2025-10-09T18:18:54.712Z
+ 2025-10-10T12:57:10.818Zhttps://docs.axolotl.ai/docs/multi-gpu.html
- 2025-10-09T18:18:54.715Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/input_output.html
- 2025-10-09T18:18:54.715Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/docker.html
- 2025-10-09T18:18:54.712Z
+ 2025-10-10T12:57:10.819Zhttps://docs.axolotl.ai/docs/gradient_checkpointing.html
- 2025-10-09T18:18:54.712Z
+ 2025-10-10T12:57:10.819Zhttps://docs.axolotl.ai/docs/optimizations.html
- 2025-10-09T18:18:54.716Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/sequence_parallelism.html
- 2025-10-09T18:18:54.716Z
+ 2025-10-10T12:57:10.823Zhttps://docs.axolotl.ai/docs/dataset_loading.html
- 2025-10-09T18:18:54.712Z
+ 2025-10-10T12:57:10.819Zhttps://docs.axolotl.ai/docs/installation.html
- 2025-10-09T18:18:54.715Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/mixed_precision.html
- 2025-10-09T18:18:54.715Z
+ 2025-10-10T12:57:10.822Zhttps://docs.axolotl.ai/docs/unsloth.html
- 2025-10-09T18:18:54.716Z
+ 2025-10-10T12:57:10.823Zhttps://docs.axolotl.ai/src/axolotl/integrations/LICENSE.html
- 2025-10-09T18:18:54.737Z
+ 2025-10-10T12:57:10.844Z