diff --git a/.nojekyll b/.nojekyll index e1b1bdeec..783134bf9 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -4f6c81ae \ No newline at end of file +1b4fe270 \ No newline at end of file diff --git a/docs/multi-gpu.html b/docs/multi-gpu.html index 0f7eb5bf7..9a16a0e20 100644 --- a/docs/multi-gpu.html +++ b/docs/multi-gpu.html @@ -677,6 +677,10 @@ also follow the config field mapping below to update field names.

fsdp_use_orig_params REMOVED + +fsdp_activation_checkpointing +activation_checkpointing +

For more details, please see the migration guide in the torchtitan repo. In Axolotl, @@ -1321,98 +1325,99 @@ single sequence causes OOM errors during model training.

fsdp_cpu_ram_efficient_loading | cpu_ram_efficient_loading fsdp_state_dict_type | state_dict_type fsdp_use_orig_params | **REMOVED** - -For more details, please see the migration guide in the [torchtitan repo](https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md). In Axolotl, -if you were using the following FSDP1 config: - -```{.yaml} -fsdp_version: 1 -fsdp_config: - fsdp_offload_params: false - fsdp_cpu_ram_efficient_loading: true - fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP - fsdp_transformer_layer_cls_to_wrap: Qwen3DecoderLayer - fsdp_state_dict_type: FULL_STATE_DICT - fsdp_sharding_strategy: FULL_SHARD -``` - -You can migrate to the following FSDP2 config: - -```{.yaml} -fsdp_version: 2 -fsdp_config: - offload_params: false - cpu_ram_efficient_loading: true - auto_wrap_policy: TRANSFORMER_BASED_WRAP - transformer_layer_cls_to_wrap: Qwen3DecoderLayer - state_dict_type: FULL_STATE_DICT - reshard_after_forward: true -``` - -### FSDP1 (deprecated) {#sec-fsdp-config} - -::: {.callout-note} - -Using `fsdp` to configure FSDP is deprecated and will be removed in an upcoming release of Axolotl. Please use `fsdp_config` as above instead. - -::: - -```{.yaml} -fsdp: - - full_shard - - auto_wrap -fsdp_config: - fsdp_offload_params: true - fsdp_state_dict_type: FULL_STATE_DICT - fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer -``` - +fsdp_activation_checkpointing | activation_checkpointing + +For more details, please see the migration guide in the [torchtitan repo](https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md). 
In Axolotl, +if you were using the following FSDP1 config: + +```{.yaml} +fsdp_version: 1 +fsdp_config: + fsdp_offload_params: false + fsdp_cpu_ram_efficient_loading: true + fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP + fsdp_transformer_layer_cls_to_wrap: Qwen3DecoderLayer + fsdp_state_dict_type: FULL_STATE_DICT + fsdp_sharding_strategy: FULL_SHARD +``` + +You can migrate to the following FSDP2 config: + +```{.yaml} +fsdp_version: 2 +fsdp_config: + offload_params: false + cpu_ram_efficient_loading: true + auto_wrap_policy: TRANSFORMER_BASED_WRAP + transformer_layer_cls_to_wrap: Qwen3DecoderLayer + state_dict_type: FULL_STATE_DICT + reshard_after_forward: true +``` + +### FSDP1 (deprecated) {#sec-fsdp-config} + +::: {.callout-note} + +Using `fsdp` to configure FSDP is deprecated and will be removed in an upcoming release of Axolotl. Please use `fsdp_config` as above instead. + +::: + +```{.yaml} +fsdp: + - full_shard + - auto_wrap +fsdp_config: + fsdp_offload_params: true + fsdp_state_dict_type: FULL_STATE_DICT + fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer +``` -## Sequence parallelism {#sec-sequence-parallelism} - -We support sequence parallelism (SP) via the -[ring-flash-attention](https://github.com/zhuzilin/ring-flash-attention) project. This -allows one to split up sequences across GPUs, which is useful in the event that a -single sequence causes OOM errors during model training. - -See our [dedicated guide](sequence_parallelism.qmd) for more information. - -### FSDP + QLoRA {#sec-fsdp-qlora} - -For combining FSDP with QLoRA, see our [dedicated guide](fsdp_qlora.qmd). - -## Performance Optimization {#sec-performance} - -### Liger Kernel Integration {#sec-liger} - -Please see [docs](custom_integrations.qmd#liger) for more info. - -## Troubleshooting {#sec-troubleshooting} - -### NCCL Issues {#sec-nccl} - -For NCCL-related problems, see our [NCCL troubleshooting guide](nccl.qmd). 
- -### Common Problems {#sec-common-problems} - -::: {.panel-tabset} - -## Memory Issues - -- Reduce `micro_batch_size` -- Reduce `eval_batch_size` -- Adjust `gradient_accumulation_steps` -- Consider using a higher ZeRO stage - -## Training Instability - -- Start with DeepSpeed ZeRO-2 -- Monitor loss values -- Check learning rates - -::: - -For more detailed troubleshooting, see our [debugging guide](debugging.qmd). + +## Sequence parallelism {#sec-sequence-parallelism} + +We support sequence parallelism (SP) via the +[ring-flash-attention](https://github.com/zhuzilin/ring-flash-attention) project. This +allows one to split up sequences across GPUs, which is useful in the event that a +single sequence causes OOM errors during model training. + +See our [dedicated guide](sequence_parallelism.qmd) for more information. + +### FSDP + QLoRA {#sec-fsdp-qlora} + +For combining FSDP with QLoRA, see our [dedicated guide](fsdp_qlora.qmd). + +## Performance Optimization {#sec-performance} + +### Liger Kernel Integration {#sec-liger} + +Please see [docs](custom_integrations.qmd#liger) for more info. + +## Troubleshooting {#sec-troubleshooting} + +### NCCL Issues {#sec-nccl} + +For NCCL-related problems, see our [NCCL troubleshooting guide](nccl.qmd). + +### Common Problems {#sec-common-problems} + +::: {.panel-tabset} + +## Memory Issues + +- Reduce `micro_batch_size` +- Reduce `eval_batch_size` +- Adjust `gradient_accumulation_steps` +- Consider using a higher ZeRO stage + +## Training Instability + +- Start with DeepSpeed ZeRO-2 +- Monitor loss values +- Check learning rates + +::: + +For more detailed troubleshooting, see our [debugging guide](debugging.qmd). 
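Note on the `docs/multi-gpu.html` change above: the new mapping row renames `fsdp_activation_checkpointing` to `activation_checkpointing`. A minimal sketch of what that might look like in a config, assuming the value carries over unchanged. The `true` value and the neighbouring `state_dict_type` key are illustrative, borrowed from the migration example in this diff rather than from a tested config.

```{.yaml}
# FSDP1 (before): keys carry the fsdp_ prefix
fsdp_version: 1
fsdp_config:
  fsdp_activation_checkpointing: true   # illustrative value
  fsdp_state_dict_type: FULL_STATE_DICT
```

```{.yaml}
# FSDP2 (after): same settings with the prefix dropped, per the mapping table
fsdp_version: 2
fsdp_config:
  activation_checkpointing: true        # illustrative value
  state_dict_type: FULL_STATE_DICT
```

Only the key names change in this sketch. Fields the table marks as REMOVED (for example `fsdp_use_orig_params`) would be deleted rather than renamed.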
diff --git a/search.json b/search.json index 002f8999c..529717fca 100644 --- a/search.json +++ b/search.json @@ -3757,7 +3757,7 @@ "href": "docs/multi-gpu.html#sec-fsdp", "title": "Multi-GPU", "section": "3 Fully Sharded Data Parallel (FSDP)", - "text": "3 Fully Sharded Data Parallel (FSDP)\n\n\n\n\n\n\nNote\n\n\n\nFSDP2 is recommended for new users. FSDP1 is deprecated and will be removed in an upcoming release of Axolotl.\n\n\n\n3.1 Migrating from FSDP1 to FSDP2\nTo migrate your config from FSDP1 to FSDP2, you must use the fsdp_version top-level config field to specify the FSDP version, and\nalso follow the config field mapping below to update field names.\n\n3.1.1 Config mapping\n\n\n\nFSDP1\nFSDP2\n\n\n\n\nfsdp_sharding_strategy\nreshard_after_forward\n\n\nfsdp_backward_prefetch_policy\nREMOVED\n\n\nfsdp_backward_prefetch\nREMOVED\n\n\nfsdp_forward_prefetch\nREMOVED\n\n\nfsdp_sync_module_states\nREMOVED\n\n\nfsdp_cpu_ram_efficient_loading\ncpu_ram_efficient_loading\n\n\nfsdp_state_dict_type\nstate_dict_type\n\n\nfsdp_use_orig_params\nREMOVED\n\n\n\nFor more details, please see the migration guide in the torchtitan repo. In Axolotl,\nif you were using the following FSDP1 config:\nfsdp_version: 1\nfsdp_config:\n fsdp_offload_params: false\n fsdp_cpu_ram_efficient_loading: true\n fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP\n fsdp_transformer_layer_cls_to_wrap: Qwen3DecoderLayer\n fsdp_state_dict_type: FULL_STATE_DICT\n fsdp_sharding_strategy: FULL_SHARD\nYou can migrate to the following FSDP2 config:\nfsdp_version: 2\nfsdp_config:\n offload_params: false\n cpu_ram_efficient_loading: true\n auto_wrap_policy: TRANSFORMER_BASED_WRAP\n transformer_layer_cls_to_wrap: Qwen3DecoderLayer\n state_dict_type: FULL_STATE_DICT\n reshard_after_forward: true\n\n\n\n3.2 FSDP1 (deprecated)\n\n\n\n\n\n\nNote\n\n\n\nUsing fsdp to configure FSDP is deprecated and will be removed in an upcoming release of Axolotl. 
Please use fsdp_config as above instead.\n\n\nfsdp:\n - full_shard\n - auto_wrap\nfsdp_config:\n fsdp_offload_params: true\n fsdp_state_dict_type: FULL_STATE_DICT\n fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer", + "text": "3 Fully Sharded Data Parallel (FSDP)\n\n\n\n\n\n\nNote\n\n\n\nFSDP2 is recommended for new users. FSDP1 is deprecated and will be removed in an upcoming release of Axolotl.\n\n\n\n3.1 Migrating from FSDP1 to FSDP2\nTo migrate your config from FSDP1 to FSDP2, you must use the fsdp_version top-level config field to specify the FSDP version, and\nalso follow the config field mapping below to update field names.\n\n3.1.1 Config mapping\n\n\n\nFSDP1\nFSDP2\n\n\n\n\nfsdp_sharding_strategy\nreshard_after_forward\n\n\nfsdp_backward_prefetch_policy\nREMOVED\n\n\nfsdp_backward_prefetch\nREMOVED\n\n\nfsdp_forward_prefetch\nREMOVED\n\n\nfsdp_sync_module_states\nREMOVED\n\n\nfsdp_cpu_ram_efficient_loading\ncpu_ram_efficient_loading\n\n\nfsdp_state_dict_type\nstate_dict_type\n\n\nfsdp_use_orig_params\nREMOVED\n\n\nfsdp_activation_checkpointing\nactivation_checkpointing\n\n\n\nFor more details, please see the migration guide in the torchtitan repo. In Axolotl,\nif you were using the following FSDP1 config:\nfsdp_version: 1\nfsdp_config:\n fsdp_offload_params: false\n fsdp_cpu_ram_efficient_loading: true\n fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP\n fsdp_transformer_layer_cls_to_wrap: Qwen3DecoderLayer\n fsdp_state_dict_type: FULL_STATE_DICT\n fsdp_sharding_strategy: FULL_SHARD\nYou can migrate to the following FSDP2 config:\nfsdp_version: 2\nfsdp_config:\n offload_params: false\n cpu_ram_efficient_loading: true\n auto_wrap_policy: TRANSFORMER_BASED_WRAP\n transformer_layer_cls_to_wrap: Qwen3DecoderLayer\n state_dict_type: FULL_STATE_DICT\n reshard_after_forward: true\n\n\n\n3.2 FSDP1 (deprecated)\n\n\n\n\n\n\nNote\n\n\n\nUsing fsdp to configure FSDP is deprecated and will be removed in an upcoming release of Axolotl. 
Please use fsdp_config as above instead.\n\n\nfsdp:\n - full_shard\n - auto_wrap\nfsdp_config:\n fsdp_offload_params: true\n fsdp_state_dict_type: FULL_STATE_DICT\n fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer", "crumbs": [ "Deployments", "Multi-GPU" diff --git a/sitemap.xml b/sitemap.xml index d416dd5fa..f5521aff7 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,798 +2,798 @@ https://docs.axolotl.ai/src/axolotl/integrations/cut_cross_entropy/ACKNOWLEDGEMENTS.html - 2025-10-09T18:18:54.738Z + 2025-10-10T12:57:10.844Z https://docs.axolotl.ai/docs/mac.html - 2025-10-09T18:18:54.715Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/cli.html - 2025-10-09T18:18:54.711Z + 2025-10-10T12:57:10.818Z https://docs.axolotl.ai/docs/nccl.html - 2025-10-09T18:18:54.715Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/getting-started.html - 2025-10-09T18:18:54.712Z + 2025-10-10T12:57:10.819Z https://docs.axolotl.ai/docs/lr_groups.html - 2025-10-09T18:18:54.715Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/qat.html - 2025-10-09T18:18:54.716Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/multipack.html - 2025-10-09T18:18:54.715Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/streaming.html - 2025-10-09T18:18:54.716Z + 2025-10-10T12:57:10.823Z https://docs.axolotl.ai/docs/lora_optims.html - 2025-10-09T18:18:54.715Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/amd_hpc.html - 2025-10-09T18:18:54.711Z + 2025-10-10T12:57:10.818Z https://docs.axolotl.ai/docs/debugging.html - 2025-10-09T18:18:54.712Z + 2025-10-10T12:57:10.819Z https://docs.axolotl.ai/docs/dataset-formats/conversation.html - 2025-10-09T18:18:54.711Z + 2025-10-10T12:57:10.818Z https://docs.axolotl.ai/docs/dataset-formats/inst_tune.html - 2025-10-09T18:18:54.712Z + 2025-10-10T12:57:10.818Z https://docs.axolotl.ai/docs/dataset-formats/index.html - 2025-10-09T18:18:54.712Z + 2025-10-10T12:57:10.818Z https://docs.axolotl.ai/docs/config-reference.html - 
2025-10-09T18:22:17.616Z + 2025-10-10T13:00:50.711Z https://docs.axolotl.ai/docs/multimodal.html - 2025-10-09T18:18:54.715Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/ray-integration.html - 2025-10-09T18:18:54.716Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/faq.html - 2025-10-09T18:18:54.712Z + 2025-10-10T12:57:10.819Z https://docs.axolotl.ai/docs/dataset_preprocessing.html - 2025-10-09T18:18:54.712Z + 2025-10-10T12:57:10.819Z https://docs.axolotl.ai/docs/torchao.html - 2025-10-09T18:18:54.716Z + 2025-10-10T12:57:10.823Z https://docs.axolotl.ai/docs/optimizers.html - 2025-10-09T18:18:54.716Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/api/utils.schedulers.html - 2025-10-09T18:22:02.135Z + 2025-10-10T13:00:34.956Z https://docs.axolotl.ai/docs/api/cli.utils.sweeps.html - 2025-10-09T18:22:01.340Z + 2025-10-10T13:00:34.170Z https://docs.axolotl.ai/docs/api/datasets.html - 2025-10-09T18:22:00.935Z + 2025-10-10T13:00:33.767Z https://docs.axolotl.ai/docs/api/utils.tokenization.html - 2025-10-09T18:22:02.052Z + 2025-10-10T13:00:34.873Z https://docs.axolotl.ai/docs/api/loaders.tokenizer.html - 2025-10-09T18:22:01.458Z + 2025-10-10T13:00:34.287Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_expand_mask.html - 2025-10-09T18:22:01.925Z + 2025-10-10T13:00:34.747Z https://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_cpu.html - 2025-10-09T18:22:02.011Z + 2025-10-10T13:00:34.833Z https://docs.axolotl.ai/docs/api/utils.data.sft.html - 2025-10-09T18:22:02.186Z + 2025-10-10T13:00:35.007Z https://docs.axolotl.ai/docs/api/monkeypatch.transformers_fa_utils.html - 2025-10-09T18:22:01.992Z + 2025-10-10T13:00:34.814Z https://docs.axolotl.ai/docs/api/loaders.patch_manager.html - 2025-10-09T18:22:01.479Z + 2025-10-10T13:00:34.308Z https://docs.axolotl.ai/docs/api/integrations.liger.args.html - 2025-10-09T18:22:02.506Z + 2025-10-10T13:00:35.323Z https://docs.axolotl.ai/docs/api/utils.schemas.peft.html - 2025-10-09T18:22:02.275Z 
+ 2025-10-10T13:00:35.095Z https://docs.axolotl.ai/docs/api/prompt_strategies.pygmalion.html - 2025-10-09T18:22:01.673Z + 2025-10-10T13:00:34.499Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_instruct.html - 2025-10-09T18:22:01.591Z + 2025-10-10T13:00:34.418Z https://docs.axolotl.ai/docs/api/cli.cloud.base.html - 2025-10-09T18:22:01.296Z + 2025-10-10T13:00:34.127Z https://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_disk.html - 2025-10-09T18:22:02.043Z + 2025-10-10T13:00:34.865Z https://docs.axolotl.ai/docs/api/kernels.swiglu.html - 2025-10-09T18:22:01.895Z + 2025-10-10T13:00:34.718Z https://docs.axolotl.ai/docs/api/integrations.cut_cross_entropy.args.html - 2025-10-09T18:22:02.491Z + 2025-10-10T13:00:35.308Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.user_defined.html - 2025-10-09T18:22:01.733Z + 2025-10-10T13:00:34.559Z https://docs.axolotl.ai/docs/api/monkeypatch.utils.html - 2025-10-09T18:22:01.969Z + 2025-10-10T13:00:34.792Z https://docs.axolotl.ai/docs/api/core.builders.rl.html - 2025-10-09T18:22:01.033Z + 2025-10-10T13:00:33.865Z https://docs.axolotl.ai/docs/api/loaders.processor.html - 2025-10-09T18:22:01.460Z + 2025-10-10T13:00:34.289Z https://docs.axolotl.ai/docs/api/utils.callbacks.lisa.html - 2025-10-09T18:22:02.637Z + 2025-10-10T13:00:35.452Z https://docs.axolotl.ai/docs/api/core.training_args.html - 2025-10-09T18:22:01.049Z + 2025-10-10T13:00:33.881Z https://docs.axolotl.ai/docs/api/loaders.adapter.html - 2025-10-09T18:22:01.467Z + 2025-10-10T13:00:34.296Z https://docs.axolotl.ai/docs/api/cli.merge_sharded_fsdp_weights.html - 2025-10-09T18:22:01.267Z + 2025-10-10T13:00:34.098Z https://docs.axolotl.ai/docs/api/cli.train.html - 2025-10-09T18:22:01.151Z + 2025-10-10T13:00:33.982Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.rng_state_loader.html - 2025-10-09T18:22:01.492Z + 2025-10-10T13:00:34.321Z https://docs.axolotl.ai/docs/api/prompt_strategies.completion.html - 2025-10-09T18:22:01.638Z + 
2025-10-10T13:00:34.465Z https://docs.axolotl.ai/docs/api/prompt_strategies.stepwise_supervised.html - 2025-10-09T18:22:01.651Z + 2025-10-10T13:00:34.478Z https://docs.axolotl.ai/docs/api/monkeypatch.lora_kernels.html - 2025-10-09T18:22:01.960Z + 2025-10-10T13:00:34.782Z https://docs.axolotl.ai/docs/api/prompt_strategies.messages.chat.html - 2025-10-09T18:22:01.678Z + 2025-10-10T13:00:34.504Z https://docs.axolotl.ai/docs/api/prompt_strategies.user_defined.html - 2025-10-09T18:22:01.615Z + 2025-10-10T13:00:34.442Z https://docs.axolotl.ai/docs/api/core.chat.messages.html - 2025-10-09T18:22:01.078Z + 2025-10-10T13:00:33.909Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.scheduler.html - 2025-10-09T18:22:01.500Z + 2025-10-10T13:00:34.329Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.user_defined.html - 2025-10-09T18:22:01.709Z + 2025-10-10T13:00:34.535Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.llama3.html - 2025-10-09T18:22:01.721Z + 2025-10-10T13:00:34.547Z https://docs.axolotl.ai/docs/api/utils.schemas.integrations.html - 2025-10-09T18:22:02.301Z + 2025-10-10T13:00:35.121Z https://docs.axolotl.ai/docs/api/convert.html - 2025-10-09T18:22:00.952Z + 2025-10-10T13:00:33.784Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.passthrough.html - 2025-10-09T18:22:01.711Z + 2025-10-10T13:00:34.537Z https://docs.axolotl.ai/docs/api/utils.schemas.config.html - 2025-10-09T18:22:02.225Z + 2025-10-10T13:00:35.046Z https://docs.axolotl.ai/docs/api/utils.schemas.enums.html - 2025-10-09T18:22:02.311Z + 2025-10-10T13:00:35.132Z https://docs.axolotl.ai/docs/api/monkeypatch.btlm_attn_hijack_flash.html - 2025-10-09T18:22:01.971Z + 2025-10-10T13:00:34.793Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chat_template.html - 2025-10-09T18:22:01.680Z + 2025-10-10T13:00:34.506Z https://docs.axolotl.ai/docs/api/core.trainers.grpo.trainer.html - 2025-10-09T18:22:01.419Z + 2025-10-10T13:00:34.248Z 
https://docs.axolotl.ai/docs/api/integrations.lm_eval.args.html - 2025-10-09T18:22:02.510Z + 2025-10-10T13:00:35.327Z https://docs.axolotl.ai/docs/api/utils.collators.core.html - 2025-10-09T18:22:02.539Z + 2025-10-10T13:00:35.356Z https://docs.axolotl.ai/docs/api/core.chat.format.shared.html - 2025-10-09T18:22:01.083Z + 2025-10-10T13:00:33.915Z https://docs.axolotl.ai/docs/api/prompt_strategies.orpo.chat_template.html - 2025-10-09T18:22:01.759Z + 2025-10-10T13:00:34.584Z https://docs.axolotl.ai/docs/api/utils.samplers.multipack.html - 2025-10-09T18:22:02.622Z + 2025-10-10T13:00:35.438Z https://docs.axolotl.ai/docs/api/utils.callbacks.qat.html - 2025-10-09T18:22:02.654Z + 2025-10-10T13:00:35.469Z https://docs.axolotl.ai/docs/api/prompt_strategies.chat_template.html - 2025-10-09T18:22:01.572Z + 2025-10-10T13:00:34.400Z https://docs.axolotl.ai/docs/api/utils.schemas.multimodal.html - 2025-10-09T18:22:02.286Z + 2025-10-10T13:00:35.106Z https://docs.axolotl.ai/docs/api/utils.callbacks.comet_.html - 2025-10-09T18:22:02.646Z + 2025-10-10T13:00:35.461Z https://docs.axolotl.ai/docs/api/prompt_strategies.base.html - 2025-10-09T18:22:01.531Z + 2025-10-10T13:00:34.360Z https://docs.axolotl.ai/docs/api/kernels.utils.html - 2025-10-09T18:22:01.906Z + 2025-10-10T13:00:34.729Z https://docs.axolotl.ai/docs/api/cli.merge_lora.html - 2025-10-09T18:22:01.253Z + 2025-10-10T13:00:34.084Z https://docs.axolotl.ai/docs/api/cli.utils.html - 2025-10-09T18:22:01.306Z + 2025-10-10T13:00:34.136Z https://docs.axolotl.ai/docs/api/utils.ctx_managers.sequence_parallel.html - 2025-10-09T18:22:01.529Z + 2025-10-10T13:00:34.358Z https://docs.axolotl.ai/docs/api/index.html - 2025-10-09T18:22:00.837Z + 2025-10-10T13:00:33.670Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.llama3.html - 2025-10-09T18:22:01.693Z + 2025-10-10T13:00:34.519Z https://docs.axolotl.ai/docs/api/monkeypatch.mixtral.html - 2025-10-09T18:22:02.007Z + 2025-10-10T13:00:34.829Z 
https://docs.axolotl.ai/docs/api/prompt_strategies.orcamini.html - 2025-10-09T18:22:01.664Z + 2025-10-10T13:00:34.491Z https://docs.axolotl.ai/docs/api/core.trainers.grpo.sampler.html - 2025-10-09T18:22:01.433Z + 2025-10-10T13:00:34.263Z https://docs.axolotl.ai/docs/api/utils.lora.html - 2025-10-09T18:22:02.060Z + 2025-10-10T13:00:34.881Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.optimizer.html - 2025-10-09T18:22:01.488Z + 2025-10-10T13:00:34.317Z https://docs.axolotl.ai/docs/api/cli.config.html - 2025-10-09T18:22:01.219Z + 2025-10-10T13:00:34.051Z https://docs.axolotl.ai/docs/api/monkeypatch.multipack.html - 2025-10-09T18:22:01.919Z + 2025-10-10T13:00:34.741Z https://docs.axolotl.ai/docs/api/utils.collators.batching.html - 2025-10-09T18:22:02.563Z + 2025-10-10T13:00:35.379Z https://docs.axolotl.ai/docs/api/utils.quantization.html - 2025-10-09T18:22:02.210Z + 2025-10-10T13:00:35.030Z https://docs.axolotl.ai/docs/api/utils.dict.html - 2025-10-09T18:22:02.167Z + 2025-10-10T13:00:34.988Z https://docs.axolotl.ai/docs/api/kernels.quantize.html - 2025-10-09T18:22:01.904Z + 2025-10-10T13:00:34.727Z https://docs.axolotl.ai/docs/api/utils.schemas.training.html - 2025-10-09T18:22:02.242Z + 2025-10-10T13:00:35.063Z https://docs.axolotl.ai/docs/api/train.html - 2025-10-09T18:22:00.915Z + 2025-10-10T13:00:33.747Z https://docs.axolotl.ai/docs/api/core.datasets.transforms.chat_builder.html - 2025-10-09T18:22:01.099Z + 2025-10-10T13:00:33.930Z https://docs.axolotl.ai/docs/inference.html - 2025-10-09T18:18:54.715Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/FAQS.html - 2025-10-09T18:18:54.710Z + 2025-10-10T12:57:10.816Z https://docs.axolotl.ai/examples/colab-notebooks/colab-axolotl-example.html - 2025-10-09T18:18:54.720Z + 2025-10-10T12:57:10.827Z https://docs.axolotl.ai/index.html - 2025-10-09T18:18:54.733Z + 2025-10-10T12:57:10.839Z https://docs.axolotl.ai/docs/custom_integrations.html - 2025-10-09T18:18:54.711Z + 2025-10-10T12:57:10.818Z 
https://docs.axolotl.ai/docs/api/utils.schemas.utils.html - 2025-10-09T18:22:02.318Z + 2025-10-10T13:00:35.138Z https://docs.axolotl.ai/docs/api/kernels.geglu.html - 2025-10-09T18:22:01.883Z + 2025-10-10T13:00:34.706Z https://docs.axolotl.ai/docs/api/core.builders.causal.html - 2025-10-09T18:22:01.028Z + 2025-10-10T13:00:33.860Z https://docs.axolotl.ai/docs/api/core.trainers.mamba.html - 2025-10-09T18:22:01.397Z + 2025-10-10T13:00:34.227Z https://docs.axolotl.ai/docs/api/prompt_strategies.bradley_terry.llama3.html - 2025-10-09T18:22:01.764Z + 2025-10-10T13:00:34.589Z https://docs.axolotl.ai/docs/api/core.datasets.chat.html - 2025-10-09T18:22:01.089Z + 2025-10-10T13:00:33.921Z https://docs.axolotl.ai/docs/api/utils.collators.mm_chat.html - 2025-10-09T18:22:02.573Z + 2025-10-10T13:00:35.389Z https://docs.axolotl.ai/docs/api/prompt_strategies.llama2_chat.html - 2025-10-09T18:22:01.631Z + 2025-10-10T13:00:34.458Z https://docs.axolotl.ai/docs/api/common.const.html - 2025-10-09T18:22:02.518Z + 2025-10-10T13:00:35.334Z https://docs.axolotl.ai/docs/api/cli.quantize.html - 2025-10-09T18:22:01.283Z + 2025-10-10T13:00:34.114Z https://docs.axolotl.ai/docs/api/utils.trainer.html - 2025-10-09T18:22:02.102Z + 2025-10-10T13:00:34.923Z https://docs.axolotl.ai/docs/api/cli.delinearize_llama4.html - 2025-10-09T18:22:01.225Z + 2025-10-10T13:00:34.057Z https://docs.axolotl.ai/docs/api/evaluate.html - 2025-10-09T18:22:00.928Z + 2025-10-10T13:00:33.760Z https://docs.axolotl.ai/docs/api/monkeypatch.mistral_attn_hijack_flash.html - 2025-10-09T18:22:01.917Z + 2025-10-10T13:00:34.739Z https://docs.axolotl.ai/docs/api/loaders.model.html - 2025-10-09T18:22:01.448Z + 2025-10-10T13:00:34.277Z https://docs.axolotl.ai/docs/api/utils.distributed.html - 2025-10-09T18:22:02.160Z + 2025-10-10T13:00:34.981Z https://docs.axolotl.ai/docs/api/utils.model_shard_quant.html - 2025-10-09T18:22:02.067Z + 2025-10-10T13:00:34.888Z https://docs.axolotl.ai/docs/api/kernels.lora.html - 2025-10-09T18:22:01.870Z + 
2025-10-10T13:00:34.693Z https://docs.axolotl.ai/docs/api/cli.main.html - 2025-10-09T18:22:01.141Z + 2025-10-10T13:00:33.972Z https://docs.axolotl.ai/docs/api/integrations.spectrum.args.html - 2025-10-09T18:22:02.514Z + 2025-10-10T13:00:35.331Z https://docs.axolotl.ai/docs/api/utils.optimizers.adopt.html - 2025-10-09T18:22:02.176Z + 2025-10-10T13:00:34.997Z https://docs.axolotl.ai/docs/api/cli.cloud.modal_.html - 2025-10-09T18:22:01.304Z + 2025-10-10T13:00:34.134Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_flash.html - 2025-10-09T18:22:01.913Z + 2025-10-10T13:00:34.736Z https://docs.axolotl.ai/docs/api/core.builders.base.html - 2025-10-09T18:22:01.022Z + 2025-10-10T13:00:33.854Z https://docs.axolotl.ai/docs/api/utils.schemas.trl.html - 2025-10-09T18:22:02.279Z + 2025-10-10T13:00:35.099Z https://docs.axolotl.ai/docs/api/cli.utils.args.html - 2025-10-09T18:22:01.320Z + 2025-10-10T13:00:34.150Z https://docs.axolotl.ai/docs/api/core.trainers.base.html - 2025-10-09T18:22:01.372Z + 2025-10-10T13:00:34.202Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_patch_multipack.html - 2025-10-09T18:22:01.973Z + 2025-10-10T13:00:34.795Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_xformers.html - 2025-10-09T18:22:01.915Z + 2025-10-10T13:00:34.738Z https://docs.axolotl.ai/docs/api/utils.schemas.model.html - 2025-10-09T18:22:02.234Z + 2025-10-10T13:00:35.054Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.chatml.html - 2025-10-09T18:22:01.732Z + 2025-10-10T13:00:34.557Z https://docs.axolotl.ai/docs/api/utils.callbacks.mlflow_.html - 2025-10-09T18:22:02.641Z + 2025-10-10T13:00:35.457Z https://docs.axolotl.ai/docs/api/common.datasets.html - 2025-10-09T18:22:02.536Z + 2025-10-10T13:00:35.353Z https://docs.axolotl.ai/docs/api/utils.schemas.datasets.html - 2025-10-09T18:22:02.264Z + 2025-10-10T13:00:35.085Z https://docs.axolotl.ai/docs/api/cli.utils.fetch.html - 2025-10-09T18:22:01.326Z + 2025-10-10T13:00:34.157Z 
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chatml.html - 2025-10-09T18:22:01.706Z + 2025-10-10T13:00:34.532Z https://docs.axolotl.ai/docs/api/monkeypatch.relora.html - 2025-10-09T18:22:01.923Z + 2025-10-10T13:00:34.746Z https://docs.axolotl.ai/docs/api/cli.evaluate.html - 2025-10-09T18:22:01.162Z + 2025-10-10T13:00:33.992Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.zephyr.html - 2025-10-09T18:22:01.707Z + 2025-10-10T13:00:34.534Z https://docs.axolotl.ai/docs/api/core.trainers.utils.html - 2025-10-09T18:22:01.435Z + 2025-10-10T13:00:34.265Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_w_system.html - 2025-10-09T18:22:01.605Z + 2025-10-10T13:00:34.433Z https://docs.axolotl.ai/docs/api/utils.chat_templates.html - 2025-10-09T18:22:02.054Z + 2025-10-10T13:00:34.875Z https://docs.axolotl.ai/docs/api/utils.data.streaming.html - 2025-10-09T18:22:02.178Z + 2025-10-10T13:00:34.999Z https://docs.axolotl.ai/docs/api/utils.bench.html - 2025-10-09T18:22:02.071Z + 2025-10-10T13:00:34.893Z https://docs.axolotl.ai/docs/api/common.architectures.html - 2025-10-09T18:22:02.516Z + 2025-10-10T13:00:35.333Z https://docs.axolotl.ai/docs/api/cli.checks.html - 2025-10-09T18:22:01.198Z + 2025-10-10T13:00:34.029Z https://docs.axolotl.ai/docs/api/core.trainers.dpo.trainer.html - 2025-10-09T18:22:01.405Z + 2025-10-10T13:00:34.235Z https://docs.axolotl.ai/docs/api/integrations.base.html - 2025-10-09T18:22:02.487Z + 2025-10-10T13:00:35.304Z https://docs.axolotl.ai/docs/api/cli.utils.train.html - 2025-10-09T18:22:01.355Z + 2025-10-10T13:00:34.185Z https://docs.axolotl.ai/docs/api/utils.collators.mamba.html - 2025-10-09T18:22:02.567Z + 2025-10-10T13:00:35.383Z https://docs.axolotl.ai/docs/api/cli.art.html - 2025-10-09T18:22:01.190Z + 2025-10-10T13:00:34.021Z https://docs.axolotl.ai/docs/api/monkeypatch.trainer_fsdp_optim.html - 2025-10-09T18:22:01.984Z + 2025-10-10T13:00:34.806Z https://docs.axolotl.ai/docs/api/logging_config.html - 2025-10-09T18:22:01.015Z + 
2025-10-10T13:00:33.846Z https://docs.axolotl.ai/docs/api/utils.freeze.html - 2025-10-09T18:22:02.081Z + 2025-10-10T13:00:34.902Z https://docs.axolotl.ai/docs/api/prompt_strategies.metharme.html - 2025-10-09T18:22:01.660Z + 2025-10-10T13:00:34.486Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_chat.html - 2025-10-09T18:22:01.589Z + 2025-10-10T13:00:34.416Z https://docs.axolotl.ai/docs/api/monkeypatch.stablelm_attn_hijack_flash.html - 2025-10-09T18:22:01.980Z + 2025-10-10T13:00:34.802Z https://docs.axolotl.ai/docs/api/models.mamba.modeling_mamba.html - 2025-10-09T18:22:02.537Z + 2025-10-10T13:00:35.354Z https://docs.axolotl.ai/docs/api/core.trainers.trl.html - 2025-10-09T18:22:01.390Z + 2025-10-10T13:00:34.220Z https://docs.axolotl.ai/docs/api/prompt_strategies.input_output.html - 2025-10-09T18:22:01.646Z + 2025-10-10T13:00:34.472Z https://docs.axolotl.ai/docs/api/loaders.constants.html - 2025-10-09T18:22:01.481Z + 2025-10-10T13:00:34.310Z https://docs.axolotl.ai/docs/api/monkeypatch.data.batch_dataset_fetcher.html - 2025-10-09T18:22:02.005Z + 2025-10-10T13:00:34.827Z https://docs.axolotl.ai/docs/api/cli.vllm_serve.html - 2025-10-09T18:22:01.292Z + 2025-10-10T13:00:34.123Z https://docs.axolotl.ai/docs/api/prompt_tokenizers.html - 2025-10-09T18:22:01.003Z + 2025-10-10T13:00:33.835Z https://docs.axolotl.ai/docs/api/cli.args.html - 2025-10-09T18:22:01.186Z + 2025-10-10T13:00:34.017Z https://docs.axolotl.ai/docs/api/cli.inference.html - 2025-10-09T18:22:01.243Z + 2025-10-10T13:00:34.074Z https://docs.axolotl.ai/docs/api/cli.utils.load.html - 2025-10-09T18:22:01.333Z + 2025-10-10T13:00:34.163Z https://docs.axolotl.ai/docs/api/cli.preprocess.html - 2025-10-09T18:22:01.277Z + 2025-10-10T13:00:34.108Z https://docs.axolotl.ai/docs/api/utils.callbacks.profiler.html - 2025-10-09T18:22:02.635Z + 2025-10-10T13:00:35.451Z https://docs.axolotl.ai/docs/api/utils.callbacks.perplexity.html - 2025-10-09T18:22:02.630Z + 2025-10-10T13:00:35.446Z 
https://docs.axolotl.ai/docs/api/core.chat.format.chatml.html - 2025-10-09T18:22:01.079Z + 2025-10-10T13:00:33.911Z https://docs.axolotl.ai/docs/api/integrations.grokfast.optimizer.html - 2025-10-09T18:22:02.492Z + 2025-10-10T13:00:35.309Z https://docs.axolotl.ai/docs/api/integrations.kd.trainer.html - 2025-10-09T18:22:02.502Z + 2025-10-10T13:00:35.319Z https://docs.axolotl.ai/docs/api/monkeypatch.unsloth_.html - 2025-10-09T18:22:01.994Z + 2025-10-10T13:00:34.816Z https://docs.axolotl.ai/docs/api/core.chat.format.llama3x.html - 2025-10-09T18:22:01.081Z + 2025-10-10T13:00:33.913Z https://docs.axolotl.ai/docs/reward_modelling.html - 2025-10-09T18:18:54.716Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/quantize.html - 2025-10-09T18:18:54.716Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/fsdp_qlora.html - 2025-10-09T18:18:54.712Z + 2025-10-10T12:57:10.819Z https://docs.axolotl.ai/docs/nd_parallelism.html - 2025-10-09T18:18:54.715Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/batch_vs_grad.html - 2025-10-09T18:18:54.711Z + 2025-10-10T12:57:10.818Z https://docs.axolotl.ai/docs/multi-node.html - 2025-10-09T18:18:54.715Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/rlhf.html - 2025-10-09T18:18:54.716Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/dataset-formats/stepwise_supervised.html - 2025-10-09T18:18:54.712Z + 2025-10-10T12:57:10.818Z https://docs.axolotl.ai/docs/dataset-formats/pretraining.html - 2025-10-09T18:18:54.712Z + 2025-10-10T12:57:10.818Z https://docs.axolotl.ai/docs/dataset-formats/tokenized.html - 2025-10-09T18:18:54.712Z + 2025-10-10T12:57:10.819Z https://docs.axolotl.ai/docs/dataset-formats/template_free.html - 2025-10-09T18:18:54.712Z + 2025-10-10T12:57:10.818Z https://docs.axolotl.ai/docs/multi-gpu.html - 2025-10-09T18:18:54.715Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/input_output.html - 2025-10-09T18:18:54.715Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/docker.html - 
2025-10-09T18:18:54.712Z + 2025-10-10T12:57:10.819Z https://docs.axolotl.ai/docs/gradient_checkpointing.html - 2025-10-09T18:18:54.712Z + 2025-10-10T12:57:10.819Z https://docs.axolotl.ai/docs/optimizations.html - 2025-10-09T18:18:54.716Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/sequence_parallelism.html - 2025-10-09T18:18:54.716Z + 2025-10-10T12:57:10.823Z https://docs.axolotl.ai/docs/dataset_loading.html - 2025-10-09T18:18:54.712Z + 2025-10-10T12:57:10.819Z https://docs.axolotl.ai/docs/installation.html - 2025-10-09T18:18:54.715Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/mixed_precision.html - 2025-10-09T18:18:54.715Z + 2025-10-10T12:57:10.822Z https://docs.axolotl.ai/docs/unsloth.html - 2025-10-09T18:18:54.716Z + 2025-10-10T12:57:10.823Z https://docs.axolotl.ai/src/axolotl/integrations/LICENSE.html - 2025-10-09T18:18:54.737Z + 2025-10-10T12:57:10.844Z