From 3e13ea033ee7e99f7b4d24ed7604da041058ff03 Mon Sep 17 00:00:00 2001
From: Quarto GHA Workflow Runner
Date: Wed, 9 Jul 2025 16:53:45 +0000
Subject: [PATCH] Built site for gh-pages

---
 .nojekyll                                     |   2 +-
 ...ch.gradient_checkpointing.offload_cpu.html |  12 +
 docs/multi-gpu.html                           | 130 +++---
 search.json                                   |   6 +-
 sitemap.xml                                   | 378 +++++++++---------
 5 files changed, 282 insertions(+), 246 deletions(-)

diff --git a/.nojekyll b/.nojekyll
index fa762a488..c40bb00a7 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-610a007e
\ No newline at end of file
+ddc3e283
\ No newline at end of file

diff --git a/docs/api/monkeypatch.gradient_checkpointing.offload_cpu.html b/docs/api/monkeypatch.gradient_checkpointing.offload_cpu.html
index 56bee7618..705b36559 100644
--- a/docs/api/monkeypatch.gradient_checkpointing.offload_cpu.html
+++ b/docs/api/monkeypatch.gradient_checkpointing.offload_cpu.html
@@ -472,6 +472,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
  • Classes
  • @@ -501,6 +502,10 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true}); CPU_Offloaded_Gradient_Checkpointer Saves VRAM by smartly offloading to RAM. + +CheckpointFunctionWithCPUOffload +This is a torch/utils/checkpoint.py CheckpointFunction monkey patch that offloads the first tensor to cpu during forward and back to cuda during backward. This allows significant memory savings when using a very long seqlen. e.g. for llama 8b at 100k it’s 24GB saved per gpu: ((100_000*4096)*2*32/2**30) +
    @@ -509,6 +514,13 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true}); )

    Saves VRAM by smartly offloading to RAM. Tiny hit to performance, since we mask the movement via non-blocking calls.

    +
    +
    +

    CheckpointFunctionWithCPUOffload

    +
    monkeypatch.gradient_checkpointing.offload_cpu.CheckpointFunctionWithCPUOffload(
    +)
    +

    This is a torch/utils/checkpoint.py CheckpointFunction monkey patch that offloads the first tensor to cpu during forward and back to cuda during backward. This allows significant memory savings when using a very long seqlen. e.g. for llama 8b at 100k it’s 24GB saved per gpu: ((100_000*4096)*2*32/2**30) +In the case of a very long seqlen 100k+ the copying to/from cpu overhead is not big, because dense quadratic attention compute will dominate.
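The offload-then-recompute mechanism described in the docstring above can be sketched with a custom `torch.autograd.Function`. This is an illustrative approximation under assumed behavior, not Axolotl's actual `CheckpointFunctionWithCPUOffload` implementation; the class and variable names are invented:

```python
import torch


class CheckpointWithCPUOffload(torch.autograd.Function):
    """Sketch: checkpoint a block while parking its saved input in host RAM.

    Why it matters: at seqlen 100k for an 8B llama-style model, one saved
    activation per layer is 100_000 * 4096 (hidden) * 2 bytes (bf16) * 32
    layers / 2**30 ~= 24.4 GiB that can live in RAM instead of VRAM.
    """

    @staticmethod
    def forward(ctx, run_function, hidden_states):
        ctx.run_function = run_function
        ctx.device = hidden_states.device
        # Offload the input activation to CPU; non_blocking lets the copy
        # overlap with the forward compute of this block.
        ctx.hidden_cpu = hidden_states.detach().to("cpu", non_blocking=True)
        with torch.no_grad():
            return run_function(hidden_states)

    @staticmethod
    def backward(ctx, grad_output):
        # Copy the activation back to the original device, then recompute
        # the forward pass under grad to rebuild the autograd graph.
        hidden = ctx.hidden_cpu.to(ctx.device, non_blocking=True).requires_grad_(True)
        with torch.enable_grad():
            output = ctx.run_function(hidden)
        torch.autograd.backward(output, grad_output)
        return None, hidden.grad


# Usage sketch: checkpoint a single layer.
layer = torch.nn.Linear(4, 4)
x = torch.randn(3, 4, requires_grad=True)
y = CheckpointWithCPUOffload.apply(layer, x)
y.sum().backward()
```

The recomputation cost is the usual gradient-checkpointing trade-off; as the docstring notes, at 100k+ seqlen the host-device copies are hidden behind the quadratic attention compute.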

    diff --git a/docs/multi-gpu.html b/docs/multi-gpu.html index d09071e3d..5d2dd52f8 100644 --- a/docs/multi-gpu.html +++ b/docs/multi-gpu.html @@ -572,6 +572,21 @@ Tip

    Start from Stage 1 -> Stage 2 -> Stage 3.

    +
    +
    +
    + +
    +
    +Tip +
    +
    +
    +

    Using ZeRO Stage 3 with Single-GPU training

    +

    ZeRO Stage 3 can be used for training on a single GPU by manually setting the environment variables: +WORLD_SIZE=1 LOCAL_RANK=0 MASTER_ADDR=0.0.0.0 MASTER_PORT=29500

    +
    +
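Concretely, a single-GPU ZeRO Stage 3 run might be set up as follows; this is a hedged sketch, and the `config.yml` and DeepSpeed JSON paths are placeholders:

```shell
# Emulate a one-process "distributed" setup so DeepSpeed ZeRO-3 can initialize
export WORLD_SIZE=1 LOCAL_RANK=0 MASTER_ADDR=0.0.0.0 MASTER_PORT=29500
# ...then launch training as usual, e.g.:
# axolotl train config.yml --deepspeed deepspeed_configs/zero3.json
```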
    @@ -1161,65 +1176,74 @@ single sequence causes OOM errors during model training.

    ::: -## FSDP {#sec-fsdp} +::: {.callout-tip} -### Basic FSDP Configuration {#sec-fsdp-config} +Using ZeRO Stage 3 with Single-GPU training -```{.yaml} -fsdp: - - full_shard - - auto_wrap -fsdp_config: - fsdp_offload_params: true - fsdp_state_dict_type: FULL_STATE_DICT - fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer -``` - -## Sequence parallelism {#sec-sequence-parallelism} - -We support sequence parallelism (SP) via the -[ring-flash-attention](https://github.com/zhuzilin/ring-flash-attention) project. This -allows one to split up sequences across GPUs, which is useful in the event that a -single sequence causes OOM errors during model training. - -See our [dedicated guide](sequence_parallelism.qmd) for more information. +ZeRO Stage 3 can be used for training on a single GPU by manually setting the environment variables: +`WORLD_SIZE=1 LOCAL_RANK=0 MASTER_ADDR=0.0.0.0 MASTER_PORT=29500` + +::: + +## FSDP {#sec-fsdp} + +### Basic FSDP Configuration {#sec-fsdp-config} + +```{.yaml} +fsdp: + - full_shard + - auto_wrap +fsdp_config: + fsdp_offload_params: true + fsdp_state_dict_type: FULL_STATE_DICT + fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer +``` -### FSDP + QLoRA {#sec-fsdp-qlora} +## Sequence parallelism {#sec-sequence-parallelism} -For combining FSDP with QLoRA, see our [dedicated guide](fsdp_qlora.qmd). - -## Performance Optimization {#sec-performance} - -### Liger Kernel Integration {#sec-liger} - -Please see [docs](custom_integrations.qmd#liger) for more info. - -## Troubleshooting {#sec-troubleshooting} - -### NCCL Issues {#sec-nccl} - -For NCCL-related problems, see our [NCCL troubleshooting guide](nccl.qmd). 
- -### Common Problems {#sec-common-problems} - -::: {.panel-tabset} - -## Memory Issues - -- Reduce `micro_batch_size` -- Reduce `eval_batch_size` -- Adjust `gradient_accumulation_steps` -- Consider using a higher ZeRO stage +We support sequence parallelism (SP) via the +[ring-flash-attention](https://github.com/zhuzilin/ring-flash-attention) project. This +allows one to split up sequences across GPUs, which is useful in the event that a +single sequence causes OOM errors during model training. + +See our [dedicated guide](sequence_parallelism.qmd) for more information. + +### FSDP + QLoRA {#sec-fsdp-qlora} + +For combining FSDP with QLoRA, see our [dedicated guide](fsdp_qlora.qmd). + +## Performance Optimization {#sec-performance} + +### Liger Kernel Integration {#sec-liger} + +Please see [docs](custom_integrations.qmd#liger) for more info. + +## Troubleshooting {#sec-troubleshooting} + +### NCCL Issues {#sec-nccl} + +For NCCL-related problems, see our [NCCL troubleshooting guide](nccl.qmd). + +### Common Problems {#sec-common-problems} -## Training Instability +::: {.panel-tabset} -- Start with DeepSpeed ZeRO-2 -- Monitor loss values -- Check learning rates - -::: - -For more detailed troubleshooting, see our [debugging guide](debugging.qmd). +## Memory Issues + +- Reduce `micro_batch_size` +- Reduce `eval_batch_size` +- Adjust `gradient_accumulation_steps` +- Consider using a higher ZeRO stage + +## Training Instability + +- Start with DeepSpeed ZeRO-2 +- Monitor loss values +- Check learning rates + +::: + +For more detailed troubleshooting, see our [debugging guide](debugging.qmd). diff --git a/search.json b/search.json index bc13ab0f4..d8fd33758 100644 --- a/search.json +++ b/search.json @@ -400,7 +400,7 @@ "href": "docs/multi-gpu.html#sec-deepspeed", "title": "Multi-GPU", "section": "2 DeepSpeed", - "text": "2 DeepSpeed\nDeepSpeed is the recommended approach for multi-GPU training due to its stability and performance. 
It provides various optimization levels through ZeRO stages.\n\n2.1 Configuration\nAdd to your YAML config:\ndeepspeed: deepspeed_configs/zero1.json\n\n\n2.2 Usage\n# Fetch deepspeed configs (if not already present)\naxolotl fetch deepspeed_configs\n\n# Passing arg via config\naxolotl train config.yml\n\n# Passing arg via cli\naxolotl train config.yml --deepspeed deepspeed_configs/zero1.json\n\n\n2.3 ZeRO Stages\nWe provide default configurations for:\n\nZeRO Stage 1 (zero1.json)\nZeRO Stage 1 with torch compile (zero1_torch_compile.json)\nZeRO Stage 2 (zero2.json)\nZeRO Stage 3 (zero3.json)\nZeRO Stage 3 with bf16 (zero3_bf16.json)\nZeRO Stage 3 with bf16 and CPU offload params(zero3_bf16_cpuoffload_params.json)\nZeRO Stage 3 with bf16 and CPU offload params and optimizer (zero3_bf16_cpuoffload_all.json)\n\n\n\n\n\n\n\nTip\n\n\n\nChoose the configuration that offloads the least amount to memory while still being able to fit on VRAM for best performance.\nStart from Stage 1 -> Stage 2 -> Stage 3.", + "text": "2 DeepSpeed\nDeepSpeed is the recommended approach for multi-GPU training due to its stability and performance. 
It provides various optimization levels through ZeRO stages.\n\n2.1 Configuration\nAdd to your YAML config:\ndeepspeed: deepspeed_configs/zero1.json\n\n\n2.2 Usage\n# Fetch deepspeed configs (if not already present)\naxolotl fetch deepspeed_configs\n\n# Passing arg via config\naxolotl train config.yml\n\n# Passing arg via cli\naxolotl train config.yml --deepspeed deepspeed_configs/zero1.json\n\n\n2.3 ZeRO Stages\nWe provide default configurations for:\n\nZeRO Stage 1 (zero1.json)\nZeRO Stage 1 with torch compile (zero1_torch_compile.json)\nZeRO Stage 2 (zero2.json)\nZeRO Stage 3 (zero3.json)\nZeRO Stage 3 with bf16 (zero3_bf16.json)\nZeRO Stage 3 with bf16 and CPU offload params(zero3_bf16_cpuoffload_params.json)\nZeRO Stage 3 with bf16 and CPU offload params and optimizer (zero3_bf16_cpuoffload_all.json)\n\n\n\n\n\n\n\nTip\n\n\n\nChoose the configuration that offloads the least amount to memory while still being able to fit on VRAM for best performance.\nStart from Stage 1 -> Stage 2 -> Stage 3.\n\n\n\n\n\n\n\n\nTip\n\n\n\nUsing ZeRO Stage 3 with Single-GPU training\nZeRO Stage 3 can be used for training on a single GPU by manually setting the environment variables:\nWORLD_SIZE=1 LOCAL_RANK=0 MASTER_ADDR=0.0.0.0 MASTER_PORT=29500", "crumbs": [ "Deployments", "Multi-GPU" @@ -2791,14 +2791,14 @@ "href": "docs/api/monkeypatch.gradient_checkpointing.offload_cpu.html", "title": "monkeypatch.gradient_checkpointing.offload_cpu", "section": "", - "text": "monkeypatch.gradient_checkpointing.offload_cpu\nCPU offloaded checkpointing\n\n\n\n\n\nName\nDescription\n\n\n\n\nCPU_Offloaded_Gradient_Checkpointer\nSaves VRAM by smartly offloading to RAM.\n\n\n\n\n\nmonkeypatch.gradient_checkpointing.offload_cpu.CPU_Offloaded_Gradient_Checkpointer(\n)\nSaves VRAM by smartly offloading to RAM.\nTiny hit to performance, since we mask the movement via non blocking calls." 
+ "text": "monkeypatch.gradient_checkpointing.offload_cpu\nCPU offloaded checkpointing\n\n\n\n\n\nName\nDescription\n\n\n\n\nCPU_Offloaded_Gradient_Checkpointer\nSaves VRAM by smartly offloading to RAM.\n\n\nCheckpointFunctionWithCPUOffload\nThis is a torch/utils/checkpoint.py CheckpointFunction monkey patch that offloads the first tensor to cpu during forward and back to cuda during backward. This allows significant memory savings when using a very long seqlen. e.g. for llama 8b at 100k it’s 24GB saved per gpu: ((100_000*4096)*2*32/2**30)\n\n\n\n\n\nmonkeypatch.gradient_checkpointing.offload_cpu.CPU_Offloaded_Gradient_Checkpointer(\n)\nSaves VRAM by smartly offloading to RAM.\nTiny hit to performance, since we mask the movement via non blocking calls.\n\n\n\nmonkeypatch.gradient_checkpointing.offload_cpu.CheckpointFunctionWithCPUOffload(\n)\nThis is a torch/utils/checkpoint.py CheckpointFunction monkey patch that offloads the first tensor to cpu during forward and back to cuda during backward. This allows significant memory savings when using a very long seqlen. e.g. for llama 8b at 100k it’s 24GB saved per gpu: ((100_000*4096)*2*32/2**30)\nIn the case of a very long seqlen 100k+ the copying to/from cpu overhead is not big, because dense quadratic attention compute will dominate." }, { "objectID": "docs/api/monkeypatch.gradient_checkpointing.offload_cpu.html#classes", "href": "docs/api/monkeypatch.gradient_checkpointing.offload_cpu.html#classes", "title": "monkeypatch.gradient_checkpointing.offload_cpu", "section": "", - "text": "Name\nDescription\n\n\n\n\nCPU_Offloaded_Gradient_Checkpointer\nSaves VRAM by smartly offloading to RAM.\n\n\n\n\n\nmonkeypatch.gradient_checkpointing.offload_cpu.CPU_Offloaded_Gradient_Checkpointer(\n)\nSaves VRAM by smartly offloading to RAM.\nTiny hit to performance, since we mask the movement via non blocking calls." 
+ "text": "Name\nDescription\n\n\n\n\nCPU_Offloaded_Gradient_Checkpointer\nSaves VRAM by smartly offloading to RAM.\n\n\nCheckpointFunctionWithCPUOffload\nThis is a torch/utils/checkpoint.py CheckpointFunction monkey patch that offloads the first tensor to cpu during forward and back to cuda during backward. This allows significant memory savings when using a very long seqlen. e.g. for llama 8b at 100k it’s 24GB saved per gpu: ((100_000*4096)*2*32/2**30)\n\n\n\n\n\nmonkeypatch.gradient_checkpointing.offload_cpu.CPU_Offloaded_Gradient_Checkpointer(\n)\nSaves VRAM by smartly offloading to RAM.\nTiny hit to performance, since we mask the movement via non blocking calls.\n\n\n\nmonkeypatch.gradient_checkpointing.offload_cpu.CheckpointFunctionWithCPUOffload(\n)\nThis is a torch/utils/checkpoint.py CheckpointFunction monkey patch that offloads the first tensor to cpu during forward and back to cuda during backward. This allows significant memory savings when using a very long seqlen. e.g. for llama 8b at 100k it’s 24GB saved per gpu: ((100_000*4096)*2*32/2**30)\nIn the case of a very long seqlen 100k+ the copying to/from cpu overhead is not big, because dense quadratic attention compute will dominate." 
}, { "objectID": "docs/api/core.trainers.mamba.html", diff --git a/sitemap.xml b/sitemap.xml index bfc9723d5..fa9c3c8d1 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,758 +2,758 @@ https://docs.axolotl.ai/docs/unsloth.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.713Z https://docs.axolotl.ai/docs/dataset-formats/conversation.html - 2025-07-09T13:22:46.802Z + 2025-07-09T16:48:34.708Z https://docs.axolotl.ai/docs/dataset-formats/stepwise_supervised.html - 2025-07-09T13:22:46.802Z + 2025-07-09T16:48:34.708Z https://docs.axolotl.ai/docs/dataset-formats/tokenized.html - 2025-07-09T13:22:46.803Z + 2025-07-09T16:48:34.708Z https://docs.axolotl.ai/docs/mac.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/docs/nccl.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/docs/multi-node.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/docs/docker.html - 2025-07-09T13:22:46.803Z + 2025-07-09T16:48:34.709Z https://docs.axolotl.ai/docs/lr_groups.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/docs/inference.html - 2025-07-09T13:22:46.805Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/docs/cli.html - 2025-07-09T13:22:46.802Z + 2025-07-09T16:48:34.708Z https://docs.axolotl.ai/docs/config-reference.html - 2025-07-09T13:26:17.755Z + 2025-07-09T16:51:58.872Z https://docs.axolotl.ai/docs/multi-gpu.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/docs/debugging.html - 2025-07-09T13:22:46.803Z + 2025-07-09T16:48:34.709Z https://docs.axolotl.ai/docs/multimodal.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/docs/api/cli.sweeps.html - 2025-07-09T13:26:03.594Z + 2025-07-09T16:51:44.406Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.llama3.html - 2025-07-09T13:26:03.923Z + 2025-07-09T16:51:44.735Z https://docs.axolotl.ai/docs/api/utils.schedulers.html - 
2025-07-09T13:26:04.310Z + 2025-07-09T16:51:45.123Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_xformers.html - 2025-07-09T13:26:04.123Z + 2025-07-09T16:51:44.935Z https://docs.axolotl.ai/docs/api/cli.cloud.modal_.html - 2025-07-09T13:26:03.643Z + 2025-07-09T16:51:44.454Z https://docs.axolotl.ai/docs/api/kernels.geglu.html - 2025-07-09T13:26:04.076Z + 2025-07-09T16:51:44.888Z https://docs.axolotl.ai/docs/api/core.trainers.utils.html - 2025-07-09T13:26:03.716Z + 2025-07-09T16:51:44.530Z https://docs.axolotl.ai/docs/api/core.datasets.chat.html - 2025-07-09T13:26:03.432Z + 2025-07-09T16:51:44.256Z https://docs.axolotl.ai/docs/api/utils.schemas.peft.html - 2025-07-09T13:26:04.424Z + 2025-07-09T16:51:45.237Z https://docs.axolotl.ai/docs/api/monkeypatch.btlm_attn_hijack_flash.html - 2025-07-09T13:26:04.187Z + 2025-07-09T16:51:44.997Z https://docs.axolotl.ai/docs/api/prompt_strategies.chat_template.html - 2025-07-09T13:26:03.825Z + 2025-07-09T16:51:44.638Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.user_defined.html - 2025-07-09T13:26:03.956Z + 2025-07-09T16:51:44.768Z https://docs.axolotl.ai/docs/api/cli.cloud.base.html - 2025-07-09T13:26:03.636Z + 2025-07-09T16:51:44.448Z https://docs.axolotl.ai/docs/api/kernels.swiglu.html - 2025-07-09T13:26:04.086Z + 2025-07-09T16:51:44.898Z https://docs.axolotl.ai/docs/api/prompt_strategies.stepwise_supervised.html - 2025-07-09T13:26:03.890Z + 2025-07-09T16:51:44.702Z https://docs.axolotl.ai/docs/api/prompt_strategies.bradley_terry.llama3.html - 2025-07-09T13:26:03.980Z + 2025-07-09T16:51:44.792Z https://docs.axolotl.ai/docs/api/prompt_strategies.completion.html - 2025-07-09T13:26:03.879Z + 2025-07-09T16:51:44.692Z https://docs.axolotl.ai/docs/api/kernels.utils.html - 2025-07-09T13:26:04.095Z + 2025-07-09T16:51:44.907Z https://docs.axolotl.ai/docs/api/common.datasets.html - 2025-07-09T13:26:04.635Z + 2025-07-09T16:51:45.452Z https://docs.axolotl.ai/docs/api/utils.schemas.datasets.html - 
2025-07-09T13:26:04.415Z + 2025-07-09T16:51:45.228Z https://docs.axolotl.ai/docs/api/core.builders.rl.html - 2025-07-09T13:26:03.386Z + 2025-07-09T16:51:44.211Z https://docs.axolotl.ai/docs/api/evaluate.html - 2025-07-09T13:26:03.291Z + 2025-07-09T16:51:44.116Z https://docs.axolotl.ai/docs/api/kernels.quantize.html - 2025-07-09T13:26:04.094Z + 2025-07-09T16:51:44.906Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_flash.html - 2025-07-09T13:26:04.122Z + 2025-07-09T16:51:44.933Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.rng_state_loader.html - 2025-07-09T13:26:03.760Z + 2025-07-09T16:51:44.574Z https://docs.axolotl.ai/docs/api/integrations.base.html - 2025-07-09T13:26:04.595Z + 2025-07-09T16:51:45.412Z https://docs.axolotl.ai/docs/api/cli.merge_lora.html - 2025-07-09T13:26:03.557Z + 2025-07-09T16:51:44.380Z https://docs.axolotl.ai/docs/api/cli.merge_sharded_fsdp_weights.html - 2025-07-09T13:26:03.569Z + 2025-07-09T16:51:44.392Z https://docs.axolotl.ai/docs/api/monkeypatch.transformers_fa_utils.html - 2025-07-09T13:26:04.204Z + 2025-07-09T16:51:45.014Z https://docs.axolotl.ai/docs/api/prompt_strategies.llama2_chat.html - 2025-07-09T13:26:03.873Z + 2025-07-09T16:51:44.686Z https://docs.axolotl.ai/docs/api/utils.collators.mm_chat.html - 2025-07-09T13:26:04.664Z + 2025-07-09T16:51:45.482Z https://docs.axolotl.ai/docs/api/utils.data.sft.html - 2025-07-09T13:26:04.349Z + 2025-07-09T16:51:45.163Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_instruct.html - 2025-07-09T13:26:03.840Z + 2025-07-09T16:51:44.653Z https://docs.axolotl.ai/docs/api/integrations.liger.args.html - 2025-07-09T13:26:04.610Z + 2025-07-09T16:51:45.428Z https://docs.axolotl.ai/docs/api/monkeypatch.mistral_attn_hijack_flash.html - 2025-07-09T13:26:04.138Z + 2025-07-09T16:51:44.949Z https://docs.axolotl.ai/docs/api/cli.vllm_serve.html - 2025-07-09T13:26:03.633Z + 2025-07-09T16:51:44.445Z https://docs.axolotl.ai/docs/api/monkeypatch.utils.html - 
2025-07-09T13:26:04.185Z + 2025-07-09T16:51:44.996Z https://docs.axolotl.ai/docs/api/loaders.patch_manager.html - 2025-07-09T13:26:03.750Z + 2025-07-09T16:51:44.563Z https://docs.axolotl.ai/docs/api/utils.schemas.integrations.html - 2025-07-09T13:26:04.444Z + 2025-07-09T16:51:45.258Z https://docs.axolotl.ai/docs/api/utils.callbacks.perplexity.html - 2025-07-09T13:26:04.712Z + 2025-07-09T16:51:45.528Z https://docs.axolotl.ai/docs/api/cli.utils.html - 2025-07-09T13:26:03.626Z + 2025-07-09T16:51:44.438Z https://docs.axolotl.ai/docs/api/utils.schemas.config.html - 2025-07-09T13:26:04.385Z + 2025-07-09T16:51:45.198Z https://docs.axolotl.ai/docs/api/prompt_strategies.input_output.html - 2025-07-09T13:26:03.885Z + 2025-07-09T16:51:44.697Z https://docs.axolotl.ai/docs/api/utils.distributed.html - 2025-07-09T13:26:04.330Z + 2025-07-09T16:51:45.143Z https://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_disk.html - 2025-07-09T13:26:04.238Z + 2025-07-09T16:51:45.050Z https://docs.axolotl.ai/docs/api/monkeypatch.trainer_fsdp_optim.html - 2025-07-09T13:26:04.198Z + 2025-07-09T16:51:45.008Z https://docs.axolotl.ai/docs/api/core.builders.base.html - 2025-07-09T13:26:03.374Z + 2025-07-09T16:51:44.198Z https://docs.axolotl.ai/docs/api/core.trainers.trl.html - 2025-07-09T13:26:03.675Z + 2025-07-09T16:51:44.486Z https://docs.axolotl.ai/docs/api/cli.evaluate.html - 2025-07-09T13:26:03.489Z + 2025-07-09T16:51:44.312Z https://docs.axolotl.ai/docs/api/utils.optimizers.adopt.html - 2025-07-09T13:26:04.341Z + 2025-07-09T16:51:45.155Z https://docs.axolotl.ai/docs/api/utils.callbacks.qat.html - 2025-07-09T13:26:04.731Z + 2025-07-09T16:51:45.547Z https://docs.axolotl.ai/docs/api/core.trainers.dpo.trainer.html - 2025-07-09T13:26:03.692Z + 2025-07-09T16:51:44.502Z https://docs.axolotl.ai/docs/api/core.chat.format.shared.html - 2025-07-09T13:26:03.427Z + 2025-07-09T16:51:44.251Z https://docs.axolotl.ai/docs/api/monkeypatch.relora.html - 2025-07-09T13:26:04.147Z + 
2025-07-09T16:51:44.957Z https://docs.axolotl.ai/docs/api/cli.config.html - 2025-07-09T13:26:03.533Z + 2025-07-09T16:51:44.356Z https://docs.axolotl.ai/docs/api/cli.preprocess.html - 2025-07-09T13:26:03.585Z + 2025-07-09T16:51:44.400Z https://docs.axolotl.ai/docs/api/core.trainers.base.html - 2025-07-09T13:26:03.658Z + 2025-07-09T16:51:44.470Z https://docs.axolotl.ai/docs/api/convert.html - 2025-07-09T13:26:03.315Z + 2025-07-09T16:51:44.141Z https://docs.axolotl.ai/docs/api/prompt_strategies.pygmalion.html - 2025-07-09T13:26:03.907Z + 2025-07-09T16:51:44.719Z https://docs.axolotl.ai/docs/api/utils.schemas.trl.html - 2025-07-09T13:26:04.427Z + 2025-07-09T16:51:45.241Z https://docs.axolotl.ai/docs/api/cli.args.html - 2025-07-09T13:26:03.509Z + 2025-07-09T16:51:44.331Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chat_template.html - 2025-07-09T13:26:03.913Z + 2025-07-09T16:51:44.725Z https://docs.axolotl.ai/docs/api/loaders.constants.html - 2025-07-09T13:26:03.751Z + 2025-07-09T16:51:44.565Z https://docs.axolotl.ai/docs/api/logging_config.html - 2025-07-09T13:26:03.367Z + 2025-07-09T16:51:44.192Z https://docs.axolotl.ai/docs/api/cli.inference.html - 2025-07-09T13:26:03.548Z + 2025-07-09T16:51:44.371Z https://docs.axolotl.ai/docs/api/utils.ctx_managers.sequence_parallel.html - 2025-07-09T13:26:03.790Z + 2025-07-09T16:51:44.604Z https://docs.axolotl.ai/docs/api/integrations.spectrum.args.html - 2025-07-09T13:26:04.617Z + 2025-07-09T16:51:45.434Z https://docs.axolotl.ai/docs/api/utils.schemas.training.html - 2025-07-09T13:26:04.397Z + 2025-07-09T16:51:45.210Z https://docs.axolotl.ai/docs/api/prompt_strategies.orcamini.html - 2025-07-09T13:26:03.900Z + 2025-07-09T16:51:44.713Z https://docs.axolotl.ai/docs/api/utils.freeze.html - 2025-07-09T13:26:04.268Z + 2025-07-09T16:51:45.081Z https://docs.axolotl.ai/docs/api/loaders.tokenizer.html - 2025-07-09T13:26:03.735Z + 2025-07-09T16:51:44.548Z https://docs.axolotl.ai/docs/api/utils.bench.html - 
2025-07-09T13:26:04.260Z + 2025-07-09T16:51:45.073Z https://docs.axolotl.ai/docs/api/utils.quantization.html - 2025-07-09T13:26:04.371Z + 2025-07-09T16:51:45.184Z https://docs.axolotl.ai/docs/batch_vs_grad.html - 2025-07-09T13:22:46.802Z + 2025-07-09T16:48:34.708Z https://docs.axolotl.ai/docs/input_output.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/docs/sequence_parallelism.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/docs/reward_modelling.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/index.html - 2025-07-09T13:22:46.821Z + 2025-07-09T16:48:34.727Z https://docs.axolotl.ai/src/axolotl/integrations/LICENSE.html - 2025-07-09T13:22:46.825Z + 2025-07-09T16:48:34.731Z https://docs.axolotl.ai/FAQS.html - 2025-07-09T13:22:46.801Z + 2025-07-09T16:48:34.706Z https://docs.axolotl.ai/src/axolotl/integrations/cut_cross_entropy/ACKNOWLEDGEMENTS.html - 2025-07-09T13:22:46.825Z + 2025-07-09T16:48:34.731Z https://docs.axolotl.ai/TODO.html - 2025-07-09T13:22:46.801Z + 2025-07-09T16:48:34.707Z https://docs.axolotl.ai/examples/colab-notebooks/colab-axolotl-example.html - 2025-07-09T13:22:46.808Z + 2025-07-09T16:48:34.714Z https://docs.axolotl.ai/docs/torchao.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/docs/ray-integration.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/docs/quantize.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/docs/qat.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/docs/api/utils.lora.html - 2025-07-09T13:26:04.251Z + 2025-07-09T16:51:45.064Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_w_system.html - 2025-07-09T13:26:03.852Z + 2025-07-09T16:51:44.665Z https://docs.axolotl.ai/docs/api/monkeypatch.stablelm_attn_hijack_flash.html - 2025-07-09T13:26:04.194Z + 2025-07-09T16:51:45.004Z 
https://docs.axolotl.ai/docs/api/utils.collators.core.html - 2025-07-09T13:26:04.637Z + 2025-07-09T16:51:45.455Z https://docs.axolotl.ai/docs/api/prompt_strategies.metharme.html - 2025-07-09T13:26:03.896Z + 2025-07-09T16:51:44.709Z https://docs.axolotl.ai/docs/api/utils.callbacks.profiler.html - 2025-07-09T13:26:04.716Z + 2025-07-09T16:51:45.532Z https://docs.axolotl.ai/docs/api/utils.data.pretraining.html - 2025-07-09T13:26:04.343Z + 2025-07-09T16:51:45.156Z https://docs.axolotl.ai/docs/api/utils.callbacks.lisa.html - 2025-07-09T13:26:04.717Z + 2025-07-09T16:51:45.533Z https://docs.axolotl.ai/docs/api/utils.trainer.html - 2025-07-09T13:26:04.286Z + 2025-07-09T16:51:45.098Z https://docs.axolotl.ai/docs/api/integrations.cut_cross_entropy.args.html - 2025-07-09T13:26:04.598Z + 2025-07-09T16:51:45.416Z https://docs.axolotl.ai/docs/api/utils.schemas.model.html - 2025-07-09T13:26:04.392Z + 2025-07-09T16:51:45.205Z https://docs.axolotl.ai/docs/api/monkeypatch.data.batch_dataset_fetcher.html - 2025-07-09T13:26:04.207Z + 2025-07-09T16:51:45.017Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.zephyr.html - 2025-07-09T13:26:03.935Z + 2025-07-09T16:51:44.747Z https://docs.axolotl.ai/docs/api/datasets.html - 2025-07-09T13:26:03.301Z + 2025-07-09T16:51:44.127Z https://docs.axolotl.ai/docs/api/utils.schemas.enums.html - 2025-07-09T13:26:04.455Z + 2025-07-09T16:51:45.269Z https://docs.axolotl.ai/docs/api/integrations.kd.trainer.html - 2025-07-09T13:26:04.607Z + 2025-07-09T16:51:45.424Z https://docs.axolotl.ai/docs/api/monkeypatch.lora_kernels.html - 2025-07-09T13:26:04.177Z + 2025-07-09T16:51:44.988Z https://docs.axolotl.ai/docs/api/utils.collators.batching.html - 2025-07-09T13:26:04.656Z + 2025-07-09T16:51:45.473Z https://docs.axolotl.ai/docs/api/core.trainers.grpo.sampler.html - 2025-07-09T13:26:03.715Z + 2025-07-09T16:51:44.529Z https://docs.axolotl.ai/docs/api/prompt_strategies.base.html - 2025-07-09T13:26:03.792Z + 2025-07-09T16:51:44.606Z 
https://docs.axolotl.ai/docs/api/monkeypatch.multipack.html - 2025-07-09T13:26:04.140Z + 2025-07-09T16:51:44.951Z https://docs.axolotl.ai/docs/api/prompt_strategies.orpo.chat_template.html - 2025-07-09T13:26:03.976Z + 2025-07-09T16:51:44.788Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.passthrough.html - 2025-07-09T13:26:03.938Z + 2025-07-09T16:51:44.750Z https://docs.axolotl.ai/docs/api/core.chat.format.chatml.html - 2025-07-09T13:26:03.424Z + 2025-07-09T16:51:44.248Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.scheduler.html - 2025-07-09T13:26:03.767Z + 2025-07-09T16:51:44.580Z https://docs.axolotl.ai/docs/api/utils.model_shard_quant.html - 2025-07-09T13:26:04.257Z + 2025-07-09T16:51:45.069Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.chatml.html - 2025-07-09T13:26:03.954Z + 2025-07-09T16:51:44.766Z https://docs.axolotl.ai/docs/api/utils.tokenization.html - 2025-07-09T13:26:04.245Z + 2025-07-09T16:51:45.057Z https://docs.axolotl.ai/docs/api/loaders.model.html - 2025-07-09T13:26:03.726Z + 2025-07-09T16:51:44.540Z https://docs.axolotl.ai/docs/api/utils.callbacks.mlflow_.html - 2025-07-09T13:26:04.721Z + 2025-07-09T16:51:45.537Z https://docs.axolotl.ai/docs/api/core.trainers.grpo.trainer.html - 2025-07-09T13:26:03.702Z + 2025-07-09T16:51:44.517Z https://docs.axolotl.ai/docs/api/cli.main.html - 2025-07-09T13:26:03.472Z + 2025-07-09T16:51:44.295Z https://docs.axolotl.ai/docs/api/utils.callbacks.comet_.html - 2025-07-09T13:26:04.724Z + 2025-07-09T16:51:45.540Z https://docs.axolotl.ai/docs/api/utils.chat_templates.html - 2025-07-09T13:26:04.246Z + 2025-07-09T16:51:45.059Z https://docs.axolotl.ai/docs/api/utils.schemas.utils.html - 2025-07-09T13:26:04.460Z + 2025-07-09T16:51:45.275Z https://docs.axolotl.ai/docs/api/common.architectures.html - 2025-07-09T13:26:04.618Z + 2025-07-09T16:51:45.436Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_expand_mask.html - 2025-07-09T13:26:04.148Z + 2025-07-09T16:51:44.959Z 
https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_chat.html - 2025-07-09T13:26:03.839Z + 2025-07-09T16:51:44.652Z https://docs.axolotl.ai/docs/api/utils.samplers.multipack.html - 2025-07-09T13:26:04.706Z + 2025-07-09T16:51:45.522Z https://docs.axolotl.ai/docs/api/integrations.grokfast.optimizer.html - 2025-07-09T13:26:04.599Z + 2025-07-09T16:51:45.417Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chatml.html - 2025-07-09T13:26:03.933Z + 2025-07-09T16:51:44.745Z https://docs.axolotl.ai/docs/api/monkeypatch.mixtral.html - 2025-07-09T13:26:04.209Z + 2025-07-09T16:51:45.019Z https://docs.axolotl.ai/docs/api/train.html - 2025-07-09T13:26:03.280Z + 2025-07-09T16:51:44.106Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_patch_multipack.html - 2025-07-09T13:26:04.188Z + 2025-07-09T16:51:44.999Z https://docs.axolotl.ai/docs/api/index.html - 2025-07-09T13:26:03.218Z + 2025-07-09T16:51:44.043Z https://docs.axolotl.ai/docs/api/loaders.adapter.html - 2025-07-09T13:26:03.742Z + 2025-07-09T16:51:44.555Z https://docs.axolotl.ai/docs/api/utils.schemas.multimodal.html - 2025-07-09T13:26:04.432Z + 2025-07-09T16:51:45.246Z https://docs.axolotl.ai/docs/api/kernels.lora.html - 2025-07-09T13:26:04.065Z + 2025-07-09T16:51:44.877Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.llama3.html - 2025-07-09T13:26:03.946Z + 2025-07-09T16:51:44.758Z https://docs.axolotl.ai/docs/api/cli.checks.html - 2025-07-09T13:26:03.515Z + 2025-07-09T16:51:44.338Z https://docs.axolotl.ai/docs/api/cli.quantize.html - 2025-07-09T13:26:03.648Z + 2025-07-09T16:51:44.459Z https://docs.axolotl.ai/docs/api/integrations.lm_eval.args.html - 2025-07-09T13:26:04.613Z + 2025-07-09T16:51:45.431Z https://docs.axolotl.ai/docs/api/core.chat.messages.html - 2025-07-09T13:26:03.422Z + 2025-07-09T16:51:44.246Z https://docs.axolotl.ai/docs/api/core.builders.causal.html - 2025-07-09T13:26:03.378Z + 2025-07-09T16:51:44.203Z https://docs.axolotl.ai/docs/api/core.trainers.relora.html - 
2025-07-09T13:26:03.685Z + 2025-07-09T16:51:44.496Z https://docs.axolotl.ai/docs/api/models.mamba.modeling_mamba.html - 2025-07-09T13:26:04.636Z + 2025-07-09T16:51:45.453Z https://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_cpu.html - 2025-07-09T13:26:04.212Z + 2025-07-09T16:51:45.024Z https://docs.axolotl.ai/docs/api/core.trainers.mamba.html - 2025-07-09T13:26:03.680Z + 2025-07-09T16:51:44.491Z https://docs.axolotl.ai/docs/api/core.datasets.transforms.chat_builder.html - 2025-07-09T13:26:03.440Z + 2025-07-09T16:51:44.264Z https://docs.axolotl.ai/docs/api/loaders.processor.html - 2025-07-09T13:26:03.736Z + 2025-07-09T16:51:44.550Z https://docs.axolotl.ai/docs/api/core.chat.format.llama3x.html - 2025-07-09T13:26:03.425Z + 2025-07-09T16:51:44.249Z https://docs.axolotl.ai/docs/api/prompt_strategies.messages.chat.html - 2025-07-09T13:26:03.911Z + 2025-07-09T16:51:44.723Z https://docs.axolotl.ai/docs/api/cli.train.html - 2025-07-09T13:26:03.481Z + 2025-07-09T16:51:44.303Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.optimizer.html - 2025-07-09T13:26:03.757Z + 2025-07-09T16:51:44.571Z https://docs.axolotl.ai/docs/api/utils.collators.mamba.html - 2025-07-09T13:26:04.660Z + 2025-07-09T16:51:45.477Z https://docs.axolotl.ai/docs/api/monkeypatch.unsloth_.html - 2025-07-09T13:26:04.206Z + 2025-07-09T16:51:45.016Z https://docs.axolotl.ai/docs/api/utils.dict.html - 2025-07-09T13:26:04.333Z + 2025-07-09T16:51:45.147Z https://docs.axolotl.ai/docs/api/prompt_strategies.user_defined.html - 2025-07-09T13:26:03.861Z + 2025-07-09T16:51:44.673Z https://docs.axolotl.ai/docs/api/core.training_args.html - 2025-07-09T13:26:03.399Z + 2025-07-09T16:51:44.223Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.user_defined.html - 2025-07-09T13:26:03.936Z + 2025-07-09T16:51:44.748Z https://docs.axolotl.ai/docs/api/prompt_tokenizers.html - 2025-07-09T13:26:03.358Z + 2025-07-09T16:51:44.182Z https://docs.axolotl.ai/docs/api/common.const.html - 
2025-07-09T13:26:04.620Z + 2025-07-09T16:51:45.437Z https://docs.axolotl.ai/docs/fsdp_qlora.html - 2025-07-09T13:22:46.803Z + 2025-07-09T16:48:34.709Z https://docs.axolotl.ai/docs/custom_integrations.html - 2025-07-09T13:22:46.802Z + 2025-07-09T16:48:34.708Z https://docs.axolotl.ai/docs/getting-started.html - 2025-07-09T13:22:46.803Z + 2025-07-09T16:48:34.709Z https://docs.axolotl.ai/docs/faq.html - 2025-07-09T13:22:46.803Z + 2025-07-09T16:48:34.709Z https://docs.axolotl.ai/docs/lora_optims.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/docs/rlhf.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/docs/amd_hpc.html - 2025-07-09T13:22:46.802Z + 2025-07-09T16:48:34.708Z https://docs.axolotl.ai/docs/installation.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/docs/multipack.html - 2025-07-09T13:22:46.806Z + 2025-07-09T16:48:34.712Z https://docs.axolotl.ai/docs/dataset_preprocessing.html - 2025-07-09T13:22:46.803Z + 2025-07-09T16:48:34.708Z https://docs.axolotl.ai/docs/dataset_loading.html - 2025-07-09T13:22:46.803Z + 2025-07-09T16:48:34.708Z https://docs.axolotl.ai/docs/dataset-formats/inst_tune.html - 2025-07-09T13:22:46.802Z + 2025-07-09T16:48:34.708Z https://docs.axolotl.ai/docs/dataset-formats/template_free.html - 2025-07-09T13:22:46.802Z + 2025-07-09T16:48:34.708Z https://docs.axolotl.ai/docs/dataset-formats/index.html - 2025-07-09T13:22:46.802Z + 2025-07-09T16:48:34.708Z https://docs.axolotl.ai/docs/dataset-formats/pretraining.html - 2025-07-09T13:22:46.802Z + 2025-07-09T16:48:34.708Z