From b7e6d945e98c61f4c1f05634098aeca7cb8e38bb Mon Sep 17 00:00:00 2001
From: Quarto GHA Workflow Runner
Date: Thu, 22 May 2025 15:20:58 +0000
Subject: [PATCH] Built site for gh-pages

---
 .nojekyll                                     |   2 +-
 .../utils.ctx_managers.sequence_parallel.html |  51 +--
 search.json                                   |   4 +-
 sitemap.xml                                   | 358 +++++++++---------
 4 files changed, 199 insertions(+), 216 deletions(-)

diff --git a/.nojekyll b/.nojekyll
index f4df579b9..560449870 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-f1264b57
\ No newline at end of file
+96781d78
\ No newline at end of file

diff --git a/docs/api/utils.ctx_managers.sequence_parallel.html b/docs/api/utils.ctx_managers.sequence_parallel.html
index 5fb1c57d3..882fe9955 100644
--- a/docs/api/utils.ctx_managers.sequence_parallel.html
+++ b/docs/api/utils.ctx_managers.sequence_parallel.html
@@ -664,7 +664,8 @@ from the full gradient tensor.

     sequence_parallel_degree,
     gradient_accumulation_steps,
     ring_attn_func,
-)
+    heads_k_stride,
+)

Context manager for sequence parallelism operations.

This class provides a context that will automatically apply sequence parallelism
during model forward passes using a pre-forward hook, and gather outputs from
@@ -673,10 +674,10 @@ across the sequence parallelism group using a post-forward hook.
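To make the hook-based flow concrete, here is a minimal usage sketch. It assumes the import path axolotl.utils.ctx_managers.sequence_parallel, a torch.distributed process group already initialized (e.g. under torchrun), and illustrative argument values; it is not copied from the library.

```python
# Hedged usage sketch: the import path, dummy model, and argument values are
# assumptions for illustration. Requires an initialized distributed setup.
import torch
import torch.nn as nn
from axolotl.utils.ctx_managers.sequence_parallel import (
    SequenceParallelContextManager,
)

model = nn.Linear(16, 16)                  # stand-in for a real transformer
hidden = torch.randn(2, 128, 16)           # (batch, sequence, features)

# Pre-forward hook: shards each batch along the sequence dimension across
# the SP group. Post-forward hook: all-gathers outputs back to full length.
with SequenceParallelContextManager(
    models=[model],
    sequence_parallel_degree=4,            # split sequences over 4 ranks
    gradient_accumulation_steps=1,
    ring_attn_func=None,                   # documented as currently unused; a RingAttnFunc would go here
    heads_k_stride=1,                      # forwarded to the varlen_llama3 ring_flash_attn path
):
    outputs = model(hidden)                # outputs arrive gathered to full sequence length
```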

Parameters

-
-
-
-
+
+
+
+
@@ -711,32 +712,14 @@ across the sequence parallelism group using a post-forward hook.

-
-
Which ring attention function to use. Currently unused.
required
-
-
-

Methods

-
-
-
-
-
-
-
-
-
-
+
+
+
+
Name
Description
gather_outputs
Gather sharded outputs from all ranks and reconstruct the full tensor.
heads_k_stride
int | None
Sequence parallelism K head stride size. Passed through to varlen_llama3 ring_flash_attn implementation.
required
-
-
gather_outputs
-
utils.ctx_managers.sequence_parallel.SequenceParallelContextManager.gather_outputs(
-    output,
-)
-

Gather sharded outputs from all ranks and reconstruct the full tensor.
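The module docs describe AllGatherWithGrad, a custom autograd function, as the mechanism behind this gather. The following is an independent sketch of that pattern under stated assumptions (dim 1 is the sequence dimension, every rank holds an equally sized shard); names are illustrative and this is not the library's exact implementation.

```python
# Sketch: all-gather along the sequence dim that preserves gradients.
import torch
import torch.distributed as dist


class AllGatherSeq(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input_tensor, group):
        world_size = dist.get_world_size(group)
        ctx.rank = dist.get_rank(group)
        ctx.shard_len = input_tensor.size(1)
        gathered = [torch.empty_like(input_tensor) for _ in range(world_size)]
        dist.all_gather(gathered, input_tensor.contiguous(), group=group)
        # Reconstruct the full tensor by concatenating shards along dim 1.
        return torch.cat(gathered, dim=1)

    @staticmethod
    def backward(ctx, grad_output):
        # Keep only the gradient slice matching this rank's original input;
        # return None for the process group, which takes no gradient.
        start = ctx.rank * ctx.shard_len
        return grad_output[:, start : start + ctx.shard_len].contiguous(), None
```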

-
@@ -758,13 +741,13 @@ across the sequence parallelism group using a post-forward hook.

apply_sequence_parallelism

-
utils.ctx_managers.sequence_parallel.apply_sequence_parallelism(
-    batch,
-    local_rank,
-    local_world_size,
-    gradient_accumulation_steps,
-    ring_attn_func,
-)
+
utils.ctx_managers.sequence_parallel.apply_sequence_parallelism(
+    batch,
+    local_rank,
+    local_world_size,
+    gradient_accumulation_steps,
+    ring_attn_func,
+)

Apply sequence parallelism slicing to a batch.

Special handling is implemented for integer logits_to_keep, which indicates that only the last N tokens in the sequence should be kept during generation.
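As a sketch of the slicing this performs across the sequence parallel group (hedged: the helper name, zero-padding strategy, and 2-D tensor check are illustrative assumptions, not the exact implementation):

```python
# Illustrative sequence-dimension slicing of a batch across an SP group.
import torch
import torch.nn.functional as F


def slice_batch_for_rank(batch, local_rank, local_world_size):
    seq_len = batch["input_ids"].size(1)
    pad = (-seq_len) % local_world_size        # pad so ranks get equal shards
    shard = (seq_len + pad) // local_world_size
    start, end = local_rank * shard, (local_rank + 1) * shard

    sliced = {}
    for name, tensor in batch.items():
        if tensor.dim() == 2 and tensor.size(1) == seq_len:
            if pad:
                tensor = F.pad(tensor, (0, pad))  # zero-pad the sequence dim
            sliced[name] = tensor[:, start:end]
        else:
            sliced[name] = tensor  # non-sequence tensors pass through untouched

    # Matches the documented return: (sliced batch, original length, padding added).
    return sliced, seq_len, pad
```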

diff --git a/search.json b/search.json index bf0dc8907..c47c79241 100644 --- a/search.json +++ b/search.json @@ -2239,14 +2239,14 @@ "href": "docs/api/utils.ctx_managers.sequence_parallel.html", "title": "utils.ctx_managers.sequence_parallel", "section": "", - "text": "utils.ctx_managers.sequence_parallel\nModule for Axolotl trainer sequence parallelism manager and utilities\n\n\n\n\n\nName\nDescription\n\n\n\n\nAllGatherWithGrad\nCustom autograd function for all-gather to preserve gradients.\n\n\nSequenceParallelContextManager\nContext manager for sequence parallelism operations.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad()\nCustom autograd function for all-gather to preserve gradients.\n\n\n\n\n\nName\nDescription\n\n\n\n\nbackward\nBackward pass for all-gather operation.\n\n\nforward\nForward pass of all-gather of data with sequence dimension.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad.backward(\n ctx,\n grad_output,\n)\nBackward pass for all-gather operation.\nExtracts the gradient slice corresponding to this rank’s original input\nfrom the full gradient tensor.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\ntorch.autograd.function.FunctionCtx\ntorch.autograd function context.\nrequired\n\n\ngrad_output\ntorch.Tensor\nGradient from subsequent layers with respect to the concatenated output tensor.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntuple[torch.Tensor, None]\nTuple containing the gradient slice for this rank’s input tensor and None for the process group parameter which doesn’t require gradients.\n\n\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad.forward(\n ctx,\n input_tensor,\n group,\n)\nForward pass of all-gather of data with sequence dimension.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\ntorch.autograd.function.FunctionCtx\ntorch.autograd function context.\nrequired\n\n\ninput_tensor\ntorch.Tensor\nTensor from model output with sequence dimension.\nrequired\n\n\ngroup\ndist.ProcessGroup\ntorch.distributed process group.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nTensor from gathering the input_tensor from across the process group and concatenating along the sequence dimension.\n\n\n\n\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.SequenceParallelContextManager(\n self,\n models,\n sequence_parallel_degree,\n gradient_accumulation_steps,\n ring_attn_func,\n)\nContext manager for sequence parallelism operations.\nThis class provides a context that will automatically apply sequence parallelism\nduring model forward passes using a pre-forward hook, and gather outputs from\nacross the sequence parallelism group using a post-forward hook.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nmodels\nlist[nn.Module]\nList of models to apply sequence parallelism to pre- and post- forward hooks.\nrequired\n\n\nsequence_parallel_degree\nint\nNumber of processes to split sequences over.\nrequired\n\n\ngradient_accumulation_steps\nint\nNumber of steps to accumulate gradients over.\nrequired\n\n\nring_attn_func\nRingAttnFunc\nWhich ring attention function to use. 
Currently unused.\nrequired\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ngather_outputs\nGather sharded outputs from all ranks and reconstruct the full tensor.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.SequenceParallelContextManager.gather_outputs(\n output,\n)\nGather sharded outputs from all ranks and reconstruct the full tensor.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\napply_sequence_parallelism\nApply sequence parallelism slicing to a batch.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.apply_sequence_parallelism(\n batch,\n local_rank,\n local_world_size,\n gradient_accumulation_steps,\n ring_attn_func,\n)\nApply sequence parallelism slicing to a batch.\nSpecial handling is implemented for integer logits_to_keep, which indicates\nto only keep the last N tokens in the sequence during generation.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbatch\ndict[str, torch.Tensor]\nBatch dictionary (e.g., input_ids, attention_mask, etc.).\nrequired\n\n\nlocal_rank\nint\nLocal rank in the sequence parallel group.\nrequired\n\n\nlocal_world_size\nint\nWorld size of the sequence parallel group.\nrequired\n\n\ngradient_accumulation_steps\nint\nNumber of steps to accumulate gradients over.\nrequired\n\n\nring_attn_func\nRingAttnFunc\nWhich ring attention function to use. Currently unused, but related to above TODO.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntuple[dict[str, torch.Tensor], int, int]\ntuple of: - Batch dictionary with sliced tensors. - The original sequence length before padding. - The number of padding tokens added." + "text": "utils.ctx_managers.sequence_parallel\nModule for Axolotl trainer sequence parallelism manager and utilities\n\n\n\n\n\nName\nDescription\n\n\n\n\nAllGatherWithGrad\nCustom autograd function for all-gather to preserve gradients.\n\n\nSequenceParallelContextManager\nContext manager for sequence parallelism operations.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad()\nCustom autograd function for all-gather to preserve gradients.\n\n\n\n\n\nName\nDescription\n\n\n\n\nbackward\nBackward pass for all-gather operation.\n\n\nforward\nForward pass of all-gather of data with sequence dimension.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad.backward(\n ctx,\n grad_output,\n)\nBackward pass for all-gather operation.\nExtracts the gradient slice corresponding to this rank’s original input\nfrom the full gradient tensor.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\ntorch.autograd.function.FunctionCtx\ntorch.autograd function context.\nrequired\n\n\ngrad_output\ntorch.Tensor\nGradient from subsequent layers with respect to the concatenated output tensor.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntuple[torch.Tensor, None]\nTuple containing the gradient slice for this rank’s input tensor and None for the process group parameter which doesn’t require gradients.\n\n\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad.forward(\n ctx,\n input_tensor,\n group,\n)\nForward pass of all-gather of data with sequence dimension.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\ntorch.autograd.function.FunctionCtx\ntorch.autograd function context.\nrequired\n\n\ninput_tensor\ntorch.Tensor\nTensor from model output with sequence dimension.\nrequired\n\n\ngroup\ndist.ProcessGroup\ntorch.distributed process 
group.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nTensor from gathering the input_tensor from across the process group and concatenating along the sequence dimension.\n\n\n\n\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.SequenceParallelContextManager(\n self,\n models,\n sequence_parallel_degree,\n gradient_accumulation_steps,\n ring_attn_func,\n heads_k_stride,\n)\nContext manager for sequence parallelism operations.\nThis class provides a context that will automatically apply sequence parallelism\nduring model forward passes using a pre-forward hook, and gather outputs from\nacross the sequence parallelism group using a post-forward hook.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nmodels\nlist[nn.Module]\nList of models to apply sequence parallelism to pre- and post- forward hooks.\nrequired\n\n\nsequence_parallel_degree\nint\nNumber of processes to split sequences over.\nrequired\n\n\ngradient_accumulation_steps\nint\nNumber of steps to accumulate gradients over.\nrequired\n\n\nring_attn_func\nRingAttnFunc\nWhich ring attention function to use. Currently unused.\nrequired\n\n\nheads_k_stride\nint | None\nSequence parallelism K head stride size. Passed through to varlen_llama3 ring_flash_attn implementation.\nrequired\n\n\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\napply_sequence_parallelism\nApply sequence parallelism slicing to a batch.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.apply_sequence_parallelism(\n batch,\n local_rank,\n local_world_size,\n gradient_accumulation_steps,\n ring_attn_func,\n)\nApply sequence parallelism slicing to a batch.\nSpecial handling is implemented for integer logits_to_keep, which indicates\nto only keep the last N tokens in the sequence during generation.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbatch\ndict[str, torch.Tensor]\nBatch dictionary (e.g., input_ids, attention_mask, etc.).\nrequired\n\n\nlocal_rank\nint\nLocal rank in the sequence parallel group.\nrequired\n\n\nlocal_world_size\nint\nWorld size of the sequence parallel group.\nrequired\n\n\ngradient_accumulation_steps\nint\nNumber of steps to accumulate gradients over.\nrequired\n\n\nring_attn_func\nRingAttnFunc\nWhich ring attention function to use. Currently unused, but related to above TODO.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntuple[dict[str, torch.Tensor], int, int]\ntuple of: - Batch dictionary with sliced tensors. - The original sequence length before padding. - The number of padding tokens added." 
}, { "objectID": "docs/api/utils.ctx_managers.sequence_parallel.html#classes", "href": "docs/api/utils.ctx_managers.sequence_parallel.html#classes", "title": "utils.ctx_managers.sequence_parallel", "section": "", - "text": "Name\nDescription\n\n\n\n\nAllGatherWithGrad\nCustom autograd function for all-gather to preserve gradients.\n\n\nSequenceParallelContextManager\nContext manager for sequence parallelism operations.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad()\nCustom autograd function for all-gather to preserve gradients.\n\n\n\n\n\nName\nDescription\n\n\n\n\nbackward\nBackward pass for all-gather operation.\n\n\nforward\nForward pass of all-gather of data with sequence dimension.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad.backward(\n ctx,\n grad_output,\n)\nBackward pass for all-gather operation.\nExtracts the gradient slice corresponding to this rank’s original input\nfrom the full gradient tensor.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\ntorch.autograd.function.FunctionCtx\ntorch.autograd function context.\nrequired\n\n\ngrad_output\ntorch.Tensor\nGradient from subsequent layers with respect to the concatenated output tensor.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntuple[torch.Tensor, None]\nTuple containing the gradient slice for this rank’s input tensor and None for the process group parameter which doesn’t require gradients.\n\n\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad.forward(\n ctx,\n input_tensor,\n group,\n)\nForward pass of all-gather of data with sequence dimension.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\ntorch.autograd.function.FunctionCtx\ntorch.autograd function context.\nrequired\n\n\ninput_tensor\ntorch.Tensor\nTensor from model output with sequence dimension.\nrequired\n\n\ngroup\ndist.ProcessGroup\ntorch.distributed process group.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nTensor from gathering the input_tensor from across the process group and concatenating along the sequence dimension.\n\n\n\n\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.SequenceParallelContextManager(\n self,\n models,\n sequence_parallel_degree,\n gradient_accumulation_steps,\n ring_attn_func,\n)\nContext manager for sequence parallelism operations.\nThis class provides a context that will automatically apply sequence parallelism\nduring model forward passes using a pre-forward hook, and gather outputs from\nacross the sequence parallelism group using a post-forward hook.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nmodels\nlist[nn.Module]\nList of models to apply sequence parallelism to pre- and post- forward hooks.\nrequired\n\n\nsequence_parallel_degree\nint\nNumber of processes to split sequences over.\nrequired\n\n\ngradient_accumulation_steps\nint\nNumber of steps to accumulate gradients over.\nrequired\n\n\nring_attn_func\nRingAttnFunc\nWhich ring attention function to use. Currently unused.\nrequired\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ngather_outputs\nGather sharded outputs from all ranks and reconstruct the full tensor.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.SequenceParallelContextManager.gather_outputs(\n output,\n)\nGather sharded outputs from all ranks and reconstruct the full tensor." 
+ "text": "Name\nDescription\n\n\n\n\nAllGatherWithGrad\nCustom autograd function for all-gather to preserve gradients.\n\n\nSequenceParallelContextManager\nContext manager for sequence parallelism operations.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad()\nCustom autograd function for all-gather to preserve gradients.\n\n\n\n\n\nName\nDescription\n\n\n\n\nbackward\nBackward pass for all-gather operation.\n\n\nforward\nForward pass of all-gather of data with sequence dimension.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad.backward(\n ctx,\n grad_output,\n)\nBackward pass for all-gather operation.\nExtracts the gradient slice corresponding to this rank’s original input\nfrom the full gradient tensor.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\ntorch.autograd.function.FunctionCtx\ntorch.autograd function context.\nrequired\n\n\ngrad_output\ntorch.Tensor\nGradient from subsequent layers with respect to the concatenated output tensor.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntuple[torch.Tensor, None]\nTuple containing the gradient slice for this rank’s input tensor and None for the process group parameter which doesn’t require gradients.\n\n\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad.forward(\n ctx,\n input_tensor,\n group,\n)\nForward pass of all-gather of data with sequence dimension.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\ntorch.autograd.function.FunctionCtx\ntorch.autograd function context.\nrequired\n\n\ninput_tensor\ntorch.Tensor\nTensor from model output with sequence dimension.\nrequired\n\n\ngroup\ndist.ProcessGroup\ntorch.distributed process group.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nTensor from gathering the input_tensor from across the process group and concatenating along the sequence dimension.\n\n\n\n\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.SequenceParallelContextManager(\n self,\n models,\n sequence_parallel_degree,\n gradient_accumulation_steps,\n ring_attn_func,\n heads_k_stride,\n)\nContext manager for sequence parallelism operations.\nThis class provides a context that will automatically apply sequence parallelism\nduring model forward passes using a pre-forward hook, and gather outputs from\nacross the sequence parallelism group using a post-forward hook.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nmodels\nlist[nn.Module]\nList of models to apply sequence parallelism to pre- and post- forward hooks.\nrequired\n\n\nsequence_parallel_degree\nint\nNumber of processes to split sequences over.\nrequired\n\n\ngradient_accumulation_steps\nint\nNumber of steps to accumulate gradients over.\nrequired\n\n\nring_attn_func\nRingAttnFunc\nWhich ring attention function to use. Currently unused.\nrequired\n\n\nheads_k_stride\nint | None\nSequence parallelism K head stride size. 
Passed through to varlen_llama3 ring_flash_attn implementation.\nrequired" }, { "objectID": "docs/api/utils.ctx_managers.sequence_parallel.html#functions", diff --git a/sitemap.xml b/sitemap.xml index 2a0734f7f..daeba5e92 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,718 +2,718 @@ https://docs.axolotl.ai/TODO.html - 2025-05-22T12:20:12.565Z + 2025-05-22T15:18:46.027Z https://docs.axolotl.ai/docs/debugging.html - 2025-05-22T12:20:12.567Z + 2025-05-22T15:18:46.029Z https://docs.axolotl.ai/docs/rlhf.html - 2025-05-22T12:20:12.570Z + 2025-05-22T15:18:46.033Z https://docs.axolotl.ai/docs/input_output.html - 2025-05-22T12:20:12.570Z + 2025-05-22T15:18:46.032Z https://docs.axolotl.ai/docs/lr_groups.html - 2025-05-22T12:20:12.570Z + 2025-05-22T15:18:46.032Z https://docs.axolotl.ai/docs/dataset-formats/template_free.html - 2025-05-22T12:20:12.567Z + 2025-05-22T15:18:46.029Z https://docs.axolotl.ai/docs/dataset-formats/pretraining.html - 2025-05-22T12:20:12.567Z + 2025-05-22T15:18:46.029Z https://docs.axolotl.ai/docs/dataset-formats/conversation.html - 2025-05-22T12:20:12.567Z + 2025-05-22T15:18:46.029Z https://docs.axolotl.ai/docs/dataset-formats/inst_tune.html - 2025-05-22T12:20:12.567Z + 2025-05-22T15:18:46.029Z https://docs.axolotl.ai/docs/fsdp_qlora.html - 2025-05-22T12:20:12.567Z + 2025-05-22T15:18:46.030Z https://docs.axolotl.ai/docs/torchao.html - 2025-05-22T12:20:12.570Z + 2025-05-22T15:18:46.033Z https://docs.axolotl.ai/docs/dataset_preprocessing.html - 2025-05-22T12:20:12.567Z + 2025-05-22T15:18:46.029Z https://docs.axolotl.ai/docs/batch_vs_grad.html - 2025-05-22T12:20:12.566Z + 2025-05-22T15:18:46.028Z https://docs.axolotl.ai/docs/dataset_loading.html - 2025-05-22T12:20:12.567Z + 2025-05-22T15:18:46.029Z https://docs.axolotl.ai/docs/amd_hpc.html - 2025-05-22T12:20:12.566Z + 2025-05-22T15:18:46.028Z https://docs.axolotl.ai/docs/docker.html - 2025-05-22T12:20:12.567Z + 2025-05-22T15:18:46.029Z https://docs.axolotl.ai/docs/multi-node.html - 2025-05-22T12:20:12.570Z + 2025-05-22T15:18:46.032Z https://docs.axolotl.ai/docs/sequence_parallelism.html - 2025-05-22T12:20:12.570Z + 2025-05-22T15:18:46.033Z https://docs.axolotl.ai/docs/multi-gpu.html - 2025-05-22T12:20:12.570Z + 2025-05-22T15:18:46.032Z https://docs.axolotl.ai/docs/api/utils.collators.core.html - 2025-05-22T12:20:45.256Z + 2025-05-22T15:19:22.261Z https://docs.axolotl.ai/docs/api/utils.collators.mm_chat.html - 2025-05-22T12:20:45.284Z + 2025-05-22T15:19:22.289Z https://docs.axolotl.ai/docs/api/integrations.kd.trainer.html - 2025-05-22T12:20:45.224Z + 2025-05-22T15:19:22.230Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.llama3.html - 2025-05-22T12:20:44.592Z + 2025-05-22T15:19:21.601Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_chat.html - 2025-05-22T12:20:44.484Z + 2025-05-22T15:19:21.495Z https://docs.axolotl.ai/docs/api/logging_config.html - 2025-05-22T12:20:44.012Z + 2025-05-22T15:19:21.031Z https://docs.axolotl.ai/docs/api/monkeypatch.mixtral.html - 2025-05-22T12:20:44.851Z + 2025-05-22T15:19:21.859Z https://docs.axolotl.ai/docs/api/integrations.lm_eval.args.html - 2025-05-22T12:20:45.231Z + 2025-05-22T15:19:22.236Z https://docs.axolotl.ai/docs/api/utils.model_shard_quant.html - 2025-05-22T12:20:44.911Z + 2025-05-22T15:19:21.918Z https://docs.axolotl.ai/docs/api/utils.schemas.utils.html - 2025-05-22T12:20:45.108Z + 2025-05-22T15:19:22.113Z https://docs.axolotl.ai/docs/api/core.trainers.trl.html - 2025-05-22T12:20:44.374Z + 2025-05-22T15:19:21.387Z 
https://docs.axolotl.ai/docs/api/utils.collators.batching.html - 2025-05-22T12:20:45.275Z + 2025-05-22T15:19:22.280Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_w_system.html - 2025-05-22T12:20:44.498Z + 2025-05-22T15:19:21.508Z https://docs.axolotl.ai/docs/api/integrations.base.html - 2025-05-22T12:20:45.212Z + 2025-05-22T15:19:22.218Z https://docs.axolotl.ai/docs/api/core.chat.messages.html - 2025-05-22T12:20:44.143Z + 2025-05-22T15:19:21.156Z https://docs.axolotl.ai/docs/api/kernels.lora.html - 2025-05-22T12:20:44.707Z + 2025-05-22T15:19:21.716Z https://docs.axolotl.ai/docs/api/utils.callbacks.perplexity.html - 2025-05-22T12:20:45.331Z + 2025-05-22T15:19:22.337Z https://docs.axolotl.ai/docs/api/utils.schemas.training.html - 2025-05-22T12:20:45.047Z + 2025-05-22T15:19:22.051Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.user_defined.html - 2025-05-22T12:20:44.583Z + 2025-05-22T15:19:21.592Z https://docs.axolotl.ai/docs/api/utils.samplers.multipack.html - 2025-05-22T12:20:45.324Z + 2025-05-22T15:19:22.331Z https://docs.axolotl.ai/docs/api/utils.collators.mamba.html - 2025-05-22T12:20:45.279Z + 2025-05-22T15:19:22.284Z https://docs.axolotl.ai/docs/api/cli.merge_sharded_fsdp_weights.html - 2025-05-22T12:20:44.282Z + 2025-05-22T15:19:21.296Z https://docs.axolotl.ai/docs/api/utils.callbacks.profiler.html - 2025-05-22T12:20:45.335Z + 2025-05-22T15:19:22.341Z https://docs.axolotl.ai/docs/api/prompt_strategies.stepwise_supervised.html - 2025-05-22T12:20:44.536Z + 2025-05-22T15:19:21.545Z https://docs.axolotl.ai/docs/api/monkeypatch.trainer_fsdp_optim.html - 2025-05-22T12:20:44.833Z + 2025-05-22T15:19:21.841Z https://docs.axolotl.ai/docs/api/integrations.cut_cross_entropy.args.html - 2025-05-22T12:20:45.215Z + 2025-05-22T15:19:22.221Z https://docs.axolotl.ai/docs/api/utils.chat_templates.html - 2025-05-22T12:20:44.896Z + 2025-05-22T15:19:21.904Z https://docs.axolotl.ai/docs/api/monkeypatch.btlm_attn_hijack_flash.html - 2025-05-22T12:20:44.823Z + 2025-05-22T15:19:21.831Z https://docs.axolotl.ai/docs/api/utils.lora.html - 2025-05-22T12:20:44.902Z + 2025-05-22T15:19:21.909Z https://docs.axolotl.ai/docs/api/prompt_strategies.chat_template.html - 2025-05-22T12:20:44.470Z + 2025-05-22T15:19:21.481Z https://docs.axolotl.ai/docs/api/utils.schemas.multimodal.html - 2025-05-22T12:20:45.082Z + 2025-05-22T15:19:22.086Z https://docs.axolotl.ai/docs/api/monkeypatch.utils.html - 2025-05-22T12:20:44.821Z + 2025-05-22T15:19:21.830Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.user_defined.html - 2025-05-22T12:20:44.602Z + 2025-05-22T15:19:21.611Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.rng_state_loader.html - 2025-05-22T12:20:44.419Z + 2025-05-22T15:19:21.432Z https://docs.axolotl.ai/docs/api/integrations.liger.args.html - 2025-05-22T12:20:45.227Z + 2025-05-22T15:19:22.233Z https://docs.axolotl.ai/docs/api/utils.data.sft.html - 2025-05-22T12:20:44.997Z + 2025-05-22T15:19:22.003Z https://docs.axolotl.ai/docs/api/utils.freeze.html - 2025-05-22T12:20:44.922Z + 2025-05-22T15:19:21.929Z https://docs.axolotl.ai/docs/api/cli.preprocess.html - 2025-05-22T12:20:44.290Z + 2025-05-22T15:19:21.304Z https://docs.axolotl.ai/docs/api/index.html - 2025-05-22T12:20:43.872Z + 2025-05-22T15:19:20.892Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_patch_multipack.html - 2025-05-22T12:20:44.824Z + 2025-05-22T15:19:21.833Z https://docs.axolotl.ai/docs/api/prompt_strategies.messages.chat.html - 2025-05-22T12:20:44.558Z + 2025-05-22T15:19:21.567Z 
https://docs.axolotl.ai/docs/api/utils.gradient_checkpointing.offload_cpu.html - 2025-05-22T12:20:45.000Z + 2025-05-22T15:19:22.006Z https://docs.axolotl.ai/docs/api/train.html - 2025-05-22T12:20:43.933Z + 2025-05-22T15:19:20.953Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_xformers.html - 2025-05-22T12:20:44.764Z + 2025-05-22T15:19:21.772Z https://docs.axolotl.ai/docs/api/cli.args.html - 2025-05-22T12:20:44.224Z + 2025-05-22T15:19:21.239Z https://docs.axolotl.ai/docs/api/utils.schemas.enums.html - 2025-05-22T12:20:45.102Z + 2025-05-22T15:19:22.107Z https://docs.axolotl.ai/docs/api/utils.schemas.datasets.html - 2025-05-22T12:20:45.064Z + 2025-05-22T15:19:22.069Z https://docs.axolotl.ai/docs/api/convert.html - 2025-05-22T12:20:43.965Z + 2025-05-22T15:19:20.984Z https://docs.axolotl.ai/docs/api/prompt_strategies.llama2_chat.html - 2025-05-22T12:20:44.519Z + 2025-05-22T15:19:21.529Z https://docs.axolotl.ai/docs/api/monkeypatch.transformers_fa_utils.html - 2025-05-22T12:20:44.840Z + 2025-05-22T15:19:21.848Z https://docs.axolotl.ai/docs/api/cli.checks.html - 2025-05-22T12:20:44.231Z + 2025-05-22T15:19:21.245Z https://docs.axolotl.ai/docs/api/core.chat.format.llama3x.html - 2025-05-22T12:20:44.146Z + 2025-05-22T15:19:21.159Z https://docs.axolotl.ai/docs/api/monkeypatch.mistral_attn_hijack_flash.html - 2025-05-22T12:20:44.778Z + 2025-05-22T15:19:21.787Z https://docs.axolotl.ai/docs/api/utils.callbacks.mlflow_.html - 2025-05-22T12:20:45.340Z + 2025-05-22T15:19:22.346Z https://docs.axolotl.ai/docs/api/core.datasets.chat.html - 2025-05-22T12:20:44.152Z + 2025-05-22T15:19:21.166Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.scheduler.html - 2025-05-22T12:20:44.426Z + 2025-05-22T15:19:21.439Z https://docs.axolotl.ai/docs/api/utils.schedulers.html - 2025-05-22T12:20:44.964Z + 2025-05-22T15:19:21.970Z https://docs.axolotl.ai/docs/api/utils.schemas.integrations.html - 2025-05-22T12:20:45.094Z + 2025-05-22T15:19:22.098Z https://docs.axolotl.ai/docs/api/utils.tokenization.html - 2025-05-22T12:20:44.887Z + 2025-05-22T15:19:21.894Z https://docs.axolotl.ai/docs/api/utils.distributed.html - 2025-05-22T12:20:44.983Z + 2025-05-22T15:19:21.989Z https://docs.axolotl.ai/docs/api/prompt_strategies.bradley_terry.llama3.html - 2025-05-22T12:20:44.626Z + 2025-05-22T15:19:21.635Z https://docs.axolotl.ai/docs/api/common.const.html - 2025-05-22T12:20:45.237Z + 2025-05-22T15:19:22.242Z https://docs.axolotl.ai/docs/api/core.trainers.grpo.sampler.html - 2025-05-22T12:20:44.411Z + 2025-05-22T15:19:21.424Z https://docs.axolotl.ai/docs/api/prompt_strategies.user_defined.html - 2025-05-22T12:20:44.506Z + 2025-05-22T15:19:21.516Z https://docs.axolotl.ai/docs/api/utils.models.html - 2025-05-22T12:20:44.880Z + 2025-05-22T15:19:21.888Z https://docs.axolotl.ai/docs/api/utils.lora_embeddings.html - 2025-05-22T12:20:44.906Z + 2025-05-22T15:19:21.912Z https://docs.axolotl.ai/docs/api/cli.train.html - 2025-05-22T12:20:44.199Z + 2025-05-22T15:19:21.214Z https://docs.axolotl.ai/docs/api/datasets.html - 2025-05-22T12:20:43.951Z + 2025-05-22T15:19:20.971Z https://docs.axolotl.ai/src/axolotl/integrations/cut_cross_entropy/ACKNOWLEDGEMENTS.html - 2025-05-22T12:20:12.587Z + 2025-05-22T15:18:46.049Z https://docs.axolotl.ai/index.html - 2025-05-22T12:20:12.583Z + 2025-05-22T15:18:46.046Z https://docs.axolotl.ai/examples/colab-notebooks/colab-axolotl-example.html - 2025-05-22T12:20:12.571Z + 2025-05-22T15:18:46.034Z https://docs.axolotl.ai/FAQS.html - 2025-05-22T12:20:12.565Z + 2025-05-22T15:18:46.027Z 
https://docs.axolotl.ai/src/axolotl/integrations/LICENSE.html - 2025-05-22T12:20:12.587Z + 2025-05-22T15:18:46.049Z https://docs.axolotl.ai/docs/api/cli.utils.html - 2025-05-22T12:20:44.328Z + 2025-05-22T15:19:21.342Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.optimizer.html - 2025-05-22T12:20:44.416Z + 2025-05-22T15:19:21.429Z https://docs.axolotl.ai/docs/api/prompt_strategies.orpo.chat_template.html - 2025-05-22T12:20:44.622Z + 2025-05-22T15:19:21.631Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_flash.html - 2025-05-22T12:20:44.762Z + 2025-05-22T15:19:21.771Z https://docs.axolotl.ai/docs/api/prompt_strategies.completion.html - 2025-05-22T12:20:44.525Z + 2025-05-22T15:19:21.535Z https://docs.axolotl.ai/docs/api/evaluate.html - 2025-05-22T12:20:43.944Z + 2025-05-22T15:19:20.963Z https://docs.axolotl.ai/docs/api/utils.ctx_managers.sequence_parallel.html - 2025-05-22T12:20:44.450Z + 2025-05-22T15:19:21.462Z https://docs.axolotl.ai/docs/api/monkeypatch.data.batch_dataset_fetcher.html - 2025-05-22T12:20:44.849Z + 2025-05-22T15:19:21.857Z https://docs.axolotl.ai/docs/api/core.trainers.utils.html - 2025-05-22T12:20:44.413Z + 2025-05-22T15:19:21.426Z https://docs.axolotl.ai/docs/api/kernels.quantize.html - 2025-05-22T12:20:44.735Z + 2025-05-22T15:19:21.744Z https://docs.axolotl.ai/docs/api/utils.dict.html - 2025-05-22T12:20:44.987Z + 2025-05-22T15:19:21.992Z https://docs.axolotl.ai/docs/api/utils.trainer.html - 2025-05-22T12:20:44.939Z + 2025-05-22T15:19:21.946Z https://docs.axolotl.ai/docs/api/cli.merge_lora.html - 2025-05-22T12:20:44.271Z + 2025-05-22T15:19:21.285Z https://docs.axolotl.ai/docs/api/monkeypatch.lora_kernels.html - 2025-05-22T12:20:44.814Z + 2025-05-22T15:19:21.822Z https://docs.axolotl.ai/docs/api/core.chat.format.shared.html - 2025-05-22T12:20:44.147Z + 2025-05-22T15:19:21.161Z https://docs.axolotl.ai/docs/api/utils.callbacks.comet_.html - 2025-05-22T12:20:45.343Z + 2025-05-22T15:19:22.350Z https://docs.axolotl.ai/docs/api/core.training_args.html - 2025-05-22T12:20:44.120Z + 2025-05-22T15:19:21.134Z https://docs.axolotl.ai/docs/api/cli.cloud.modal_.html - 2025-05-22T12:20:44.342Z + 2025-05-22T15:19:21.356Z https://docs.axolotl.ai/docs/api/monkeypatch.relora.html - 2025-05-22T12:20:44.787Z + 2025-05-22T15:19:21.795Z https://docs.axolotl.ai/docs/api/core.trainers.base.html - 2025-05-22T12:20:44.356Z + 2025-05-22T15:19:21.370Z https://docs.axolotl.ai/docs/api/kernels.swiglu.html - 2025-05-22T12:20:44.728Z + 2025-05-22T15:19:21.737Z https://docs.axolotl.ai/docs/api/prompt_strategies.metharme.html - 2025-05-22T12:20:44.543Z + 2025-05-22T15:19:21.552Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.llama3.html - 2025-05-22T12:20:44.569Z + 2025-05-22T15:19:21.579Z https://docs.axolotl.ai/docs/api/utils.gradient_checkpointing.offload_disk.html - 2025-05-22T12:20:45.026Z + 2025-05-22T15:19:22.032Z https://docs.axolotl.ai/docs/api/monkeypatch.stablelm_attn_hijack_flash.html - 2025-05-22T12:20:44.830Z + 2025-05-22T15:19:21.838Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chat_template.html - 2025-05-22T12:20:44.559Z + 2025-05-22T15:19:21.569Z https://docs.axolotl.ai/docs/api/models.mamba.modeling_mamba.html - 2025-05-22T12:20:45.255Z + 2025-05-22T15:19:22.260Z https://docs.axolotl.ai/docs/api/core.trainer_builder.html - 2025-05-22T12:20:44.027Z + 2025-05-22T15:19:21.047Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chatml.html - 2025-05-22T12:20:44.580Z + 2025-05-22T15:19:21.589Z https://docs.axolotl.ai/docs/api/cli.main.html - 
2025-05-22T12:20:44.191Z + 2025-05-22T15:19:21.205Z https://docs.axolotl.ai/docs/api/utils.schemas.peft.html - 2025-05-22T12:20:45.073Z + 2025-05-22T15:19:22.078Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_expand_mask.html - 2025-05-22T12:20:44.788Z + 2025-05-22T15:19:21.797Z https://docs.axolotl.ai/docs/api/core.trainers.grpo.trainer.html - 2025-05-22T12:20:44.399Z + 2025-05-22T15:19:21.412Z https://docs.axolotl.ai/docs/api/integrations.spectrum.args.html - 2025-05-22T12:20:45.234Z + 2025-05-22T15:19:22.239Z https://docs.axolotl.ai/docs/api/core.trainers.relora.html - 2025-05-22T12:20:44.383Z + 2025-05-22T15:19:21.396Z https://docs.axolotl.ai/docs/api/utils.bench.html - 2025-05-22T12:20:44.914Z + 2025-05-22T15:19:21.921Z https://docs.axolotl.ai/docs/api/prompt_strategies.input_output.html - 2025-05-22T12:20:44.531Z + 2025-05-22T15:19:21.541Z https://docs.axolotl.ai/docs/api/cli.inference.html - 2025-05-22T12:20:44.262Z + 2025-05-22T15:19:21.276Z https://docs.axolotl.ai/docs/api/core.trainers.dpo.trainer.html - 2025-05-22T12:20:44.389Z + 2025-05-22T15:19:21.403Z https://docs.axolotl.ai/docs/api/integrations.grokfast.optimizer.html - 2025-05-22T12:20:45.216Z + 2025-05-22T15:19:22.222Z https://docs.axolotl.ai/docs/api/core.trainers.mamba.html - 2025-05-22T12:20:44.379Z + 2025-05-22T15:19:21.392Z https://docs.axolotl.ai/docs/api/utils.schemas.trl.html - 2025-05-22T12:20:45.076Z + 2025-05-22T15:19:22.081Z https://docs.axolotl.ai/docs/api/cli.config.html - 2025-05-22T12:20:44.248Z + 2025-05-22T15:19:21.263Z https://docs.axolotl.ai/docs/api/cli.vllm_serve.html - 2025-05-22T12:20:44.333Z + 2025-05-22T15:19:21.346Z https://docs.axolotl.ai/docs/api/core.datasets.transforms.chat_builder.html - 2025-05-22T12:20:44.160Z + 2025-05-22T15:19:21.174Z https://docs.axolotl.ai/docs/api/common.datasets.html - 2025-05-22T12:20:45.254Z + 2025-05-22T15:19:22.259Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_instruct.html - 2025-05-22T12:20:44.485Z + 2025-05-22T15:19:21.496Z https://docs.axolotl.ai/docs/api/core.chat.format.chatml.html - 2025-05-22T12:20:44.144Z + 2025-05-22T15:19:21.158Z https://docs.axolotl.ai/docs/api/monkeypatch.attention.mllama.html - 2025-05-22T12:20:44.848Z + 2025-05-22T15:19:21.856Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.zephyr.html - 2025-05-22T12:20:44.581Z + 2025-05-22T15:19:21.590Z https://docs.axolotl.ai/docs/api/utils.schemas.config.html - 2025-05-22T12:20:45.035Z + 2025-05-22T15:19:22.040Z https://docs.axolotl.ai/docs/api/monkeypatch.unsloth_.html - 2025-05-22T12:20:44.841Z + 2025-05-22T15:19:21.849Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.passthrough.html - 2025-05-22T12:20:44.584Z + 2025-05-22T15:19:21.593Z https://docs.axolotl.ai/docs/api/prompt_strategies.orcamini.html - 2025-05-22T12:20:44.546Z + 2025-05-22T15:19:21.556Z https://docs.axolotl.ai/docs/api/cli.evaluate.html - 2025-05-22T12:20:44.207Z + 2025-05-22T15:19:21.222Z https://docs.axolotl.ai/docs/api/monkeypatch.multipack.html - 2025-05-22T12:20:44.780Z + 2025-05-22T15:19:21.788Z https://docs.axolotl.ai/docs/api/cli.sweeps.html - 2025-05-22T12:20:44.296Z + 2025-05-22T15:19:21.310Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.chatml.html - 2025-05-22T12:20:44.601Z + 2025-05-22T15:19:21.609Z https://docs.axolotl.ai/docs/api/prompt_strategies.base.html - 2025-05-22T12:20:44.452Z + 2025-05-22T15:19:21.463Z https://docs.axolotl.ai/docs/api/kernels.utils.html - 2025-05-22T12:20:44.737Z + 2025-05-22T15:19:21.746Z 
https://docs.axolotl.ai/docs/api/utils.schemas.model.html - 2025-05-22T12:20:45.042Z + 2025-05-22T15:19:22.046Z https://docs.axolotl.ai/docs/api/utils.data.pretraining.html - 2025-05-22T12:20:44.995Z + 2025-05-22T15:19:22.001Z https://docs.axolotl.ai/docs/api/utils.callbacks.lisa.html - 2025-05-22T12:20:45.336Z + 2025-05-22T15:19:22.342Z https://docs.axolotl.ai/docs/api/utils.optimizers.adopt.html - 2025-05-22T12:20:44.994Z + 2025-05-22T15:19:22.000Z https://docs.axolotl.ai/docs/api/common.architectures.html - 2025-05-22T12:20:45.235Z + 2025-05-22T15:19:22.241Z https://docs.axolotl.ai/docs/api/cli.cloud.base.html - 2025-05-22T12:20:44.336Z + 2025-05-22T15:19:21.350Z https://docs.axolotl.ai/docs/api/prompt_strategies.pygmalion.html - 2025-05-22T12:20:44.553Z + 2025-05-22T15:19:21.563Z https://docs.axolotl.ai/docs/api/prompt_tokenizers.html - 2025-05-22T12:20:44.007Z + 2025-05-22T15:19:21.026Z https://docs.axolotl.ai/docs/api/kernels.geglu.html - 2025-05-22T12:20:44.718Z + 2025-05-22T15:19:21.727Z https://docs.axolotl.ai/docs/custom_integrations.html - 2025-05-22T12:20:12.567Z + 2025-05-22T15:18:46.029Z https://docs.axolotl.ai/docs/multimodal.html - 2025-05-22T12:20:12.570Z + 2025-05-22T15:18:46.032Z https://docs.axolotl.ai/docs/faq.html - 2025-05-22T12:20:12.567Z + 2025-05-22T15:18:46.029Z https://docs.axolotl.ai/docs/multipack.html - 2025-05-22T12:20:12.570Z + 2025-05-22T15:18:46.033Z https://docs.axolotl.ai/docs/lora_optims.html - 2025-05-22T12:20:12.570Z + 2025-05-22T15:18:46.032Z https://docs.axolotl.ai/docs/nccl.html - 2025-05-22T12:20:12.570Z + 2025-05-22T15:18:46.033Z https://docs.axolotl.ai/docs/cli.html - 2025-05-22T12:20:12.566Z + 2025-05-22T15:18:46.029Z https://docs.axolotl.ai/docs/config.html - 2025-05-22T12:20:12.566Z + 2025-05-22T15:18:46.029Z https://docs.axolotl.ai/docs/unsloth.html - 2025-05-22T12:20:12.571Z + 2025-05-22T15:18:46.033Z https://docs.axolotl.ai/docs/ray-integration.html - 2025-05-22T12:20:12.570Z + 2025-05-22T15:18:46.033Z https://docs.axolotl.ai/docs/dataset-formats/stepwise_supervised.html - 2025-05-22T12:20:12.567Z + 2025-05-22T15:18:46.029Z https://docs.axolotl.ai/docs/dataset-formats/index.html - 2025-05-22T12:20:12.567Z + 2025-05-22T15:18:46.029Z https://docs.axolotl.ai/docs/dataset-formats/tokenized.html - 2025-05-22T12:20:12.567Z + 2025-05-22T15:18:46.029Z https://docs.axolotl.ai/docs/installation.html - 2025-05-22T12:20:12.570Z + 2025-05-22T15:18:46.032Z https://docs.axolotl.ai/docs/inference.html - 2025-05-22T12:20:12.570Z + 2025-05-22T15:18:46.032Z https://docs.axolotl.ai/docs/mac.html - 2025-05-22T12:20:12.570Z + 2025-05-22T15:18:46.032Z https://docs.axolotl.ai/docs/getting-started.html - 2025-05-22T12:20:12.567Z + 2025-05-22T15:18:46.030Z https://docs.axolotl.ai/docs/reward_modelling.html - 2025-05-22T12:20:12.570Z + 2025-05-22T15:18:46.033Z