diff --git a/.nojekyll b/.nojekyll
index 2e3a335b7..6b71a7f0e 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-0496a1b7
\ No newline at end of file
+d5d7dce8
\ No newline at end of file
diff --git a/docs/api/utils.ctx_managers.sequence_parallel.html b/docs/api/utils.ctx_managers.sequence_parallel.html
index 468cfc0f0..4a9b55b07 100644
--- a/docs/api/utils.ctx_managers.sequence_parallel.html
+++ b/docs/api/utils.ctx_managers.sequence_parallel.html
@@ -685,7 +685,8 @@ from the full gradient tensor.

gradient_accumulation_steps,
ring_attn_func,
heads_k_stride,
-)
+ gather_outputs,
+)

Context manager for sequence parallelism operations.

This class provides a context that will automatically apply sequence parallelism
during model forward passes using a pre-forward hook, and gather outputs from
@@ -738,6 +739,12 @@ across the sequence parallelism group using a post-forward hook.

Sequence parallelism K head stride size. Passed through to varlen_llama3
ring_flash_attn implementation.
required
+
+gather_outputs
+bool
+Whether to gather outputs after model forward pass across the sequence parallel group.
+required
+
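A quick usage sketch of the constructor change above, for review context: the parameter list mirrors the documented signature, while the import path and the surrounding step function are illustrative assumptions rather than code from this commit.

```python
import torch.nn as nn

# Assumed import path, derived from the documented module name
# "utils.ctx_managers.sequence_parallel"; verify against the package layout.
from axolotl.utils.ctx_managers.sequence_parallel import (
    SequenceParallelContextManager,
)


def forward_under_sp(model: nn.Module, batch: dict, ring_attn_func):
    """Run one forward pass under sequence parallelism (illustrative only)."""
    # Pre-forward hook: shards the batch along the sequence dimension across
    # the sequence-parallel group. Post-forward hook (gather_outputs=True):
    # all-gathers the outputs back to the full sequence length.
    with SequenceParallelContextManager(
        models=[model],
        sequence_parallel_degree=4,      # split each sequence over 4 ranks
        gradient_accumulation_steps=1,
        ring_attn_func=ring_attn_func,   # documented as currently unused
        heads_k_stride=1,                # forwarded to varlen_llama3 ring_flash_attn
        gather_outputs=True,             # the flag added in this commit
    ):
        return model(**batch)
```

When `gather_outputs` is false, each rank would presumably keep only the outputs for its own shard, skipping the post-forward all-gather.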
diff --git a/search.json b/search.json
index 7909bf089..36689c9ed 100644
--- a/search.json
+++ b/search.json
@@ -1482,14 +1482,14 @@
"href": "docs/api/utils.ctx_managers.sequence_parallel.html",
"title": "utils.ctx_managers.sequence_parallel",
"section": "",
- "text": "utils.ctx_managers.sequence_parallel\nModule for Axolotl trainer sequence parallelism manager and utilities\n\n\n\n\n\nName\nDescription\n\n\n\n\nAllGatherWithGrad\nCustom autograd function for all-gather to preserve gradients.\n\n\nSequenceParallelContextManager\nContext manager for sequence parallelism operations.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad()\nCustom autograd function for all-gather to preserve gradients.\n\n\n\n\n\nName\nDescription\n\n\n\n\nbackward\nBackward pass for all-gather operation.\n\n\nforward\nForward pass of all-gather of data with sequence dimension.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad.backward(\n ctx,\n grad_output,\n)\nBackward pass for all-gather operation.\nExtracts the gradient slice corresponding to this rank’s original input\nfrom the full gradient tensor.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\ntorch.autograd.function.FunctionCtx\ntorch.autograd function context.\nrequired\n\n\ngrad_output\ntorch.Tensor\nGradient from subsequent layers with respect to the concatenated output tensor.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntuple[torch.Tensor, None]\nTuple containing the gradient slice for this rank’s input tensor and None for the process group parameter which doesn’t require gradients.\n\n\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad.forward(\n ctx,\n input_tensor,\n group,\n)\nForward pass of all-gather of data with sequence dimension.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\ntorch.autograd.function.FunctionCtx\ntorch.autograd function context.\nrequired\n\n\ninput_tensor\ntorch.Tensor\nTensor from model output with sequence dimension.\nrequired\n\n\ngroup\ndist.ProcessGroup\ntorch.distributed process group.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nTensor from gathering the input_tensor from across the process group and concatenating along the sequence dimension.\n\n\n\n\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.SequenceParallelContextManager(\n models,\n sequence_parallel_degree,\n gradient_accumulation_steps,\n ring_attn_func,\n heads_k_stride,\n)\nContext manager for sequence parallelism operations.\nThis class provides a context that will automatically apply sequence parallelism\nduring model forward passes using a pre-forward hook, and gather outputs from\nacross the sequence parallelism group using a post-forward hook.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nmodels\nlist[nn.Module]\nList of models to apply sequence parallelism to pre- and post- forward hooks.\nrequired\n\n\nsequence_parallel_degree\nint\nNumber of processes to split sequences over.\nrequired\n\n\ngradient_accumulation_steps\nint\nNumber of steps to accumulate gradients over.\nrequired\n\n\nring_attn_func\nRingAttnFunc\nWhich ring attention function to use. Currently unused.\nrequired\n\n\nheads_k_stride\nint | None\nSequence parallelism K head stride size. Passed through to varlen_llama3 ring_flash_attn implementation.\nrequired\n\n\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\napply_sequence_parallelism\nApply sequence parallelism slicing to a batch.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.apply_sequence_parallelism(\n batch,\n local_rank,\n local_world_size,\n gradient_accumulation_steps,\n ring_attn_func,\n)\nApply sequence parallelism slicing to a batch.\nSpecial handling is implemented for integer logits_to_keep, which indicates\nto only keep the last N tokens in the sequence during generation.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbatch\ndict[str, torch.Tensor]\nBatch dictionary (e.g., input_ids, attention_mask, etc.).\nrequired\n\n\nlocal_rank\nint\nLocal rank in the sequence parallel group.\nrequired\n\n\nlocal_world_size\nint\nWorld size of the sequence parallel group.\nrequired\n\n\ngradient_accumulation_steps\nint\nNumber of steps to accumulate gradients over.\nrequired\n\n\nring_attn_func\nRingAttnFunc\nWhich ring attention function to use. Currently unused, but related to above TODO.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntuple[dict[str, torch.Tensor], int, int]\ntuple of: - Batch dictionary with sliced tensors. - The original sequence length before padding. - The number of padding tokens added."
+ "text": "utils.ctx_managers.sequence_parallel\nModule for Axolotl trainer sequence parallelism manager and utilities\n\n\n\n\n\nName\nDescription\n\n\n\n\nAllGatherWithGrad\nCustom autograd function for all-gather to preserve gradients.\n\n\nSequenceParallelContextManager\nContext manager for sequence parallelism operations.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad()\nCustom autograd function for all-gather to preserve gradients.\n\n\n\n\n\nName\nDescription\n\n\n\n\nbackward\nBackward pass for all-gather operation.\n\n\nforward\nForward pass of all-gather of data with sequence dimension.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad.backward(\n ctx,\n grad_output,\n)\nBackward pass for all-gather operation.\nExtracts the gradient slice corresponding to this rank’s original input\nfrom the full gradient tensor.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\ntorch.autograd.function.FunctionCtx\ntorch.autograd function context.\nrequired\n\n\ngrad_output\ntorch.Tensor\nGradient from subsequent layers with respect to the concatenated output tensor.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntuple[torch.Tensor, None]\nTuple containing the gradient slice for this rank’s input tensor and None for the process group parameter which doesn’t require gradients.\n\n\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad.forward(\n ctx,\n input_tensor,\n group,\n)\nForward pass of all-gather of data with sequence dimension.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\ntorch.autograd.function.FunctionCtx\ntorch.autograd function context.\nrequired\n\n\ninput_tensor\ntorch.Tensor\nTensor from model output with sequence dimension.\nrequired\n\n\ngroup\ndist.ProcessGroup\ntorch.distributed process group.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nTensor from gathering the input_tensor from across the process group and concatenating along the sequence dimension.\n\n\n\n\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.SequenceParallelContextManager(\n models,\n sequence_parallel_degree,\n gradient_accumulation_steps,\n ring_attn_func,\n heads_k_stride,\n gather_outputs,\n)\nContext manager for sequence parallelism operations.\nThis class provides a context that will automatically apply sequence parallelism\nduring model forward passes using a pre-forward hook, and gather outputs from\nacross the sequence parallelism group using a post-forward hook.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nmodels\nlist[nn.Module]\nList of models to apply sequence parallelism to pre- and post- forward hooks.\nrequired\n\n\nsequence_parallel_degree\nint\nNumber of processes to split sequences over.\nrequired\n\n\ngradient_accumulation_steps\nint\nNumber of steps to accumulate gradients over.\nrequired\n\n\nring_attn_func\nRingAttnFunc\nWhich ring attention function to use. Currently unused.\nrequired\n\n\nheads_k_stride\nint | None\nSequence parallelism K head stride size. Passed through to varlen_llama3 ring_flash_attn implementation.\nrequired\n\n\ngather_outputs\nbool\nWhether to gather outputs after model forward pass across the sequence parallel group.\nrequired\n\n\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\napply_sequence_parallelism\nApply sequence parallelism slicing to a batch.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.apply_sequence_parallelism(\n batch,\n local_rank,\n local_world_size,\n gradient_accumulation_steps,\n ring_attn_func,\n)\nApply sequence parallelism slicing to a batch.\nSpecial handling is implemented for integer logits_to_keep, which indicates\nto only keep the last N tokens in the sequence during generation.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbatch\ndict[str, torch.Tensor]\nBatch dictionary (e.g., input_ids, attention_mask, etc.).\nrequired\n\n\nlocal_rank\nint\nLocal rank in the sequence parallel group.\nrequired\n\n\nlocal_world_size\nint\nWorld size of the sequence parallel group.\nrequired\n\n\ngradient_accumulation_steps\nint\nNumber of steps to accumulate gradients over.\nrequired\n\n\nring_attn_func\nRingAttnFunc\nWhich ring attention function to use. Currently unused, but related to above TODO.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntuple[dict[str, torch.Tensor], int, int]\ntuple of: - Batch dictionary with sliced tensors. - The original sequence length before padding. - The number of padding tokens added."
},
{
"objectID": "docs/api/utils.ctx_managers.sequence_parallel.html#classes",
"href": "docs/api/utils.ctx_managers.sequence_parallel.html#classes",
"title": "utils.ctx_managers.sequence_parallel",
"section": "",
- "text": "Name\nDescription\n\n\n\n\nAllGatherWithGrad\nCustom autograd function for all-gather to preserve gradients.\n\n\nSequenceParallelContextManager\nContext manager for sequence parallelism operations.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad()\nCustom autograd function for all-gather to preserve gradients.\n\n\n\n\n\nName\nDescription\n\n\n\n\nbackward\nBackward pass for all-gather operation.\n\n\nforward\nForward pass of all-gather of data with sequence dimension.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad.backward(\n ctx,\n grad_output,\n)\nBackward pass for all-gather operation.\nExtracts the gradient slice corresponding to this rank’s original input\nfrom the full gradient tensor.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\ntorch.autograd.function.FunctionCtx\ntorch.autograd function context.\nrequired\n\n\ngrad_output\ntorch.Tensor\nGradient from subsequent layers with respect to the concatenated output tensor.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntuple[torch.Tensor, None]\nTuple containing the gradient slice for this rank’s input tensor and None for the process group parameter which doesn’t require gradients.\n\n\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad.forward(\n ctx,\n input_tensor,\n group,\n)\nForward pass of all-gather of data with sequence dimension.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\ntorch.autograd.function.FunctionCtx\ntorch.autograd function context.\nrequired\n\n\ninput_tensor\ntorch.Tensor\nTensor from model output with sequence dimension.\nrequired\n\n\ngroup\ndist.ProcessGroup\ntorch.distributed process group.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nTensor from gathering the input_tensor from across the process group and concatenating along the sequence dimension.\n\n\n\n\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.SequenceParallelContextManager(\n models,\n sequence_parallel_degree,\n gradient_accumulation_steps,\n ring_attn_func,\n heads_k_stride,\n)\nContext manager for sequence parallelism operations.\nThis class provides a context that will automatically apply sequence parallelism\nduring model forward passes using a pre-forward hook, and gather outputs from\nacross the sequence parallelism group using a post-forward hook.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nmodels\nlist[nn.Module]\nList of models to apply sequence parallelism to pre- and post- forward hooks.\nrequired\n\n\nsequence_parallel_degree\nint\nNumber of processes to split sequences over.\nrequired\n\n\ngradient_accumulation_steps\nint\nNumber of steps to accumulate gradients over.\nrequired\n\n\nring_attn_func\nRingAttnFunc\nWhich ring attention function to use. Currently unused.\nrequired\n\n\nheads_k_stride\nint | None\nSequence parallelism K head stride size. Passed through to varlen_llama3 ring_flash_attn implementation.\nrequired"
+ "text": "Name\nDescription\n\n\n\n\nAllGatherWithGrad\nCustom autograd function for all-gather to preserve gradients.\n\n\nSequenceParallelContextManager\nContext manager for sequence parallelism operations.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad()\nCustom autograd function for all-gather to preserve gradients.\n\n\n\n\n\nName\nDescription\n\n\n\n\nbackward\nBackward pass for all-gather operation.\n\n\nforward\nForward pass of all-gather of data with sequence dimension.\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad.backward(\n ctx,\n grad_output,\n)\nBackward pass for all-gather operation.\nExtracts the gradient slice corresponding to this rank’s original input\nfrom the full gradient tensor.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\ntorch.autograd.function.FunctionCtx\ntorch.autograd function context.\nrequired\n\n\ngrad_output\ntorch.Tensor\nGradient from subsequent layers with respect to the concatenated output tensor.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntuple[torch.Tensor, None]\nTuple containing the gradient slice for this rank’s input tensor and None for the process group parameter which doesn’t require gradients.\n\n\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.AllGatherWithGrad.forward(\n ctx,\n input_tensor,\n group,\n)\nForward pass of all-gather of data with sequence dimension.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nctx\ntorch.autograd.function.FunctionCtx\ntorch.autograd function context.\nrequired\n\n\ninput_tensor\ntorch.Tensor\nTensor from model output with sequence dimension.\nrequired\n\n\ngroup\ndist.ProcessGroup\ntorch.distributed process group.\nrequired\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nTensor from gathering the input_tensor from across the process group and concatenating along the sequence dimension.\n\n\n\n\n\n\n\n\n\nutils.ctx_managers.sequence_parallel.SequenceParallelContextManager(\n models,\n sequence_parallel_degree,\n gradient_accumulation_steps,\n ring_attn_func,\n heads_k_stride,\n gather_outputs,\n)\nContext manager for sequence parallelism operations.\nThis class provides a context that will automatically apply sequence parallelism\nduring model forward passes using a pre-forward hook, and gather outputs from\nacross the sequence parallelism group using a post-forward hook.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nmodels\nlist[nn.Module]\nList of models to apply sequence parallelism to pre- and post- forward hooks.\nrequired\n\n\nsequence_parallel_degree\nint\nNumber of processes to split sequences over.\nrequired\n\n\ngradient_accumulation_steps\nint\nNumber of steps to accumulate gradients over.\nrequired\n\n\nring_attn_func\nRingAttnFunc\nWhich ring attention function to use. Currently unused.\nrequired\n\n\nheads_k_stride\nint | None\nSequence parallelism K head stride size. Passed through to varlen_llama3 ring_flash_attn implementation.\nrequired\n\n\ngather_outputs\nbool\nWhether to gather outputs after model forward pass across the sequence parallel group.\nrequired"
},
{
"objectID": "docs/api/utils.ctx_managers.sequence_parallel.html#functions",
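The index entries above also document the AllGatherWithGrad contract: forward all-gathers each rank's shard and concatenates along the sequence dimension; backward returns this rank's slice of the incoming gradient, plus None for the non-differentiable process-group argument. A compact sketch of that autograd pattern (hypothetical class name; assumes every rank contributes an equal-length shard):

```python
import torch
import torch.distributed as dist


class AllGatherSeq(torch.autograd.Function):
    """Gradient-preserving all-gather along the sequence dim (illustrative)."""

    @staticmethod
    def forward(ctx, input_tensor: torch.Tensor, group: dist.ProcessGroup):
        ctx.rank = dist.get_rank(group)
        ctx.shard_len = input_tensor.shape[1]
        # dist.all_gather itself does not propagate gradients, which is why a
        # custom backward is needed at all.
        shards = [
            torch.empty_like(input_tensor)
            for _ in range(dist.get_world_size(group))
        ]
        dist.all_gather(shards, input_tensor.contiguous(), group=group)
        return torch.cat(shards, dim=1)  # (batch, full_seq, ...)

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        # Keep only the gradient slice matching this rank's original shard;
        # the process-group input receives None.
        start = ctx.rank * ctx.shard_len
        return grad_output[:, start : start + ctx.shard_len], None
```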
diff --git a/sitemap.xml b/sitemap.xml
index 1c87bb7dd..65114b4a0 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,758 +2,758 @@
https://docs.axolotl.ai/docs/unsloth.html - 2025-06-24T18:59:39.482Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/docs/dataset-formats/conversation.html - 2025-06-24T18:59:39.477Z + 2025-06-25T12:34:07.134Z
https://docs.axolotl.ai/docs/dataset-formats/stepwise_supervised.html - 2025-06-24T18:59:39.478Z + 2025-06-25T12:34:07.134Z
https://docs.axolotl.ai/docs/dataset-formats/tokenized.html - 2025-06-24T18:59:39.478Z + 2025-06-25T12:34:07.135Z
https://docs.axolotl.ai/docs/mac.html - 2025-06-24T18:59:39.481Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/docs/nccl.html - 2025-06-24T18:59:39.481Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/docs/multi-node.html - 2025-06-24T18:59:39.481Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/docs/docker.html - 2025-06-24T18:59:39.478Z + 2025-06-25T12:34:07.135Z
https://docs.axolotl.ai/docs/lr_groups.html - 2025-06-24T18:59:39.481Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/docs/inference.html - 2025-06-24T18:59:39.481Z + 2025-06-25T12:34:07.137Z
https://docs.axolotl.ai/docs/cli.html - 2025-06-24T18:59:39.477Z + 2025-06-25T12:34:07.134Z
https://docs.axolotl.ai/docs/config-reference.html - 2025-06-24T19:02:59.339Z + 2025-06-25T12:37:22.831Z
https://docs.axolotl.ai/docs/multi-gpu.html - 2025-06-24T18:59:39.481Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/docs/debugging.html - 2025-06-24T18:59:39.478Z + 2025-06-25T12:34:07.135Z
https://docs.axolotl.ai/docs/multimodal.html - 2025-06-24T18:59:39.481Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/docs/api/cli.sweeps.html - 2025-06-24T19:02:46.206Z + 2025-06-25T12:37:08.988Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.llama3.html - 2025-06-24T19:02:46.533Z + 2025-06-25T12:37:09.321Z
https://docs.axolotl.ai/docs/api/utils.schedulers.html - 2025-06-24T19:02:46.930Z + 2025-06-25T12:37:09.717Z
https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_xformers.html - 2025-06-24T19:02:46.733Z + 2025-06-25T12:37:09.521Z
https://docs.axolotl.ai/docs/api/cli.cloud.modal_.html - 2025-06-24T19:02:46.254Z + 2025-06-25T12:37:09.038Z
https://docs.axolotl.ai/docs/api/kernels.geglu.html - 2025-06-24T19:02:46.687Z + 2025-06-25T12:37:09.474Z
https://docs.axolotl.ai/docs/api/core.trainers.utils.html - 2025-06-24T19:02:46.327Z + 2025-06-25T12:37:09.111Z
https://docs.axolotl.ai/docs/api/core.datasets.chat.html - 2025-06-24T19:02:46.057Z + 2025-06-25T12:37:08.836Z
https://docs.axolotl.ai/docs/api/utils.schemas.peft.html - 2025-06-24T19:02:47.041Z + 2025-06-25T12:37:09.830Z
https://docs.axolotl.ai/docs/api/monkeypatch.btlm_attn_hijack_flash.html - 2025-06-24T19:02:46.795Z + 2025-06-25T12:37:09.585Z
https://docs.axolotl.ai/docs/api/prompt_strategies.chat_template.html - 2025-06-24T19:02:46.434Z + 2025-06-25T12:37:09.223Z
https://docs.axolotl.ai/docs/api/prompt_strategies.kto.user_defined.html - 2025-06-24T19:02:46.569Z + 2025-06-25T12:37:09.353Z
https://docs.axolotl.ai/docs/api/cli.cloud.base.html - 2025-06-24T19:02:46.248Z + 2025-06-25T12:37:09.031Z
https://docs.axolotl.ai/docs/api/kernels.swiglu.html - 2025-06-24T19:02:46.697Z + 2025-06-25T12:37:09.484Z
https://docs.axolotl.ai/docs/api/prompt_strategies.stepwise_supervised.html - 2025-06-24T19:02:46.498Z + 2025-06-25T12:37:09.287Z
https://docs.axolotl.ai/docs/api/prompt_strategies.bradley_terry.llama3.html - 2025-06-24T19:02:46.592Z + 2025-06-25T12:37:09.377Z
https://docs.axolotl.ai/docs/api/prompt_strategies.completion.html - 2025-06-24T19:02:46.487Z + 2025-06-25T12:37:09.277Z
https://docs.axolotl.ai/docs/api/kernels.utils.html - 2025-06-24T19:02:46.706Z + 2025-06-25T12:37:09.493Z
https://docs.axolotl.ai/docs/api/common.datasets.html - 2025-06-24T19:02:47.253Z + 2025-06-25T12:37:10.043Z
https://docs.axolotl.ai/docs/api/utils.schemas.datasets.html - 2025-06-24T19:02:47.032Z + 2025-06-25T12:37:09.821Z
https://docs.axolotl.ai/docs/api/core.builders.rl.html - 2025-06-24T19:02:46.012Z + 2025-06-25T12:37:08.791Z
https://docs.axolotl.ai/docs/api/evaluate.html - 2025-06-24T19:02:45.916Z + 2025-06-25T12:37:08.696Z
https://docs.axolotl.ai/docs/api/kernels.quantize.html - 2025-06-24T19:02:46.705Z + 2025-06-25T12:37:09.492Z
https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_flash.html - 2025-06-24T19:02:46.731Z + 2025-06-25T12:37:09.520Z
https://docs.axolotl.ai/docs/api/core.trainers.mixins.rng_state_loader.html - 2025-06-24T19:02:46.370Z + 2025-06-25T12:37:09.156Z
https://docs.axolotl.ai/docs/api/integrations.base.html - 2025-06-24T19:02:47.213Z + 2025-06-25T12:37:10.003Z
https://docs.axolotl.ai/docs/api/cli.merge_lora.html - 2025-06-24T19:02:46.180Z + 2025-06-25T12:37:08.961Z
https://docs.axolotl.ai/docs/api/cli.merge_sharded_fsdp_weights.html - 2025-06-24T19:02:46.192Z + 2025-06-25T12:37:08.973Z
https://docs.axolotl.ai/docs/api/monkeypatch.transformers_fa_utils.html - 2025-06-24T19:02:46.812Z + 2025-06-25T12:37:09.602Z
https://docs.axolotl.ai/docs/api/prompt_strategies.llama2_chat.html - 2025-06-24T19:02:46.481Z + 2025-06-25T12:37:09.271Z
https://docs.axolotl.ai/docs/api/utils.collators.mm_chat.html - 2025-06-24T19:02:47.282Z + 2025-06-25T12:37:10.072Z
https://docs.axolotl.ai/docs/api/utils.data.sft.html - 2025-06-24T19:02:46.969Z + 2025-06-25T12:37:09.756Z
https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_instruct.html - 2025-06-24T19:02:46.449Z + 2025-06-25T12:37:09.238Z
https://docs.axolotl.ai/docs/api/integrations.liger.args.html - 2025-06-24T19:02:47.228Z + 2025-06-25T12:37:10.018Z
https://docs.axolotl.ai/docs/api/monkeypatch.mistral_attn_hijack_flash.html - 2025-06-24T19:02:46.747Z + 2025-06-25T12:37:09.536Z
https://docs.axolotl.ai/docs/api/cli.vllm_serve.html - 2025-06-24T19:02:46.245Z + 2025-06-25T12:37:09.028Z
https://docs.axolotl.ai/docs/api/monkeypatch.utils.html - 2025-06-24T19:02:46.793Z + 2025-06-25T12:37:09.583Z
https://docs.axolotl.ai/docs/api/loaders.patch_manager.html - 2025-06-24T19:02:46.360Z + 2025-06-25T12:37:09.146Z
https://docs.axolotl.ai/docs/api/utils.schemas.integrations.html - 2025-06-24T19:02:47.061Z + 2025-06-25T12:37:09.851Z
https://docs.axolotl.ai/docs/api/utils.callbacks.perplexity.html - 2025-06-24T19:02:47.329Z + 2025-06-25T12:37:10.119Z
https://docs.axolotl.ai/docs/api/cli.utils.html - 2025-06-24T19:02:46.238Z + 2025-06-25T12:37:09.021Z
https://docs.axolotl.ai/docs/api/utils.schemas.config.html - 2025-06-24T19:02:47.003Z + 2025-06-25T12:37:09.790Z
https://docs.axolotl.ai/docs/api/prompt_strategies.input_output.html - 2025-06-24T19:02:46.493Z + 2025-06-25T12:37:09.283Z
https://docs.axolotl.ai/docs/api/utils.distributed.html - 2025-06-24T19:02:46.950Z + 2025-06-25T12:37:09.737Z
https://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_disk.html - 2025-06-24T19:02:46.848Z + 2025-06-25T12:37:09.636Z
https://docs.axolotl.ai/docs/api/monkeypatch.trainer_fsdp_optim.html - 2025-06-24T19:02:46.805Z + 2025-06-25T12:37:09.595Z
https://docs.axolotl.ai/docs/api/core.builders.base.html - 2025-06-24T19:02:45.999Z + 2025-06-25T12:37:08.778Z
https://docs.axolotl.ai/docs/api/core.trainers.trl.html - 2025-06-24T19:02:46.286Z + 2025-06-25T12:37:09.069Z
https://docs.axolotl.ai/docs/api/cli.evaluate.html - 2025-06-24T19:02:46.113Z + 2025-06-25T12:37:08.893Z
https://docs.axolotl.ai/docs/api/utils.optimizers.adopt.html - 2025-06-24T19:02:46.961Z + 2025-06-25T12:37:09.748Z
https://docs.axolotl.ai/docs/api/utils.callbacks.qat.html - 2025-06-24T19:02:47.349Z + 2025-06-25T12:37:10.138Z
https://docs.axolotl.ai/docs/api/core.trainers.dpo.trainer.html - 2025-06-24T19:02:46.302Z + 2025-06-25T12:37:09.086Z
https://docs.axolotl.ai/docs/api/core.chat.format.shared.html - 2025-06-24T19:02:46.052Z + 2025-06-25T12:37:08.831Z
https://docs.axolotl.ai/docs/api/monkeypatch.relora.html - 2025-06-24T19:02:46.756Z + 2025-06-25T12:37:09.545Z
https://docs.axolotl.ai/docs/api/cli.config.html - 2025-06-24T19:02:46.157Z + 2025-06-25T12:37:08.938Z
https://docs.axolotl.ai/docs/api/cli.preprocess.html - 2025-06-24T19:02:46.200Z + 2025-06-25T12:37:08.982Z
https://docs.axolotl.ai/docs/api/core.trainers.base.html - 2025-06-24T19:02:46.270Z + 2025-06-25T12:37:09.053Z
https://docs.axolotl.ai/docs/api/convert.html - 2025-06-24T19:02:45.941Z + 2025-06-25T12:37:08.720Z
https://docs.axolotl.ai/docs/api/prompt_strategies.pygmalion.html - 2025-06-24T19:02:46.515Z + 2025-06-25T12:37:09.304Z
https://docs.axolotl.ai/docs/api/utils.schemas.trl.html - 2025-06-24T19:02:47.044Z + 2025-06-25T12:37:09.833Z
https://docs.axolotl.ai/docs/api/cli.args.html - 2025-06-24T19:02:46.133Z + 2025-06-25T12:37:08.913Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chat_template.html - 2025-06-24T19:02:46.521Z + 2025-06-25T12:37:09.310Z
https://docs.axolotl.ai/docs/api/loaders.constants.html - 2025-06-24T19:02:46.361Z + 2025-06-25T12:37:09.147Z
https://docs.axolotl.ai/docs/api/logging_config.html - 2025-06-24T19:02:45.993Z + 2025-06-25T12:37:08.772Z
https://docs.axolotl.ai/docs/api/cli.inference.html - 2025-06-24T19:02:46.172Z + 2025-06-25T12:37:08.953Z
https://docs.axolotl.ai/docs/api/utils.ctx_managers.sequence_parallel.html - 2025-06-24T19:02:46.400Z + 2025-06-25T12:37:09.187Z
https://docs.axolotl.ai/docs/api/integrations.spectrum.args.html - 2025-06-24T19:02:47.234Z + 2025-06-25T12:37:10.025Z
https://docs.axolotl.ai/docs/api/utils.schemas.training.html - 2025-06-24T19:02:47.015Z + 2025-06-25T12:37:09.803Z
https://docs.axolotl.ai/docs/api/prompt_strategies.orcamini.html - 2025-06-24T19:02:46.508Z + 2025-06-25T12:37:09.298Z
https://docs.axolotl.ai/docs/api/utils.freeze.html - 2025-06-24T19:02:46.889Z + 2025-06-25T12:37:09.674Z
https://docs.axolotl.ai/docs/api/loaders.tokenizer.html - 2025-06-24T19:02:46.345Z + 2025-06-25T12:37:09.130Z
https://docs.axolotl.ai/docs/api/utils.bench.html - 2025-06-24T19:02:46.881Z + 2025-06-25T12:37:09.667Z
https://docs.axolotl.ai/docs/api/utils.quantization.html - 2025-06-24T19:02:46.990Z + 2025-06-25T12:37:09.777Z
https://docs.axolotl.ai/docs/batch_vs_grad.html - 2025-06-24T18:59:39.477Z + 2025-06-25T12:34:07.134Z
https://docs.axolotl.ai/docs/input_output.html - 2025-06-24T18:59:39.481Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/docs/sequence_parallelism.html - 2025-06-24T18:59:39.481Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/docs/reward_modelling.html - 2025-06-24T18:59:39.481Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/index.html - 2025-06-24T18:59:39.494Z + 2025-06-25T12:34:07.151Z
https://docs.axolotl.ai/src/axolotl/integrations/LICENSE.html - 2025-06-24T18:59:39.498Z + 2025-06-25T12:34:07.155Z
https://docs.axolotl.ai/FAQS.html - 2025-06-24T18:59:39.476Z + 2025-06-25T12:34:07.132Z
https://docs.axolotl.ai/src/axolotl/integrations/cut_cross_entropy/ACKNOWLEDGEMENTS.html - 2025-06-24T18:59:39.498Z + 2025-06-25T12:34:07.156Z
https://docs.axolotl.ai/TODO.html - 2025-06-24T18:59:39.476Z + 2025-06-25T12:34:07.133Z
https://docs.axolotl.ai/examples/colab-notebooks/colab-axolotl-example.html - 2025-06-24T18:59:39.482Z + 2025-06-25T12:34:07.139Z
https://docs.axolotl.ai/docs/torchao.html - 2025-06-24T18:59:39.482Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/docs/ray-integration.html - 2025-06-24T18:59:39.481Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/docs/quantize.html - 2025-06-24T18:59:39.481Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/docs/qat.html - 2025-06-24T18:59:39.481Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/docs/api/utils.lora.html - 2025-06-24T19:02:46.872Z + 2025-06-25T12:37:09.658Z
https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_w_system.html - 2025-06-24T19:02:46.461Z + 2025-06-25T12:37:09.250Z
https://docs.axolotl.ai/docs/api/monkeypatch.stablelm_attn_hijack_flash.html - 2025-06-24T19:02:46.802Z + 2025-06-25T12:37:09.592Z
https://docs.axolotl.ai/docs/api/utils.collators.core.html - 2025-06-24T19:02:47.255Z + 2025-06-25T12:37:10.045Z
https://docs.axolotl.ai/docs/api/prompt_strategies.metharme.html - 2025-06-24T19:02:46.505Z + 2025-06-25T12:37:09.294Z
https://docs.axolotl.ai/docs/api/utils.callbacks.profiler.html - 2025-06-24T19:02:47.333Z + 2025-06-25T12:37:10.123Z
https://docs.axolotl.ai/docs/api/utils.data.pretraining.html - 2025-06-24T19:02:46.962Z + 2025-06-25T12:37:09.749Z
https://docs.axolotl.ai/docs/api/utils.callbacks.lisa.html - 2025-06-24T19:02:47.334Z + 2025-06-25T12:37:10.124Z
https://docs.axolotl.ai/docs/api/utils.trainer.html - 2025-06-24T19:02:46.906Z + 2025-06-25T12:37:09.692Z
https://docs.axolotl.ai/docs/api/integrations.cut_cross_entropy.args.html - 2025-06-24T19:02:47.216Z + 2025-06-25T12:37:10.007Z
https://docs.axolotl.ai/docs/api/utils.schemas.model.html - 2025-06-24T19:02:47.010Z + 2025-06-25T12:37:09.798Z
https://docs.axolotl.ai/docs/api/monkeypatch.data.batch_dataset_fetcher.html - 2025-06-24T19:02:46.815Z + 2025-06-25T12:37:09.605Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.zephyr.html - 2025-06-24T19:02:46.545Z + 2025-06-25T12:37:09.332Z
https://docs.axolotl.ai/docs/api/datasets.html - 2025-06-24T19:02:45.927Z + 2025-06-25T12:37:08.707Z
https://docs.axolotl.ai/docs/api/utils.schemas.enums.html - 2025-06-24T19:02:47.072Z + 2025-06-25T12:37:09.861Z
https://docs.axolotl.ai/docs/api/integrations.kd.trainer.html - 2025-06-24T19:02:47.224Z + 2025-06-25T12:37:10.015Z
https://docs.axolotl.ai/docs/api/monkeypatch.lora_kernels.html - 2025-06-24T19:02:46.785Z + 2025-06-25T12:37:09.575Z
https://docs.axolotl.ai/docs/api/utils.collators.batching.html - 2025-06-24T19:02:47.274Z + 2025-06-25T12:37:10.064Z
https://docs.axolotl.ai/docs/api/core.trainers.grpo.sampler.html - 2025-06-24T19:02:46.325Z + 2025-06-25T12:37:09.110Z
https://docs.axolotl.ai/docs/api/prompt_strategies.base.html - 2025-06-24T19:02:46.401Z + 2025-06-25T12:37:09.189Z
https://docs.axolotl.ai/docs/api/monkeypatch.multipack.html - 2025-06-24T19:02:46.749Z + 2025-06-25T12:37:09.538Z
https://docs.axolotl.ai/docs/api/prompt_strategies.orpo.chat_template.html - 2025-06-24T19:02:46.589Z + 2025-06-25T12:37:09.374Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.passthrough.html - 2025-06-24T19:02:46.548Z + 2025-06-25T12:37:09.335Z
https://docs.axolotl.ai/docs/api/core.chat.format.chatml.html - 2025-06-24T19:02:46.049Z + 2025-06-25T12:37:08.828Z
https://docs.axolotl.ai/docs/api/core.trainers.mixins.scheduler.html - 2025-06-24T19:02:46.377Z + 2025-06-25T12:37:09.163Z
https://docs.axolotl.ai/docs/api/utils.model_shard_quant.html - 2025-06-24T19:02:46.877Z + 2025-06-25T12:37:09.663Z
https://docs.axolotl.ai/docs/api/prompt_strategies.kto.chatml.html - 2025-06-24T19:02:46.567Z + 2025-06-25T12:37:09.352Z
https://docs.axolotl.ai/docs/api/utils.tokenization.html - 2025-06-24T19:02:46.855Z + 2025-06-25T12:37:09.642Z
https://docs.axolotl.ai/docs/api/loaders.model.html - 2025-06-24T19:02:46.336Z + 2025-06-25T12:37:09.121Z
https://docs.axolotl.ai/docs/api/utils.callbacks.mlflow_.html - 2025-06-24T19:02:47.338Z + 2025-06-25T12:37:10.128Z
https://docs.axolotl.ai/docs/api/core.trainers.grpo.trainer.html - 2025-06-24T19:02:46.313Z + 2025-06-25T12:37:09.097Z
https://docs.axolotl.ai/docs/api/cli.main.html - 2025-06-24T19:02:46.097Z + 2025-06-25T12:37:08.876Z
https://docs.axolotl.ai/docs/api/utils.callbacks.comet_.html - 2025-06-24T19:02:47.342Z + 2025-06-25T12:37:10.131Z
https://docs.axolotl.ai/docs/api/utils.chat_templates.html - 2025-06-24T19:02:46.865Z + 2025-06-25T12:37:09.652Z
https://docs.axolotl.ai/docs/api/utils.schemas.utils.html - 2025-06-24T19:02:47.077Z + 2025-06-25T12:37:09.867Z
https://docs.axolotl.ai/docs/api/common.architectures.html - 2025-06-24T19:02:47.236Z + 2025-06-25T12:37:10.026Z
https://docs.axolotl.ai/docs/api/monkeypatch.llama_expand_mask.html - 2025-06-24T19:02:46.757Z + 2025-06-25T12:37:09.546Z
https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_chat.html - 2025-06-24T19:02:46.447Z + 2025-06-25T12:37:09.236Z
https://docs.axolotl.ai/docs/api/utils.samplers.multipack.html - 2025-06-24T19:02:47.323Z + 2025-06-25T12:37:10.113Z
https://docs.axolotl.ai/docs/api/integrations.grokfast.optimizer.html - 2025-06-24T19:02:47.217Z + 2025-06-25T12:37:10.008Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chatml.html - 2025-06-24T19:02:46.543Z + 2025-06-25T12:37:09.331Z
https://docs.axolotl.ai/docs/api/monkeypatch.mixtral.html - 2025-06-24T19:02:46.816Z + 2025-06-25T12:37:09.606Z
https://docs.axolotl.ai/docs/api/train.html - 2025-06-24T19:02:45.906Z + 2025-06-25T12:37:08.685Z
https://docs.axolotl.ai/docs/api/monkeypatch.llama_patch_multipack.html - 2025-06-24T19:02:46.796Z + 2025-06-25T12:37:09.586Z
https://docs.axolotl.ai/docs/api/index.html - 2025-06-24T19:02:45.844Z + 2025-06-25T12:37:08.622Z
https://docs.axolotl.ai/docs/api/loaders.adapter.html - 2025-06-24T19:02:46.352Z + 2025-06-25T12:37:09.137Z
https://docs.axolotl.ai/docs/api/utils.schemas.multimodal.html - 2025-06-24T19:02:47.049Z + 2025-06-25T12:37:09.838Z
https://docs.axolotl.ai/docs/api/kernels.lora.html - 2025-06-24T19:02:46.676Z + 2025-06-25T12:37:09.463Z
https://docs.axolotl.ai/docs/api/prompt_strategies.kto.llama3.html - 2025-06-24T19:02:46.558Z + 2025-06-25T12:37:09.343Z
https://docs.axolotl.ai/docs/api/cli.checks.html - 2025-06-24T19:02:46.139Z + 2025-06-25T12:37:08.920Z
https://docs.axolotl.ai/docs/api/cli.quantize.html - 2025-06-24T19:02:46.259Z + 2025-06-25T12:37:09.043Z
https://docs.axolotl.ai/docs/api/integrations.lm_eval.args.html - 2025-06-24T19:02:47.231Z + 2025-06-25T12:37:10.021Z
https://docs.axolotl.ai/docs/api/core.chat.messages.html - 2025-06-24T19:02:46.047Z + 2025-06-25T12:37:08.826Z
https://docs.axolotl.ai/docs/api/core.builders.causal.html - 2025-06-24T19:02:46.004Z + 2025-06-25T12:37:08.783Z
https://docs.axolotl.ai/docs/api/core.trainers.relora.html - 2025-06-24T19:02:46.296Z + 2025-06-25T12:37:09.079Z
https://docs.axolotl.ai/docs/api/models.mamba.modeling_mamba.html - 2025-06-24T19:02:47.254Z + 2025-06-25T12:37:10.044Z
https://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_cpu.html - 2025-06-24T19:02:46.820Z + 2025-06-25T12:37:09.610Z
https://docs.axolotl.ai/docs/api/core.trainers.mamba.html - 2025-06-24T19:02:46.291Z + 2025-06-25T12:37:09.074Z
https://docs.axolotl.ai/docs/api/core.datasets.transforms.chat_builder.html - 2025-06-24T19:02:46.065Z + 2025-06-25T12:37:08.844Z
https://docs.axolotl.ai/docs/api/loaders.processor.html - 2025-06-24T19:02:46.346Z + 2025-06-25T12:37:09.131Z
https://docs.axolotl.ai/docs/api/core.chat.format.llama3x.html - 2025-06-24T19:02:46.050Z + 2025-06-25T12:37:08.829Z
https://docs.axolotl.ai/docs/api/prompt_strategies.messages.chat.html - 2025-06-24T19:02:46.519Z + 2025-06-25T12:37:09.309Z
https://docs.axolotl.ai/docs/api/cli.train.html - 2025-06-24T19:02:46.105Z + 2025-06-25T12:37:08.885Z
https://docs.axolotl.ai/docs/api/core.trainers.mixins.optimizer.html - 2025-06-24T19:02:46.367Z + 2025-06-25T12:37:09.153Z
https://docs.axolotl.ai/docs/api/utils.collators.mamba.html - 2025-06-24T19:02:47.277Z + 2025-06-25T12:37:10.067Z
https://docs.axolotl.ai/docs/api/monkeypatch.unsloth_.html - 2025-06-24T19:02:46.813Z + 2025-06-25T12:37:09.603Z
https://docs.axolotl.ai/docs/api/utils.dict.html - 2025-06-24T19:02:46.953Z + 2025-06-25T12:37:09.740Z
https://docs.axolotl.ai/docs/api/prompt_strategies.user_defined.html - 2025-06-24T19:02:46.469Z + 2025-06-25T12:37:09.258Z
https://docs.axolotl.ai/docs/api/core.training_args.html - 2025-06-24T19:02:46.024Z + 2025-06-25T12:37:08.803Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.user_defined.html - 2025-06-24T19:02:46.546Z + 2025-06-25T12:37:09.334Z
https://docs.axolotl.ai/docs/api/prompt_tokenizers.html - 2025-06-24T19:02:45.984Z + 2025-06-25T12:37:08.762Z
https://docs.axolotl.ai/docs/api/common.const.html - 2025-06-24T19:02:47.238Z + 2025-06-25T12:37:10.028Z
https://docs.axolotl.ai/docs/fsdp_qlora.html - 2025-06-24T18:59:39.478Z + 2025-06-25T12:34:07.135Z
https://docs.axolotl.ai/docs/custom_integrations.html - 2025-06-24T18:59:39.477Z + 2025-06-25T12:34:07.134Z
https://docs.axolotl.ai/docs/getting-started.html - 2025-06-24T18:59:39.478Z + 2025-06-25T12:34:07.135Z
https://docs.axolotl.ai/docs/faq.html - 2025-06-24T18:59:39.478Z + 2025-06-25T12:34:07.135Z
https://docs.axolotl.ai/docs/lora_optims.html - 2025-06-24T18:59:39.481Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/docs/rlhf.html - 2025-06-24T18:59:39.481Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/docs/amd_hpc.html - 2025-06-24T18:59:39.477Z + 2025-06-25T12:34:07.134Z
https://docs.axolotl.ai/docs/installation.html - 2025-06-24T18:59:39.481Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/docs/multipack.html - 2025-06-24T18:59:39.481Z + 2025-06-25T12:34:07.138Z
https://docs.axolotl.ai/docs/dataset_preprocessing.html - 2025-06-24T18:59:39.478Z + 2025-06-25T12:34:07.135Z
https://docs.axolotl.ai/docs/dataset_loading.html - 2025-06-24T18:59:39.478Z + 2025-06-25T12:34:07.135Z
https://docs.axolotl.ai/docs/dataset-formats/inst_tune.html - 2025-06-24T18:59:39.478Z + 2025-06-25T12:34:07.134Z
https://docs.axolotl.ai/docs/dataset-formats/template_free.html - 2025-06-24T18:59:39.478Z + 2025-06-25T12:34:07.135Z
https://docs.axolotl.ai/docs/dataset-formats/index.html - 2025-06-24T18:59:39.477Z + 2025-06-25T12:34:07.134Z
https://docs.axolotl.ai/docs/dataset-formats/pretraining.html - 2025-06-24T18:59:39.478Z + 2025-06-25T12:34:07.134Z
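Closing note on apply_sequence_parallelism, whose contract is documented in the search-index entries earlier in this diff: it slices each batch along the sequence dimension and returns the sliced batch together with the original sequence length and the number of padding tokens added. A simplified sketch of that slicing step, assuming right-side zero padding and setting aside the integer logits_to_keep special case and the gradient-accumulation and ring-attention arguments:

```python
import torch


def slice_batch_for_sp(
    batch: dict[str, torch.Tensor],
    local_rank: int,
    local_world_size: int,
) -> tuple[dict[str, torch.Tensor], int, int]:
    """Keep only this rank's contiguous sequence chunk (illustrative only)."""
    original_len = batch["input_ids"].shape[1]
    pad = -original_len % local_world_size  # tokens needed to divide evenly
    sliced = {}
    for name, tensor in batch.items():
        # Pass through anything that is not shaped (batch, seq, ...).
        if tensor.dim() < 2 or tensor.shape[1] != original_len:
            sliced[name] = tensor
            continue
        if pad:
            padding = tensor.new_zeros(tensor.shape[0], pad, *tensor.shape[2:])
            tensor = torch.cat([tensor, padding], dim=1)
        chunk = tensor.shape[1] // local_world_size
        sliced[name] = tensor[:, local_rank * chunk : (local_rank + 1) * chunk]
    return sliced, original_len, pad
```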