diff --git a/.nojekyll b/.nojekyll
index 8223b6d0f..ab8a0397c 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-5e7d0afb
\ No newline at end of file
+3cfe7963
\ No newline at end of file
diff --git a/docs/api/index.html b/docs/api/index.html
index c9e2b804d..a1aed0f36 100644
--- a/docs/api/index.html
+++ b/docs/api/index.html
@@ -1004,7 +1004,7 @@ ul.task-list li input[type="checkbox"] {
utils.collators.batching
-Data collators for axolotl to pad labels and position_ids for packed sequences. Also
+Data collators for axolotl to pad labels and position_ids for packed sequences
utils.collators.mamba
diff --git a/docs/api/utils.collators.batching.html b/docs/api/utils.collators.batching.html
index 0b6e518e0..acfde0de6 100644
--- a/docs/api/utils.collators.batching.html
+++ b/docs/api/utils.collators.batching.html
@@ -466,8 +466,7 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin

utils.collators.batching

utils.collators.batching

-Data collators for axolotl to pad labels and position_ids for packed sequences. Also
-includes logic for handling sequence parallelism collation.
+Data collators for axolotl to pad labels and position_ids for packed sequences

Classes

@@ -508,9 +507,7 @@ includes logic for handling sequence parallelism collation.

    label_pad_token_id=-100,
    position_pad_token_id=0,
    return_tensors='pt',
-    sequence_parallel_degree=1,
-    ring_attn_func=None,
-)
+)

Collator for multipack specific to the using the BatchSampler

@@ -525,17 +522,15 @@ includes logic for handling sequence parallelism collation.

    label_pad_token_id=-100,
    position_pad_token_id=0,
    return_tensors='pt',
-    sequence_parallel_degree=1,
-    ring_attn_func=None,
-)
+)

Data collator that will dynamically pad the inputs received, as well as the labels and position_ids

Parameters

@@ -589,111 +584,33 @@ includes logic for handling sequence parallelism collation.

return_tensors | str | The type of Tensor to return. Allowable values are “np”, “pt” and “tf”. | 'pt'
-sequence_parallel_degree | int | The degree of sequence parallelism. Default to 1 for no sequence parallelism. | 1

-Methods

-Name | Description
-apply_sequence_parallelism | Apply sequence parallelism slicing to a batch.

-apply_sequence_parallelism

-utils.collators.batching.DataCollatorForSeq2Seq.apply_sequence_parallelism(
-    batch,
-)

-Apply sequence parallelism slicing to a batch.

-Parameters

-Name | Type | Description | Default
-batch | dict[str, torch.Tensor] | Batch dictionary from parent collator. | required

-Returns

-Name | Type | Description
- | torch.Tensor | Sliced batch dictionary.

PretrainingBatchSamplerDataCollatorForSeq2Seq

-utils.collators.batching.PretrainingBatchSamplerDataCollatorForSeq2Seq(
-    self,
-    *args,
-    multipack_attn=True,
-    **kwargs,
-)
+utils.collators.batching.PretrainingBatchSamplerDataCollatorForSeq2Seq(
+    self,
+    *args,
+    multipack_attn=True,
+    **kwargs,
+)

Collator for multipack specific to the using the BatchSampler

V2BatchSamplerDataCollatorForSeq2Seq

-utils.collators.batching.V2BatchSamplerDataCollatorForSeq2Seq(
-    self,
-    tokenizer,
-    model=None,
-    padding=True,
-    max_length=None,
-    pad_to_multiple_of=None,
-    label_pad_token_id=-100,
-    position_pad_token_id=0,
-    return_tensors='pt',
-    sequence_parallel_degree=1,
-    ring_attn_func=None,
-)
+utils.collators.batching.V2BatchSamplerDataCollatorForSeq2Seq(
+    self,
+    tokenizer,
+    model=None,
+    padding=True,
+    max_length=None,
+    pad_to_multiple_of=None,
+    label_pad_token_id=-100,
+    position_pad_token_id=0,
+    return_tensors='pt',
+)

Collator for multipack specific to the using the BatchSampler
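The hunks above drop the sequence-parallelism arguments (sequence_parallel_degree, ring_attn_func) and the apply_sequence_parallelism method from the collator documentation. As a minimal usage sketch of the post-change constructor (not part of this diff): the import path follows the page title, and the gpt2 tokenizer and the sample features are assumptions made only for illustration.

```python
# Illustrative sketch only; assumes the collator lives at
# axolotl.utils.collators.batching as the page title suggests.
from transformers import AutoTokenizer

from axolotl.utils.collators.batching import DataCollatorForSeq2Seq

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token

# Post-change signature: no sequence_parallel_degree / ring_attn_func arguments.
collator = DataCollatorForSeq2Seq(
    tokenizer,
    padding=True,             # pad to the longest sequence in the batch
    pad_to_multiple_of=8,     # helps Tensor Core utilization on NVIDIA GPUs
    label_pad_token_id=-100,  # ignored by PyTorch loss functions
    return_tensors="pt",
)

# Variable-length features; the collator pads input_ids, labels and position_ids.
features = [
    {"input_ids": [1, 2, 3], "labels": [1, 2, 3], "position_ids": [0, 1, 2]},
    {"input_ids": [4, 5], "labels": [4, 5], "position_ids": [0, 1]},
]
batch = collator(features)
print({key: value.shape for key, value in batch.items()})
```

If the installed axolotl version still accepts the removed arguments, follow the signatures shown in the hunks above instead.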

diff --git a/search.json b/search.json index 3be5705ef..6be7782b5 100644 --- a/search.json +++ b/search.json @@ -659,7 +659,7 @@ "href": "docs/api/index.html", "title": "API Reference", "section": "", - "text": "Core functionality for training\n\n\n\ntrain\nPrepare and train a model on a dataset. Can also infer from a model or merge lora\n\n\nevaluate\nModule for evaluating models.\n\n\ndatasets\nModule containing Dataset functionality\n\n\nconvert\nModule containing File Reader, File Writer, Json Parser, and Jsonl Serializer classes\n\n\nprompt_tokenizers\nModule containing PromptTokenizingStrategy and Prompter classes\n\n\nlogging_config\nCommon logging module for axolotl\n\n\ncore.trainer_builder\nBuilder for the training args and trainer\n\n\ncore.training_args\nextra axolotl specific training args\n\n\ncore.chat.messages\ninternal message representations of chat messages\n\n\ncore.chat.format.chatml\nChatML transformation functions for MessageContents\n\n\ncore.chat.format.llama3x\nLlama 3.x chat formatting functions for MessageContents\n\n\ncore.chat.format.shared\nshared functions for format transforms\n\n\ncore.datasets.chat\nchat dataset module\n\n\ncore.datasets.transforms.chat_builder\nThis module contains a function that builds a transform that takes a row from the dataset and converts it to a Chat.\n\n\n\n\n\n\nCommand-line interface\n\n\n\ncli.main\nClick CLI definitions for various axolotl commands.\n\n\ncli.train\nCLI to run training on a model.\n\n\ncli.evaluate\nCLI to run evaluation on a model.\n\n\ncli.args\nModule for axolotl CLI command arguments.\n\n\ncli.checks\nVarious checks for Axolotl CLI.\n\n\ncli.config\nConfiguration loading and processing.\n\n\ncli.inference\nCLI to run inference on a trained model.\n\n\ncli.merge_lora\nCLI to merge a trained LoRA into a base model.\n\n\ncli.merge_sharded_fsdp_weights\nCLI to merge sharded FSDP model checkpoints into a single combined checkpoint.\n\n\ncli.preprocess\nCLI to run preprocessing of a dataset.\n\n\ncli.sweeps\nUtilities for handling sweeps over configs for axolotl train CLI command\n\n\ncli.utils\nUtility methods for axolotl CLI.\n\n\ncli.vllm_serve\nCLI to start the vllm server for online RL\n\n\ncli.cloud.base\nbase class for cloud platforms from cli\n\n\ncli.cloud.modal_\nModal Cloud support from CLI\n\n\n\n\n\n\nTraining implementations\n\n\n\ncore.trainers.base\nModule for customized trainers\n\n\ncore.trainers.trl\nModule for TRL PPO trainer\n\n\ncore.trainers.dpo.trainer\nDPO trainer for axolotl\n\n\ncore.trainers.grpo.trainer\nAxolotl GRPO trainer\n\n\n\n\n\n\nPrompt formatting strategies\n\n\n\nprompt_strategies.base\nmodule for base dataset transform strategies\n\n\nprompt_strategies.chat_template\nHF Chat Templates prompt strategy\n\n\nprompt_strategies.alpaca_chat\nModule for Alpaca prompt strategy classes\n\n\nprompt_strategies.alpaca_instruct\nModule loading the AlpacaInstructPromptTokenizingStrategy class\n\n\nprompt_strategies.alpaca_w_system\nPrompt strategies loader for alpaca instruction datasets with system prompts\n\n\nprompt_strategies.user_defined\nUser Defined prompts with configuration from the YML config\n\n\nprompt_strategies.llama2_chat\nPrompt Strategy for finetuning Llama2 chat models\n\n\nprompt_strategies.completion\nBasic completion text\n\n\nprompt_strategies.input_output\nModule for plain input/output prompt pairs\n\n\nprompt_strategies.stepwise_supervised\nModule for stepwise datasets, typically including a prompt and reasoning traces,\n\n\nprompt_strategies.metharme\nModule 
containing the MetharmenPromptTokenizingStrategy and MetharmePrompter class\n\n\nprompt_strategies.orcamini\nPrompt Strategy for finetuning Orca Mini (v2) models\n\n\nprompt_strategies.pygmalion\nModule containing the PygmalionPromptTokenizingStrategy and PygmalionPrompter class\n\n\nprompt_strategies.messages.chat\nChat dataset wrapping strategy for new internal messages representations\n\n\nprompt_strategies.dpo.chat_template\nDPO prompt strategies for using tokenizer chat templates.\n\n\nprompt_strategies.dpo.llama3\nDPO strategies for llama-3 chat template\n\n\nprompt_strategies.dpo.chatml\nDPO strategies for chatml\n\n\nprompt_strategies.dpo.zephyr\nDPO strategies for zephyr\n\n\nprompt_strategies.dpo.user_defined\nUser-defined DPO strategies\n\n\nprompt_strategies.dpo.passthrough\nDPO prompt strategies passthrough/zero-processing strategy\n\n\nprompt_strategies.kto.llama3\nKTO strategies for llama-3 chat template\n\n\nprompt_strategies.kto.chatml\nKTO strategies for chatml\n\n\nprompt_strategies.kto.user_defined\nUser-defined KTO strategies\n\n\nprompt_strategies.orpo.chat_template\nchatml prompt tokenization strategy for ORPO\n\n\nprompt_strategies.bradley_terry.llama3\nchatml transforms for datasets with system, input, chosen, rejected to match llama3 chat template\n\n\n\n\n\n\nLow-level performance optimizations\n\n\n\nkernels.lora\nModule for definition of Low-Rank Adaptation (LoRA) Triton kernels.\n\n\nkernels.geglu\nModule for definition of GEGLU Triton kernels.\n\n\nkernels.swiglu\nModule for definition of SwiGLU Triton kernels.\n\n\nkernels.quantize\nDequantization utilities for bitsandbytes integration.\n\n\nkernels.utils\nUtilities for axolotl.kernels submodules.\n\n\n\n\n\n\nRuntime patches for model optimizations\n\n\n\nmonkeypatch.llama_attn_hijack_flash\nFlash attention monkey patch for llama model\n\n\nmonkeypatch.llama_attn_hijack_xformers\nDirectly copied the code from https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/modules/llama_attn_hijack.py and made some adjustments\n\n\nmonkeypatch.mistral_attn_hijack_flash\nFlash attention monkey patch for mistral model\n\n\nmonkeypatch.multipack\nmultipack patching for v2 of sample packing\n\n\nmonkeypatch.relora\nImplements the ReLoRA training procedure from https://arxiv.org/abs/2307.05695, minus the initial full fine-tune.\n\n\nmonkeypatch.llama_expand_mask\nexpands the binary attention mask per 3.2.2 of https://arxiv.org/pdf/2107.02027.pdf\n\n\nmonkeypatch.lora_kernels\nModule for patching custom LoRA Triton kernels and torch.autograd functions.\n\n\nmonkeypatch.utils\nShared utils for the monkeypatches\n\n\nmonkeypatch.btlm_attn_hijack_flash\nFlash attention monkey patch for cerebras btlm model\n\n\nmonkeypatch.llama_patch_multipack\nPatched LlamaAttention to use torch.nn.functional.scaled_dot_product_attention\n\n\nmonkeypatch.stablelm_attn_hijack_flash\nPyTorch StableLM Epoch model.\n\n\nmonkeypatch.trainer_fsdp_optim\nfix for FSDP optimizer save in trainer w 4.47.0\n\n\nmonkeypatch.transformers_fa_utils\nsee https://github.com/huggingface/transformers/pull/35834\n\n\nmonkeypatch.unsloth_\nmodule for patching with unsloth optimizations\n\n\nmonkeypatch.attention.mllama\nMonkeypatch for Vision Llama for FA2 support\n\n\nmonkeypatch.data.batch_dataset_fetcher\nmonkey patches for the dataset fetcher to handle batches of packed indexes\n\n\nmonkeypatch.mixtral\nPatches to support multipack for mixtral\n\n\n\n\n\n\nUtility functions\n\n\n\nutils.models\nModule for models and model 
loading\n\n\nutils.tokenization\nModule for tokenization utilities\n\n\nutils.chat_templates\nThis module provides functionality for selecting chat templates based on user choices.\n\n\nutils.lora\nmodule to get the state dict of a merged lora model\n\n\nutils.lora_embeddings\nhelpers for lora embeddings\n\n\nutils.model_shard_quant\nmodule to handle loading model on cpu/meta device for FSDP\n\n\nutils.bench\nBenchmarking and measurement utilities\n\n\nutils.freeze\nmodule to freeze/unfreeze parameters by name\n\n\nutils.trainer\nModule containing the Trainer class and related functions\n\n\nutils.schedulers\nModule for custom LRScheduler class\n\n\nutils.distributed\nutility helpers for distributed checks\n\n\nutils.dict\nModule containing the DictDefault class\n\n\nutils.optimizers.adopt\nCopied from https://github.com/iShohei220/adopt\n\n\nutils.data.pretraining\ndata handling specific to pretraining\n\n\nutils.data.sft\ndata handling specific to SFT\n\n\nutils.gradient_checkpointing.unsloth\nUnsloth checkpointing\n\n\n\n\n\n\nPydantic data models for Axolotl config\n\n\n\nutils.schemas.config\nModule with Pydantic models for configuration.\n\n\nutils.schemas.model\nPydantic models for model input / output, etc. configuration\n\n\nutils.schemas.training\nPydantic models for training hyperparameters\n\n\nutils.schemas.datasets\nPydantic models for datasets-related configuration\n\n\nutils.schemas.peft\nPydantic models for PEFT-related configuration\n\n\nutils.schemas.trl\nPydantic models for TRL trainer configuration\n\n\nutils.schemas.multimodal\nPydantic models for multimodal-related configuration\n\n\nutils.schemas.integrations\nPydantic models for Axolotl integrations\n\n\nutils.schemas.enums\nEnums for Axolotl input config\n\n\nutils.schemas.utils\nUtilities for Axolotl Pydantic models\n\n\n\n\n\n\nThird-party integrations and extensions\n\n\n\nintegrations.base\nBase class for all plugins.\n\n\nintegrations.cut_cross_entropy.args\nModule for handling Cut Cross Entropy input arguments.\n\n\nintegrations.grokfast.optimizer\n\n\n\nintegrations.kd.trainer\nKD trainer\n\n\nintegrations.liger.args\nModule for handling LIGER input arguments.\n\n\nintegrations.lm_eval.args\nModule for handling lm eval harness input arguments.\n\n\nintegrations.spectrum.args\nModule for handling Spectrum input arguments.\n\n\n\n\n\n\nCommon utilities and shared functionality\n\n\n\ncommon.architectures\nCommon architecture specific constants\n\n\ncommon.const\nVarious shared constants\n\n\ncommon.datasets\nDataset loading utilities.\n\n\n\n\n\n\nCustom model implementations\n\n\n\nmodels.mamba.modeling_mamba\n\n\n\n\n\n\n\nData processing utilities\n\n\n\nutils.collators.core\nbasic shared collator constants\n\n\nutils.collators.batching\nData collators for axolotl to pad labels and position_ids for packed sequences. Also\n\n\nutils.collators.mamba\ncollators for Mamba\n\n\nutils.collators.mm_chat\nCollators for multi-modal chat messages and packing\n\n\nutils.samplers.multipack\nMultipack Batch Sampler\n\n\n\n\n\n\nTraining callbacks\n\n\n\nutils.callbacks.perplexity\ncallback to calculate perplexity as an evaluation metric.\n\n\nutils.callbacks.profiler\nHF Trainer callback for creating pytorch profiling snapshots\n\n\nutils.callbacks.lisa\nmodule for LISA\n\n\nutils.callbacks.mlflow_\nMLFlow module for trainer callbacks\n\n\nutils.callbacks.comet_\nComet module for trainer callbacks" + "text": "Core functionality for training\n\n\n\ntrain\nPrepare and train a model on a dataset. 
Can also infer from a model or merge lora\n\n\nevaluate\nModule for evaluating models.\n\n\ndatasets\nModule containing Dataset functionality\n\n\nconvert\nModule containing File Reader, File Writer, Json Parser, and Jsonl Serializer classes\n\n\nprompt_tokenizers\nModule containing PromptTokenizingStrategy and Prompter classes\n\n\nlogging_config\nCommon logging module for axolotl\n\n\ncore.trainer_builder\nBuilder for the training args and trainer\n\n\ncore.training_args\nextra axolotl specific training args\n\n\ncore.chat.messages\ninternal message representations of chat messages\n\n\ncore.chat.format.chatml\nChatML transformation functions for MessageContents\n\n\ncore.chat.format.llama3x\nLlama 3.x chat formatting functions for MessageContents\n\n\ncore.chat.format.shared\nshared functions for format transforms\n\n\ncore.datasets.chat\nchat dataset module\n\n\ncore.datasets.transforms.chat_builder\nThis module contains a function that builds a transform that takes a row from the dataset and converts it to a Chat.\n\n\n\n\n\n\nCommand-line interface\n\n\n\ncli.main\nClick CLI definitions for various axolotl commands.\n\n\ncli.train\nCLI to run training on a model.\n\n\ncli.evaluate\nCLI to run evaluation on a model.\n\n\ncli.args\nModule for axolotl CLI command arguments.\n\n\ncli.checks\nVarious checks for Axolotl CLI.\n\n\ncli.config\nConfiguration loading and processing.\n\n\ncli.inference\nCLI to run inference on a trained model.\n\n\ncli.merge_lora\nCLI to merge a trained LoRA into a base model.\n\n\ncli.merge_sharded_fsdp_weights\nCLI to merge sharded FSDP model checkpoints into a single combined checkpoint.\n\n\ncli.preprocess\nCLI to run preprocessing of a dataset.\n\n\ncli.sweeps\nUtilities for handling sweeps over configs for axolotl train CLI command\n\n\ncli.utils\nUtility methods for axolotl CLI.\n\n\ncli.vllm_serve\nCLI to start the vllm server for online RL\n\n\ncli.cloud.base\nbase class for cloud platforms from cli\n\n\ncli.cloud.modal_\nModal Cloud support from CLI\n\n\n\n\n\n\nTraining implementations\n\n\n\ncore.trainers.base\nModule for customized trainers\n\n\ncore.trainers.trl\nModule for TRL PPO trainer\n\n\ncore.trainers.dpo.trainer\nDPO trainer for axolotl\n\n\ncore.trainers.grpo.trainer\nAxolotl GRPO trainer\n\n\n\n\n\n\nPrompt formatting strategies\n\n\n\nprompt_strategies.base\nmodule for base dataset transform strategies\n\n\nprompt_strategies.chat_template\nHF Chat Templates prompt strategy\n\n\nprompt_strategies.alpaca_chat\nModule for Alpaca prompt strategy classes\n\n\nprompt_strategies.alpaca_instruct\nModule loading the AlpacaInstructPromptTokenizingStrategy class\n\n\nprompt_strategies.alpaca_w_system\nPrompt strategies loader for alpaca instruction datasets with system prompts\n\n\nprompt_strategies.user_defined\nUser Defined prompts with configuration from the YML config\n\n\nprompt_strategies.llama2_chat\nPrompt Strategy for finetuning Llama2 chat models\n\n\nprompt_strategies.completion\nBasic completion text\n\n\nprompt_strategies.input_output\nModule for plain input/output prompt pairs\n\n\nprompt_strategies.stepwise_supervised\nModule for stepwise datasets, typically including a prompt and reasoning traces,\n\n\nprompt_strategies.metharme\nModule containing the MetharmenPromptTokenizingStrategy and MetharmePrompter class\n\n\nprompt_strategies.orcamini\nPrompt Strategy for finetuning Orca Mini (v2) models\n\n\nprompt_strategies.pygmalion\nModule containing the PygmalionPromptTokenizingStrategy and PygmalionPrompter 
class\n\n\nprompt_strategies.messages.chat\nChat dataset wrapping strategy for new internal messages representations\n\n\nprompt_strategies.dpo.chat_template\nDPO prompt strategies for using tokenizer chat templates.\n\n\nprompt_strategies.dpo.llama3\nDPO strategies for llama-3 chat template\n\n\nprompt_strategies.dpo.chatml\nDPO strategies for chatml\n\n\nprompt_strategies.dpo.zephyr\nDPO strategies for zephyr\n\n\nprompt_strategies.dpo.user_defined\nUser-defined DPO strategies\n\n\nprompt_strategies.dpo.passthrough\nDPO prompt strategies passthrough/zero-processing strategy\n\n\nprompt_strategies.kto.llama3\nKTO strategies for llama-3 chat template\n\n\nprompt_strategies.kto.chatml\nKTO strategies for chatml\n\n\nprompt_strategies.kto.user_defined\nUser-defined KTO strategies\n\n\nprompt_strategies.orpo.chat_template\nchatml prompt tokenization strategy for ORPO\n\n\nprompt_strategies.bradley_terry.llama3\nchatml transforms for datasets with system, input, chosen, rejected to match llama3 chat template\n\n\n\n\n\n\nLow-level performance optimizations\n\n\n\nkernels.lora\nModule for definition of Low-Rank Adaptation (LoRA) Triton kernels.\n\n\nkernels.geglu\nModule for definition of GEGLU Triton kernels.\n\n\nkernels.swiglu\nModule for definition of SwiGLU Triton kernels.\n\n\nkernels.quantize\nDequantization utilities for bitsandbytes integration.\n\n\nkernels.utils\nUtilities for axolotl.kernels submodules.\n\n\n\n\n\n\nRuntime patches for model optimizations\n\n\n\nmonkeypatch.llama_attn_hijack_flash\nFlash attention monkey patch for llama model\n\n\nmonkeypatch.llama_attn_hijack_xformers\nDirectly copied the code from https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/modules/llama_attn_hijack.py and made some adjustments\n\n\nmonkeypatch.mistral_attn_hijack_flash\nFlash attention monkey patch for mistral model\n\n\nmonkeypatch.multipack\nmultipack patching for v2 of sample packing\n\n\nmonkeypatch.relora\nImplements the ReLoRA training procedure from https://arxiv.org/abs/2307.05695, minus the initial full fine-tune.\n\n\nmonkeypatch.llama_expand_mask\nexpands the binary attention mask per 3.2.2 of https://arxiv.org/pdf/2107.02027.pdf\n\n\nmonkeypatch.lora_kernels\nModule for patching custom LoRA Triton kernels and torch.autograd functions.\n\n\nmonkeypatch.utils\nShared utils for the monkeypatches\n\n\nmonkeypatch.btlm_attn_hijack_flash\nFlash attention monkey patch for cerebras btlm model\n\n\nmonkeypatch.llama_patch_multipack\nPatched LlamaAttention to use torch.nn.functional.scaled_dot_product_attention\n\n\nmonkeypatch.stablelm_attn_hijack_flash\nPyTorch StableLM Epoch model.\n\n\nmonkeypatch.trainer_fsdp_optim\nfix for FSDP optimizer save in trainer w 4.47.0\n\n\nmonkeypatch.transformers_fa_utils\nsee https://github.com/huggingface/transformers/pull/35834\n\n\nmonkeypatch.unsloth_\nmodule for patching with unsloth optimizations\n\n\nmonkeypatch.attention.mllama\nMonkeypatch for Vision Llama for FA2 support\n\n\nmonkeypatch.data.batch_dataset_fetcher\nmonkey patches for the dataset fetcher to handle batches of packed indexes\n\n\nmonkeypatch.mixtral\nPatches to support multipack for mixtral\n\n\n\n\n\n\nUtility functions\n\n\n\nutils.models\nModule for models and model loading\n\n\nutils.tokenization\nModule for tokenization utilities\n\n\nutils.chat_templates\nThis module provides functionality for selecting chat templates based on user choices.\n\n\nutils.lora\nmodule to get the state dict of a merged lora model\n\n\nutils.lora_embeddings\nhelpers for lora 
embeddings\n\n\nutils.model_shard_quant\nmodule to handle loading model on cpu/meta device for FSDP\n\n\nutils.bench\nBenchmarking and measurement utilities\n\n\nutils.freeze\nmodule to freeze/unfreeze parameters by name\n\n\nutils.trainer\nModule containing the Trainer class and related functions\n\n\nutils.schedulers\nModule for custom LRScheduler class\n\n\nutils.distributed\nutility helpers for distributed checks\n\n\nutils.dict\nModule containing the DictDefault class\n\n\nutils.optimizers.adopt\nCopied from https://github.com/iShohei220/adopt\n\n\nutils.data.pretraining\ndata handling specific to pretraining\n\n\nutils.data.sft\ndata handling specific to SFT\n\n\nutils.gradient_checkpointing.unsloth\nUnsloth checkpointing\n\n\n\n\n\n\nPydantic data models for Axolotl config\n\n\n\nutils.schemas.config\nModule with Pydantic models for configuration.\n\n\nutils.schemas.model\nPydantic models for model input / output, etc. configuration\n\n\nutils.schemas.training\nPydantic models for training hyperparameters\n\n\nutils.schemas.datasets\nPydantic models for datasets-related configuration\n\n\nutils.schemas.peft\nPydantic models for PEFT-related configuration\n\n\nutils.schemas.trl\nPydantic models for TRL trainer configuration\n\n\nutils.schemas.multimodal\nPydantic models for multimodal-related configuration\n\n\nutils.schemas.integrations\nPydantic models for Axolotl integrations\n\n\nutils.schemas.enums\nEnums for Axolotl input config\n\n\nutils.schemas.utils\nUtilities for Axolotl Pydantic models\n\n\n\n\n\n\nThird-party integrations and extensions\n\n\n\nintegrations.base\nBase class for all plugins.\n\n\nintegrations.cut_cross_entropy.args\nModule for handling Cut Cross Entropy input arguments.\n\n\nintegrations.grokfast.optimizer\n\n\n\nintegrations.kd.trainer\nKD trainer\n\n\nintegrations.liger.args\nModule for handling LIGER input arguments.\n\n\nintegrations.lm_eval.args\nModule for handling lm eval harness input arguments.\n\n\nintegrations.spectrum.args\nModule for handling Spectrum input arguments.\n\n\n\n\n\n\nCommon utilities and shared functionality\n\n\n\ncommon.architectures\nCommon architecture specific constants\n\n\ncommon.const\nVarious shared constants\n\n\ncommon.datasets\nDataset loading utilities.\n\n\n\n\n\n\nCustom model implementations\n\n\n\nmodels.mamba.modeling_mamba\n\n\n\n\n\n\n\nData processing utilities\n\n\n\nutils.collators.core\nbasic shared collator constants\n\n\nutils.collators.batching\nData collators for axolotl to pad labels and position_ids for packed sequences\n\n\nutils.collators.mamba\ncollators for Mamba\n\n\nutils.collators.mm_chat\nCollators for multi-modal chat messages and packing\n\n\nutils.samplers.multipack\nMultipack Batch Sampler\n\n\n\n\n\n\nTraining callbacks\n\n\n\nutils.callbacks.perplexity\ncallback to calculate perplexity as an evaluation metric.\n\n\nutils.callbacks.profiler\nHF Trainer callback for creating pytorch profiling snapshots\n\n\nutils.callbacks.lisa\nmodule for LISA\n\n\nutils.callbacks.mlflow_\nMLFlow module for trainer callbacks\n\n\nutils.callbacks.comet_\nComet module for trainer callbacks" }, { "objectID": "docs/api/index.html#core", @@ -743,7 +743,7 @@ "href": "docs/api/index.html#data-processing", "title": "API Reference", "section": "", - "text": "Data processing utilities\n\n\n\nutils.collators.core\nbasic shared collator constants\n\n\nutils.collators.batching\nData collators for axolotl to pad labels and position_ids for packed sequences. 
Also\n\n\nutils.collators.mamba\ncollators for Mamba\n\n\nutils.collators.mm_chat\nCollators for multi-modal chat messages and packing\n\n\nutils.samplers.multipack\nMultipack Batch Sampler" + "text": "Data processing utilities\n\n\n\nutils.collators.core\nbasic shared collator constants\n\n\nutils.collators.batching\nData collators for axolotl to pad labels and position_ids for packed sequences\n\n\nutils.collators.mamba\ncollators for Mamba\n\n\nutils.collators.mm_chat\nCollators for multi-modal chat messages and packing\n\n\nutils.samplers.multipack\nMultipack Batch Sampler" }, { "objectID": "docs/api/index.html#callbacks", @@ -2794,14 +2794,14 @@ "href": "docs/api/utils.collators.batching.html", "title": "utils.collators.batching", "section": "", - "text": "utils.collators.batching\nData collators for axolotl to pad labels and position_ids for packed sequences. Also\nincludes logic for handling sequence parallelism collation.\n\n\n\n\n\nName\nDescription\n\n\n\n\nBatchSamplerDataCollatorForSeq2Seq\nCollator for multipack specific to the using the BatchSampler\n\n\nDataCollatorForSeq2Seq\nData collator that will dynamically pad the inputs received, as well as the labels and position_ids\n\n\nPretrainingBatchSamplerDataCollatorForSeq2Seq\nCollator for multipack specific to the using the BatchSampler\n\n\nV2BatchSamplerDataCollatorForSeq2Seq\nCollator for multipack specific to the using the BatchSampler\n\n\n\n\n\nutils.collators.batching.BatchSamplerDataCollatorForSeq2Seq(\n self,\n tokenizer,\n model=None,\n padding=True,\n max_length=None,\n pad_to_multiple_of=None,\n label_pad_token_id=-100,\n position_pad_token_id=0,\n return_tensors='pt',\n sequence_parallel_degree=1,\n ring_attn_func=None,\n)\nCollator for multipack specific to the using the BatchSampler\n\n\n\nutils.collators.batching.DataCollatorForSeq2Seq(\n self,\n tokenizer,\n model=None,\n padding=True,\n max_length=None,\n pad_to_multiple_of=None,\n label_pad_token_id=-100,\n position_pad_token_id=0,\n return_tensors='pt',\n sequence_parallel_degree=1,\n ring_attn_func=None,\n)\nData collator that will dynamically pad the inputs received, as well as the labels and position_ids\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ntokenizer\n[PreTrainedTokenizer] or [PreTrainedTokenizerFast]\nThe tokenizer used for encoding the data.\nrequired\n\n\nmodel\n[PreTrainedModel]\nThe model that is being trained. If set and has the prepare_decoder_input_ids_from_labels, use it to prepare the decoder_input_ids This is useful when using label_smoothing to avoid calculating loss twice.\nNone\n\n\npadding\nbool, str or [~utils.PaddingStrategy], optional, defaults to True\nSelect a strategy to pad the returned sequences (according to the model’s padding side and padding index) among: - True or 'longest' (default): Pad to the longest sequence in the batch (or no padding if only a single sequence is provided). - 'max_length': Pad to a maximum length specified with the argument max_length or to the maximum acceptable input length for the model if that argument is not provided. - False or 'do_not_pad': No padding (i.e., can output a batch with sequences of different lengths).\nTrue\n\n\nmax_length\nint, optional\nMaximum length of the returned list and optionally padding length (see above).\nNone\n\n\npad_to_multiple_of\nint, optional\nIf set will pad the sequence to a multiple of the provided value. 
This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.5 (Volta).\nNone\n\n\nlabel_pad_token_id\nint, optional, defaults to -100\nThe id to use when padding the labels (-100 will be automatically ignored by PyTorch loss functions).\n-100\n\n\nreturn_tensors\nstr\nThe type of Tensor to return. Allowable values are “np”, “pt” and “tf”.\n'pt'\n\n\nsequence_parallel_degree\nint\nThe degree of sequence parallelism. Default to 1 for no sequence parallelism.\n1\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\napply_sequence_parallelism\nApply sequence parallelism slicing to a batch.\n\n\n\n\n\nutils.collators.batching.DataCollatorForSeq2Seq.apply_sequence_parallelism(\n batch,\n)\nApply sequence parallelism slicing to a batch.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbatch\ndict[str, torch.Tensor]\nBatch dictionary from parent collator.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nSliced batch dictionary.\n\n\n\n\n\n\n\n\n\nutils.collators.batching.PretrainingBatchSamplerDataCollatorForSeq2Seq(\n self,\n *args,\n multipack_attn=True,\n **kwargs,\n)\nCollator for multipack specific to the using the BatchSampler\n\n\n\nutils.collators.batching.V2BatchSamplerDataCollatorForSeq2Seq(\n self,\n tokenizer,\n model=None,\n padding=True,\n max_length=None,\n pad_to_multiple_of=None,\n label_pad_token_id=-100,\n position_pad_token_id=0,\n return_tensors='pt',\n sequence_parallel_degree=1,\n ring_attn_func=None,\n)\nCollator for multipack specific to the using the BatchSampler" + "text": "utils.collators.batching\nData collators for axolotl to pad labels and position_ids for packed sequences\n\n\n\n\n\nName\nDescription\n\n\n\n\nBatchSamplerDataCollatorForSeq2Seq\nCollator for multipack specific to the using the BatchSampler\n\n\nDataCollatorForSeq2Seq\nData collator that will dynamically pad the inputs received, as well as the labels and position_ids\n\n\nPretrainingBatchSamplerDataCollatorForSeq2Seq\nCollator for multipack specific to the using the BatchSampler\n\n\nV2BatchSamplerDataCollatorForSeq2Seq\nCollator for multipack specific to the using the BatchSampler\n\n\n\n\n\nutils.collators.batching.BatchSamplerDataCollatorForSeq2Seq(\n self,\n tokenizer,\n model=None,\n padding=True,\n max_length=None,\n pad_to_multiple_of=None,\n label_pad_token_id=-100,\n position_pad_token_id=0,\n return_tensors='pt',\n)\nCollator for multipack specific to the using the BatchSampler\n\n\n\nutils.collators.batching.DataCollatorForSeq2Seq(\n self,\n tokenizer,\n model=None,\n padding=True,\n max_length=None,\n pad_to_multiple_of=None,\n label_pad_token_id=-100,\n position_pad_token_id=0,\n return_tensors='pt',\n)\nData collator that will dynamically pad the inputs received, as well as the labels and position_ids\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ntokenizer\n[PreTrainedTokenizer] or [PreTrainedTokenizerFast]\nThe tokenizer used for encoding the data.\nrequired\n\n\nmodel\n[PreTrainedModel]\nThe model that is being trained. 
If set and has the prepare_decoder_input_ids_from_labels, use it to prepare the decoder_input_ids This is useful when using label_smoothing to avoid calculating loss twice.\nNone\n\n\npadding\nbool, str or [~utils.PaddingStrategy], optional, defaults to True\nSelect a strategy to pad the returned sequences (according to the model’s padding side and padding index) among: - True or 'longest' (default): Pad to the longest sequence in the batch (or no padding if only a single sequence is provided). - 'max_length': Pad to a maximum length specified with the argument max_length or to the maximum acceptable input length for the model if that argument is not provided. - False or 'do_not_pad': No padding (i.e., can output a batch with sequences of different lengths).\nTrue\n\n\nmax_length\nint, optional\nMaximum length of the returned list and optionally padding length (see above).\nNone\n\n\npad_to_multiple_of\nint, optional\nIf set will pad the sequence to a multiple of the provided value. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.5 (Volta).\nNone\n\n\nlabel_pad_token_id\nint, optional, defaults to -100\nThe id to use when padding the labels (-100 will be automatically ignored by PyTorch loss functions).\n-100\n\n\nreturn_tensors\nstr\nThe type of Tensor to return. Allowable values are “np”, “pt” and “tf”.\n'pt'\n\n\n\n\n\n\n\nutils.collators.batching.PretrainingBatchSamplerDataCollatorForSeq2Seq(\n self,\n *args,\n multipack_attn=True,\n **kwargs,\n)\nCollator for multipack specific to the using the BatchSampler\n\n\n\nutils.collators.batching.V2BatchSamplerDataCollatorForSeq2Seq(\n self,\n tokenizer,\n model=None,\n padding=True,\n max_length=None,\n pad_to_multiple_of=None,\n label_pad_token_id=-100,\n position_pad_token_id=0,\n return_tensors='pt',\n)\nCollator for multipack specific to the using the BatchSampler" }, { "objectID": "docs/api/utils.collators.batching.html#classes", "href": "docs/api/utils.collators.batching.html#classes", "title": "utils.collators.batching", "section": "", - "text": "Name\nDescription\n\n\n\n\nBatchSamplerDataCollatorForSeq2Seq\nCollator for multipack specific to the using the BatchSampler\n\n\nDataCollatorForSeq2Seq\nData collator that will dynamically pad the inputs received, as well as the labels and position_ids\n\n\nPretrainingBatchSamplerDataCollatorForSeq2Seq\nCollator for multipack specific to the using the BatchSampler\n\n\nV2BatchSamplerDataCollatorForSeq2Seq\nCollator for multipack specific to the using the BatchSampler\n\n\n\n\n\nutils.collators.batching.BatchSamplerDataCollatorForSeq2Seq(\n self,\n tokenizer,\n model=None,\n padding=True,\n max_length=None,\n pad_to_multiple_of=None,\n label_pad_token_id=-100,\n position_pad_token_id=0,\n return_tensors='pt',\n sequence_parallel_degree=1,\n ring_attn_func=None,\n)\nCollator for multipack specific to the using the BatchSampler\n\n\n\nutils.collators.batching.DataCollatorForSeq2Seq(\n self,\n tokenizer,\n model=None,\n padding=True,\n max_length=None,\n pad_to_multiple_of=None,\n label_pad_token_id=-100,\n position_pad_token_id=0,\n return_tensors='pt',\n sequence_parallel_degree=1,\n ring_attn_func=None,\n)\nData collator that will dynamically pad the inputs received, as well as the labels and position_ids\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ntokenizer\n[PreTrainedTokenizer] or [PreTrainedTokenizerFast]\nThe tokenizer used for encoding the data.\nrequired\n\n\nmodel\n[PreTrainedModel]\nThe model that is 
being trained. If set and has the prepare_decoder_input_ids_from_labels, use it to prepare the decoder_input_ids This is useful when using label_smoothing to avoid calculating loss twice.\nNone\n\n\npadding\nbool, str or [~utils.PaddingStrategy], optional, defaults to True\nSelect a strategy to pad the returned sequences (according to the model’s padding side and padding index) among: - True or 'longest' (default): Pad to the longest sequence in the batch (or no padding if only a single sequence is provided). - 'max_length': Pad to a maximum length specified with the argument max_length or to the maximum acceptable input length for the model if that argument is not provided. - False or 'do_not_pad': No padding (i.e., can output a batch with sequences of different lengths).\nTrue\n\n\nmax_length\nint, optional\nMaximum length of the returned list and optionally padding length (see above).\nNone\n\n\npad_to_multiple_of\nint, optional\nIf set will pad the sequence to a multiple of the provided value. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.5 (Volta).\nNone\n\n\nlabel_pad_token_id\nint, optional, defaults to -100\nThe id to use when padding the labels (-100 will be automatically ignored by PyTorch loss functions).\n-100\n\n\nreturn_tensors\nstr\nThe type of Tensor to return. Allowable values are “np”, “pt” and “tf”.\n'pt'\n\n\nsequence_parallel_degree\nint\nThe degree of sequence parallelism. Default to 1 for no sequence parallelism.\n1\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\napply_sequence_parallelism\nApply sequence parallelism slicing to a batch.\n\n\n\n\n\nutils.collators.batching.DataCollatorForSeq2Seq.apply_sequence_parallelism(\n batch,\n)\nApply sequence parallelism slicing to a batch.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbatch\ndict[str, torch.Tensor]\nBatch dictionary from parent collator.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntorch.Tensor\nSliced batch dictionary.\n\n\n\n\n\n\n\n\n\nutils.collators.batching.PretrainingBatchSamplerDataCollatorForSeq2Seq(\n self,\n *args,\n multipack_attn=True,\n **kwargs,\n)\nCollator for multipack specific to the using the BatchSampler\n\n\n\nutils.collators.batching.V2BatchSamplerDataCollatorForSeq2Seq(\n self,\n tokenizer,\n model=None,\n padding=True,\n max_length=None,\n pad_to_multiple_of=None,\n label_pad_token_id=-100,\n position_pad_token_id=0,\n return_tensors='pt',\n sequence_parallel_degree=1,\n ring_attn_func=None,\n)\nCollator for multipack specific to the using the BatchSampler" + "text": "Name\nDescription\n\n\n\n\nBatchSamplerDataCollatorForSeq2Seq\nCollator for multipack specific to the using the BatchSampler\n\n\nDataCollatorForSeq2Seq\nData collator that will dynamically pad the inputs received, as well as the labels and position_ids\n\n\nPretrainingBatchSamplerDataCollatorForSeq2Seq\nCollator for multipack specific to the using the BatchSampler\n\n\nV2BatchSamplerDataCollatorForSeq2Seq\nCollator for multipack specific to the using the BatchSampler\n\n\n\n\n\nutils.collators.batching.BatchSamplerDataCollatorForSeq2Seq(\n self,\n tokenizer,\n model=None,\n padding=True,\n max_length=None,\n pad_to_multiple_of=None,\n label_pad_token_id=-100,\n position_pad_token_id=0,\n return_tensors='pt',\n)\nCollator for multipack specific to the using the BatchSampler\n\n\n\nutils.collators.batching.DataCollatorForSeq2Seq(\n self,\n tokenizer,\n model=None,\n padding=True,\n max_length=None,\n 
pad_to_multiple_of=None,\n label_pad_token_id=-100,\n position_pad_token_id=0,\n return_tensors='pt',\n)\nData collator that will dynamically pad the inputs received, as well as the labels and position_ids\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ntokenizer\n[PreTrainedTokenizer] or [PreTrainedTokenizerFast]\nThe tokenizer used for encoding the data.\nrequired\n\n\nmodel\n[PreTrainedModel]\nThe model that is being trained. If set and has the prepare_decoder_input_ids_from_labels, use it to prepare the decoder_input_ids This is useful when using label_smoothing to avoid calculating loss twice.\nNone\n\n\npadding\nbool, str or [~utils.PaddingStrategy], optional, defaults to True\nSelect a strategy to pad the returned sequences (according to the model’s padding side and padding index) among: - True or 'longest' (default): Pad to the longest sequence in the batch (or no padding if only a single sequence is provided). - 'max_length': Pad to a maximum length specified with the argument max_length or to the maximum acceptable input length for the model if that argument is not provided. - False or 'do_not_pad': No padding (i.e., can output a batch with sequences of different lengths).\nTrue\n\n\nmax_length\nint, optional\nMaximum length of the returned list and optionally padding length (see above).\nNone\n\n\npad_to_multiple_of\nint, optional\nIf set will pad the sequence to a multiple of the provided value. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.5 (Volta).\nNone\n\n\nlabel_pad_token_id\nint, optional, defaults to -100\nThe id to use when padding the labels (-100 will be automatically ignored by PyTorch loss functions).\n-100\n\n\nreturn_tensors\nstr\nThe type of Tensor to return. 
Allowable values are “np”, “pt” and “tf”.\n'pt'\n\n\n\n\n\n\n\nutils.collators.batching.PretrainingBatchSamplerDataCollatorForSeq2Seq(\n self,\n *args,\n multipack_attn=True,\n **kwargs,\n)\nCollator for multipack specific to the using the BatchSampler\n\n\n\nutils.collators.batching.V2BatchSamplerDataCollatorForSeq2Seq(\n self,\n tokenizer,\n model=None,\n padding=True,\n max_length=None,\n pad_to_multiple_of=None,\n label_pad_token_id=-100,\n position_pad_token_id=0,\n return_tensors='pt',\n)\nCollator for multipack specific to the using the BatchSampler" }, { "objectID": "docs/api/utils.schemas.datasets.html", diff --git a/sitemap.xml b/sitemap.xml index 6e1d6284e..d7cb62dde 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,682 +2,682 @@ https://docs.axolotl.ai/examples/colab-notebooks/colab-axolotl-example.html - 2025-04-24T17:01:57.365Z + 2025-04-25T14:34:06.942Z https://docs.axolotl.ai/index.html - 2025-04-24T17:01:57.377Z + 2025-04-25T14:34:06.954Z https://docs.axolotl.ai/docs/rlhf.html - 2025-04-24T17:01:57.364Z + 2025-04-25T14:34:06.942Z https://docs.axolotl.ai/docs/unsloth.html - 2025-04-24T17:01:57.364Z + 2025-04-25T14:34:06.942Z https://docs.axolotl.ai/docs/dataset_preprocessing.html - 2025-04-24T17:01:57.360Z + 2025-04-25T14:34:06.938Z https://docs.axolotl.ai/docs/input_output.html - 2025-04-24T17:01:57.363Z + 2025-04-25T14:34:06.941Z https://docs.axolotl.ai/docs/dataset_loading.html - 2025-04-24T17:01:57.360Z + 2025-04-25T14:34:06.938Z https://docs.axolotl.ai/docs/api/utils.collators.mamba.html - 2025-04-24T17:02:27.939Z + 2025-04-25T14:34:36.900Z https://docs.axolotl.ai/docs/api/utils.optimizers.adopt.html - 2025-04-24T17:02:27.720Z + 2025-04-25T14:34:36.684Z https://docs.axolotl.ai/docs/api/prompt_strategies.user_defined.html - 2025-04-24T17:02:27.231Z + 2025-04-25T14:34:36.191Z https://docs.axolotl.ai/docs/api/utils.chat_templates.html - 2025-04-24T17:02:27.622Z + 2025-04-25T14:34:36.586Z https://docs.axolotl.ai/docs/api/cli.merge_lora.html - 2025-04-24T17:02:27.066Z + 2025-04-25T14:34:36.026Z https://docs.axolotl.ai/docs/api/monkeypatch.multipack.html - 2025-04-24T17:02:27.506Z + 2025-04-25T14:34:36.468Z https://docs.axolotl.ai/docs/api/core.chat.format.shared.html - 2025-04-24T17:02:26.943Z + 2025-04-25T14:34:35.902Z https://docs.axolotl.ai/docs/api/utils.schemas.integrations.html - 2025-04-24T17:02:27.792Z + 2025-04-25T14:34:36.757Z https://docs.axolotl.ai/docs/api/utils.freeze.html - 2025-04-24T17:02:27.647Z + 2025-04-25T14:34:36.611Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_w_system.html - 2025-04-24T17:02:27.223Z + 2025-04-25T14:34:36.183Z https://docs.axolotl.ai/docs/api/monkeypatch.attention.mllama.html - 2025-04-24T17:02:27.574Z + 2025-04-25T14:34:36.536Z https://docs.axolotl.ai/docs/api/utils.schemas.model.html - 2025-04-24T17:02:27.740Z + 2025-04-25T14:34:36.705Z https://docs.axolotl.ai/docs/api/core.datasets.transforms.chat_builder.html - 2025-04-24T17:02:26.956Z + 2025-04-25T14:34:35.915Z https://docs.axolotl.ai/docs/api/monkeypatch.btlm_attn_hijack_flash.html - 2025-04-24T17:02:27.549Z + 2025-04-25T14:34:36.511Z https://docs.axolotl.ai/docs/api/models.mamba.modeling_mamba.html - 2025-04-24T17:02:27.910Z + 2025-04-25T14:34:36.876Z https://docs.axolotl.ai/docs/api/core.datasets.chat.html - 2025-04-24T17:02:26.948Z + 2025-04-25T14:34:35.907Z https://docs.axolotl.ai/docs/api/utils.model_shard_quant.html - 2025-04-24T17:02:27.636Z + 2025-04-25T14:34:36.599Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_patch_multipack.html - 
2025-04-24T17:02:27.550Z + 2025-04-25T14:34:36.513Z https://docs.axolotl.ai/docs/api/utils.lora_embeddings.html - 2025-04-24T17:02:27.630Z + 2025-04-25T14:34:36.594Z https://docs.axolotl.ai/docs/api/utils.schemas.peft.html - 2025-04-24T17:02:27.772Z + 2025-04-25T14:34:36.736Z https://docs.axolotl.ai/docs/api/core.chat.format.llama3x.html - 2025-04-24T17:02:26.942Z + 2025-04-25T14:34:35.900Z https://docs.axolotl.ai/docs/api/monkeypatch.mistral_attn_hijack_flash.html - 2025-04-24T17:02:27.504Z + 2025-04-25T14:34:36.466Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chat_template.html - 2025-04-24T17:02:27.284Z + 2025-04-25T14:34:36.244Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.user_defined.html - 2025-04-24T17:02:27.307Z + 2025-04-25T14:34:36.268Z https://docs.axolotl.ai/docs/api/utils.distributed.html - 2025-04-24T17:02:27.709Z + 2025-04-25T14:34:36.673Z https://docs.axolotl.ai/docs/api/prompt_strategies.pygmalion.html - 2025-04-24T17:02:27.278Z + 2025-04-25T14:34:36.238Z https://docs.axolotl.ai/docs/api/utils.dict.html - 2025-04-24T17:02:27.712Z + 2025-04-25T14:34:36.676Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_flash.html - 2025-04-24T17:02:27.488Z + 2025-04-25T14:34:36.450Z https://docs.axolotl.ai/docs/api/prompt_strategies.base.html - 2025-04-24T17:02:27.182Z + 2025-04-25T14:34:36.141Z https://docs.axolotl.ai/docs/api/monkeypatch.data.batch_dataset_fetcher.html - 2025-04-24T17:02:27.575Z + 2025-04-25T14:34:36.538Z https://docs.axolotl.ai/docs/api/kernels.swiglu.html - 2025-04-24T17:02:27.453Z + 2025-04-25T14:34:36.415Z https://docs.axolotl.ai/docs/api/core.chat.messages.html - 2025-04-24T17:02:26.939Z + 2025-04-25T14:34:35.897Z https://docs.axolotl.ai/docs/api/index.html - 2025-04-24T17:02:26.676Z + 2025-04-25T14:34:35.631Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.zephyr.html - 2025-04-24T17:02:27.306Z + 2025-04-25T14:34:36.266Z https://docs.axolotl.ai/docs/api/convert.html - 2025-04-24T17:02:26.768Z + 2025-04-25T14:34:35.723Z https://docs.axolotl.ai/docs/api/utils.schemas.multimodal.html - 2025-04-24T17:02:27.780Z + 2025-04-25T14:34:36.744Z https://docs.axolotl.ai/docs/api/core.trainers.base.html - 2025-04-24T17:02:27.153Z + 2025-04-25T14:34:36.112Z https://docs.axolotl.ai/docs/api/evaluate.html - 2025-04-24T17:02:26.747Z + 2025-04-25T14:34:35.702Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_xformers.html - 2025-04-24T17:02:27.490Z + 2025-04-25T14:34:36.452Z https://docs.axolotl.ai/docs/api/kernels.quantize.html - 2025-04-24T17:02:27.461Z + 2025-04-25T14:34:36.423Z https://docs.axolotl.ai/docs/api/utils.callbacks.mlflow_.html - 2025-04-24T17:02:27.968Z + 2025-04-25T14:34:36.929Z https://docs.axolotl.ai/docs/api/utils.callbacks.profiler.html - 2025-04-24T17:02:27.962Z + 2025-04-25T14:34:36.924Z https://docs.axolotl.ai/docs/api/core.trainers.dpo.trainer.html - 2025-04-24T17:02:27.177Z + 2025-04-25T14:34:36.136Z https://docs.axolotl.ai/docs/api/cli.vllm_serve.html - 2025-04-24T17:02:27.130Z + 2025-04-25T14:34:36.088Z https://docs.axolotl.ai/docs/api/train.html - 2025-04-24T17:02:26.737Z + 2025-04-25T14:34:35.692Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chatml.html - 2025-04-24T17:02:27.304Z + 2025-04-25T14:34:36.265Z https://docs.axolotl.ai/docs/api/utils.schemas.trl.html - 2025-04-24T17:02:27.775Z + 2025-04-25T14:34:36.739Z https://docs.axolotl.ai/docs/api/kernels.geglu.html - 2025-04-24T17:02:27.443Z + 2025-04-25T14:34:36.405Z https://docs.axolotl.ai/docs/api/utils.bench.html - 
2025-04-24T17:02:27.639Z + 2025-04-25T14:34:36.603Z https://docs.axolotl.ai/docs/api/monkeypatch.transformers_fa_utils.html - 2025-04-24T17:02:27.565Z + 2025-04-25T14:34:36.528Z https://docs.axolotl.ai/docs/api/integrations.liger.args.html - 2025-04-24T17:02:27.883Z + 2025-04-25T14:34:36.849Z https://docs.axolotl.ai/docs/api/core.trainer_builder.html - 2025-04-24T17:02:26.830Z + 2025-04-25T14:34:35.785Z https://docs.axolotl.ai/docs/api/utils.schemas.utils.html - 2025-04-24T17:02:27.805Z + 2025-04-25T14:34:36.769Z https://docs.axolotl.ai/docs/api/kernels.lora.html - 2025-04-24T17:02:27.433Z + 2025-04-25T14:34:36.395Z https://docs.axolotl.ai/docs/api/prompt_strategies.bradley_terry.llama3.html - 2025-04-24T17:02:27.350Z + 2025-04-25T14:34:36.311Z https://docs.axolotl.ai/docs/api/core.trainers.grpo.trainer.html - 2025-04-24T17:02:27.180Z + 2025-04-25T14:34:36.139Z https://docs.axolotl.ai/docs/api/prompt_strategies.messages.chat.html - 2025-04-24T17:02:27.282Z + 2025-04-25T14:34:36.243Z https://docs.axolotl.ai/docs/api/utils.tokenization.html - 2025-04-24T17:02:27.612Z + 2025-04-25T14:34:36.576Z https://docs.axolotl.ai/docs/api/prompt_strategies.chat_template.html - 2025-04-24T17:02:27.196Z + 2025-04-25T14:34:36.156Z https://docs.axolotl.ai/docs/api/prompt_strategies.stepwise_supervised.html - 2025-04-24T17:02:27.260Z + 2025-04-25T14:34:36.221Z https://docs.axolotl.ai/docs/api/utils.samplers.multipack.html - 2025-04-24T17:02:27.952Z + 2025-04-25T14:34:36.913Z https://docs.axolotl.ai/docs/api/cli.args.html - 2025-04-24T17:02:27.020Z + 2025-04-25T14:34:35.979Z https://docs.axolotl.ai/docs/api/utils.callbacks.perplexity.html - 2025-04-24T17:02:27.959Z + 2025-04-25T14:34:36.920Z https://docs.axolotl.ai/docs/api/utils.gradient_checkpointing.unsloth.html - 2025-04-24T17:02:27.726Z + 2025-04-25T14:34:36.690Z https://docs.axolotl.ai/docs/mac.html - 2025-04-24T17:01:57.364Z + 2025-04-25T14:34:06.941Z https://docs.axolotl.ai/docs/config.html - 2025-04-24T17:01:57.360Z + 2025-04-25T14:34:06.938Z https://docs.axolotl.ai/docs/multimodal.html - 2025-04-24T17:01:57.364Z + 2025-04-25T14:34:06.941Z https://docs.axolotl.ai/docs/lr_groups.html - 2025-04-24T17:01:57.364Z + 2025-04-25T14:34:06.941Z https://docs.axolotl.ai/docs/dataset-formats/index.html - 2025-04-24T17:01:57.360Z + 2025-04-25T14:34:06.938Z https://docs.axolotl.ai/docs/dataset-formats/stepwise_supervised.html - 2025-04-24T17:01:57.360Z + 2025-04-25T14:34:06.938Z https://docs.axolotl.ai/docs/dataset-formats/pretraining.html - 2025-04-24T17:01:57.360Z + 2025-04-25T14:34:06.938Z https://docs.axolotl.ai/docs/multi-node.html - 2025-04-24T17:01:57.364Z + 2025-04-25T14:34:06.941Z https://docs.axolotl.ai/docs/sequence_parallelism.html - 2025-04-24T17:01:57.364Z + 2025-04-25T14:34:06.942Z https://docs.axolotl.ai/docs/batch_vs_grad.html - 2025-04-24T17:01:57.360Z + 2025-04-25T14:34:06.937Z https://docs.axolotl.ai/docs/amd_hpc.html - 2025-04-24T17:01:57.360Z + 2025-04-25T14:34:06.937Z https://docs.axolotl.ai/docs/faq.html - 2025-04-24T17:01:57.361Z + 2025-04-25T14:34:06.938Z https://docs.axolotl.ai/docs/custom_integrations.html - 2025-04-24T17:01:57.360Z + 2025-04-25T14:34:06.938Z https://docs.axolotl.ai/src/axolotl/integrations/LICENSE.html - 2025-04-24T17:01:57.380Z + 2025-04-25T14:34:06.958Z https://docs.axolotl.ai/TODO.html - 2025-04-24T17:01:57.359Z + 2025-04-25T14:34:06.936Z https://docs.axolotl.ai/src/axolotl/integrations/cut_cross_entropy/ACKNOWLEDGEMENTS.html - 2025-04-24T17:01:57.380Z + 2025-04-25T14:34:06.958Z 
https://docs.axolotl.ai/docs/getting-started.html - 2025-04-24T17:01:57.361Z + 2025-04-25T14:34:06.938Z https://docs.axolotl.ai/docs/multipack.html - 2025-04-24T17:01:57.364Z + 2025-04-25T14:34:06.941Z https://docs.axolotl.ai/docs/multi-gpu.html - 2025-04-24T17:01:57.364Z + 2025-04-25T14:34:06.941Z https://docs.axolotl.ai/docs/installation.html - 2025-04-24T17:01:57.363Z + 2025-04-25T14:34:06.941Z https://docs.axolotl.ai/docs/cli.html - 2025-04-24T17:01:57.360Z + 2025-04-25T14:34:06.937Z https://docs.axolotl.ai/docs/dataset-formats/inst_tune.html - 2025-04-24T17:01:57.360Z + 2025-04-25T14:34:06.938Z https://docs.axolotl.ai/docs/dataset-formats/tokenized.html - 2025-04-24T17:01:57.360Z + 2025-04-25T14:34:06.938Z https://docs.axolotl.ai/docs/dataset-formats/conversation.html - 2025-04-24T17:01:57.360Z + 2025-04-25T14:34:06.938Z https://docs.axolotl.ai/docs/dataset-formats/template_free.html - 2025-04-24T17:01:57.360Z + 2025-04-25T14:34:06.938Z https://docs.axolotl.ai/docs/reward_modelling.html - 2025-04-24T17:01:57.364Z + 2025-04-25T14:34:06.942Z https://docs.axolotl.ai/docs/lora_optims.html - 2025-04-24T17:01:57.364Z + 2025-04-25T14:34:06.941Z https://docs.axolotl.ai/docs/nccl.html - 2025-04-24T17:01:57.364Z + 2025-04-25T14:34:06.941Z https://docs.axolotl.ai/docs/api/logging_config.html - 2025-04-24T17:02:26.815Z + 2025-04-25T14:34:35.770Z https://docs.axolotl.ai/docs/api/utils.trainer.html - 2025-04-24T17:02:27.664Z + 2025-04-25T14:34:36.628Z https://docs.axolotl.ai/docs/api/monkeypatch.unsloth_.html - 2025-04-24T17:02:27.567Z + 2025-04-25T14:34:36.530Z https://docs.axolotl.ai/docs/api/cli.evaluate.html - 2025-04-24T17:02:27.003Z + 2025-04-25T14:34:35.962Z https://docs.axolotl.ai/docs/api/kernels.utils.html - 2025-04-24T17:02:27.462Z + 2025-04-25T14:34:36.424Z https://docs.axolotl.ai/docs/api/datasets.html - 2025-04-24T17:02:26.754Z + 2025-04-25T14:34:35.710Z https://docs.axolotl.ai/docs/api/utils.models.html - 2025-04-24T17:02:27.605Z + 2025-04-25T14:34:36.569Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.llama3.html - 2025-04-24T17:02:27.316Z + 2025-04-25T14:34:36.277Z https://docs.axolotl.ai/docs/api/cli.preprocess.html - 2025-04-24T17:02:27.088Z + 2025-04-25T14:34:36.046Z https://docs.axolotl.ai/docs/api/cli.merge_sharded_fsdp_weights.html - 2025-04-24T17:02:27.080Z + 2025-04-25T14:34:36.037Z https://docs.axolotl.ai/docs/api/integrations.base.html - 2025-04-24T17:02:27.868Z + 2025-04-25T14:34:36.834Z https://docs.axolotl.ai/docs/api/prompt_strategies.orpo.chat_template.html - 2025-04-24T17:02:27.346Z + 2025-04-25T14:34:36.308Z https://docs.axolotl.ai/docs/api/utils.schemas.enums.html - 2025-04-24T17:02:27.799Z + 2025-04-25T14:34:36.764Z https://docs.axolotl.ai/docs/api/utils.callbacks.comet_.html - 2025-04-24T17:02:27.971Z + 2025-04-25T14:34:36.933Z https://docs.axolotl.ai/docs/api/prompt_strategies.input_output.html - 2025-04-24T17:02:27.256Z + 2025-04-25T14:34:36.216Z https://docs.axolotl.ai/docs/api/utils.schedulers.html - 2025-04-24T17:02:27.688Z + 2025-04-25T14:34:36.653Z https://docs.axolotl.ai/docs/api/cli.cloud.base.html - 2025-04-24T17:02:27.133Z + 2025-04-25T14:34:36.092Z https://docs.axolotl.ai/docs/api/cli.utils.html - 2025-04-24T17:02:27.125Z + 2025-04-25T14:34:36.084Z https://docs.axolotl.ai/docs/api/monkeypatch.lora_kernels.html - 2025-04-24T17:02:27.539Z + 2025-04-25T14:34:36.502Z https://docs.axolotl.ai/docs/api/prompt_strategies.completion.html - 2025-04-24T17:02:27.250Z + 2025-04-25T14:34:36.210Z https://docs.axolotl.ai/docs/api/utils.schemas.config.html - 
2025-04-24T17:02:27.734Z + 2025-04-25T14:34:36.698Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_expand_mask.html - 2025-04-24T17:02:27.514Z + 2025-04-25T14:34:36.476Z https://docs.axolotl.ai/docs/api/cli.config.html - 2025-04-24T17:02:27.044Z + 2025-04-25T14:34:36.003Z https://docs.axolotl.ai/docs/api/utils.collators.core.html - 2025-04-24T17:02:27.911Z + 2025-04-25T14:34:36.878Z https://docs.axolotl.ai/docs/api/utils.data.sft.html - 2025-04-24T17:02:27.722Z + 2025-04-25T14:34:36.687Z https://docs.axolotl.ai/docs/api/integrations.spectrum.args.html - 2025-04-24T17:02:27.890Z + 2025-04-25T14:34:36.856Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.llama3.html - 2025-04-24T17:02:27.294Z + 2025-04-25T14:34:36.254Z https://docs.axolotl.ai/docs/api/cli.inference.html - 2025-04-24T17:02:27.058Z + 2025-04-25T14:34:36.017Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_instruct.html - 2025-04-24T17:02:27.211Z + 2025-04-25T14:34:36.171Z https://docs.axolotl.ai/docs/api/utils.collators.batching.html - 2025-04-24T17:02:27.936Z + 2025-04-25T14:34:36.897Z https://docs.axolotl.ai/docs/api/utils.schemas.datasets.html - 2025-04-24T17:02:27.763Z + 2025-04-25T14:34:36.727Z https://docs.axolotl.ai/docs/api/utils.data.pretraining.html - 2025-04-24T17:02:27.721Z + 2025-04-25T14:34:36.685Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.passthrough.html - 2025-04-24T17:02:27.308Z + 2025-04-25T14:34:36.269Z https://docs.axolotl.ai/docs/api/monkeypatch.utils.html - 2025-04-24T17:02:27.547Z + 2025-04-25T14:34:36.510Z https://docs.axolotl.ai/docs/api/utils.callbacks.lisa.html - 2025-04-24T17:02:27.964Z + 2025-04-25T14:34:36.925Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_chat.html - 2025-04-24T17:02:27.210Z + 2025-04-25T14:34:36.169Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.user_defined.html - 2025-04-24T17:02:27.326Z + 2025-04-25T14:34:36.287Z https://docs.axolotl.ai/docs/api/core.training_args.html - 2025-04-24T17:02:26.916Z + 2025-04-25T14:34:35.874Z https://docs.axolotl.ai/docs/api/utils.collators.mm_chat.html - 2025-04-24T17:02:27.944Z + 2025-04-25T14:34:36.905Z https://docs.axolotl.ai/docs/api/common.architectures.html - 2025-04-24T17:02:27.891Z + 2025-04-25T14:34:36.857Z https://docs.axolotl.ai/docs/api/integrations.kd.trainer.html - 2025-04-24T17:02:27.880Z + 2025-04-25T14:34:36.846Z https://docs.axolotl.ai/docs/api/prompt_strategies.llama2_chat.html - 2025-04-24T17:02:27.244Z + 2025-04-25T14:34:36.204Z https://docs.axolotl.ai/docs/api/core.trainers.trl.html - 2025-04-24T17:02:27.170Z + 2025-04-25T14:34:36.129Z https://docs.axolotl.ai/docs/api/prompt_strategies.orcamini.html - 2025-04-24T17:02:27.271Z + 2025-04-25T14:34:36.231Z https://docs.axolotl.ai/docs/api/integrations.cut_cross_entropy.args.html - 2025-04-24T17:02:27.871Z + 2025-04-25T14:34:36.837Z https://docs.axolotl.ai/docs/api/utils.lora.html - 2025-04-24T17:02:27.627Z + 2025-04-25T14:34:36.591Z https://docs.axolotl.ai/docs/api/cli.sweeps.html - 2025-04-24T17:02:27.094Z + 2025-04-25T14:34:36.052Z https://docs.axolotl.ai/docs/api/core.chat.format.chatml.html - 2025-04-24T17:02:26.940Z + 2025-04-25T14:34:35.899Z https://docs.axolotl.ai/docs/api/common.const.html - 2025-04-24T17:02:27.893Z + 2025-04-25T14:34:36.859Z https://docs.axolotl.ai/docs/api/prompt_strategies.metharme.html - 2025-04-24T17:02:27.267Z + 2025-04-25T14:34:36.228Z https://docs.axolotl.ai/docs/api/monkeypatch.stablelm_attn_hijack_flash.html - 2025-04-24T17:02:27.556Z + 2025-04-25T14:34:36.518Z 
https://docs.axolotl.ai/docs/api/integrations.grokfast.optimizer.html - 2025-04-24T17:02:27.872Z + 2025-04-25T14:34:36.838Z https://docs.axolotl.ai/docs/api/cli.checks.html - 2025-04-24T17:02:27.027Z + 2025-04-25T14:34:35.986Z https://docs.axolotl.ai/docs/api/common.datasets.html - 2025-04-24T17:02:27.909Z + 2025-04-25T14:34:36.875Z https://docs.axolotl.ai/docs/api/integrations.lm_eval.args.html - 2025-04-24T17:02:27.886Z + 2025-04-25T14:34:36.852Z https://docs.axolotl.ai/docs/api/monkeypatch.mixtral.html - 2025-04-24T17:02:27.577Z + 2025-04-25T14:34:36.539Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.chatml.html - 2025-04-24T17:02:27.325Z + 2025-04-25T14:34:36.285Z https://docs.axolotl.ai/docs/api/cli.train.html - 2025-04-24T17:02:26.995Z + 2025-04-25T14:34:35.954Z https://docs.axolotl.ai/docs/api/prompt_tokenizers.html - 2025-04-24T17:02:26.809Z + 2025-04-25T14:34:35.765Z https://docs.axolotl.ai/docs/api/utils.schemas.training.html - 2025-04-24T17:02:27.745Z + 2025-04-25T14:34:36.710Z https://docs.axolotl.ai/docs/api/monkeypatch.relora.html - 2025-04-24T17:02:27.512Z + 2025-04-25T14:34:36.475Z https://docs.axolotl.ai/docs/api/cli.cloud.modal_.html - 2025-04-24T17:02:27.139Z + 2025-04-25T14:34:36.098Z https://docs.axolotl.ai/docs/api/cli.main.html - 2025-04-24T17:02:26.987Z + 2025-04-25T14:34:35.946Z https://docs.axolotl.ai/docs/api/monkeypatch.trainer_fsdp_optim.html - 2025-04-24T17:02:27.559Z + 2025-04-25T14:34:36.522Z https://docs.axolotl.ai/docs/fsdp_qlora.html - 2025-04-24T17:01:57.361Z + 2025-04-25T14:34:06.938Z https://docs.axolotl.ai/docs/debugging.html - 2025-04-24T17:01:57.361Z + 2025-04-25T14:34:06.938Z https://docs.axolotl.ai/docs/ray-integration.html - 2025-04-24T17:01:57.364Z + 2025-04-25T14:34:06.941Z https://docs.axolotl.ai/docs/docker.html - 2025-04-24T17:01:57.361Z + 2025-04-25T14:34:06.938Z https://docs.axolotl.ai/docs/inference.html - 2025-04-24T17:01:57.363Z + 2025-04-25T14:34:06.941Z https://docs.axolotl.ai/docs/torchao.html - 2025-04-24T17:01:57.364Z + 2025-04-25T14:34:06.942Z https://docs.axolotl.ai/FAQS.html - 2025-04-24T17:01:57.358Z + 2025-04-25T14:34:06.936Z