diff --git a/.nojekyll b/.nojekyll
index c33857a59..b09b3b21c 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-d141c94d
\ No newline at end of file
+9780cd52
\ No newline at end of file
diff --git a/docs/api/index.html b/docs/api/index.html
index 44f765cca..b60bfad09 100644
--- a/docs/api/index.html
+++ b/docs/api/index.html
@@ -697,7 +697,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
| loaders.model |
-Model loader class implementation for loading, configuring, and patching various |
+Model loader class implementation for loading, configuring, and patching various models. |
| loaders.tokenizer |
diff --git a/docs/api/loaders.model.html b/docs/api/loaders.model.html
index 04906632c..c60df315d 100644
--- a/docs/api/loaders.model.html
+++ b/docs/api/loaders.model.html
@@ -510,8 +510,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
loaders.model
loaders.model
-Model loader class implementation for loading, configuring, and patching various
-models.
+Model loader class implementation for loading, configuring, and patching various models.
Classes
diff --git a/search.json b/search.json
index 5b1a7bac6..0641d7fcd 100644
--- a/search.json
+++ b/search.json
@@ -851,7 +851,7 @@
"href": "docs/api/index.html",
"title": "API Reference",
"section": "",
- "text": "Core functionality for training\n\n\n\ntrain\nPrepare and train a model on a dataset. Can also infer from a model or merge lora\n\n\nevaluate\nModule for evaluating models.\n\n\ndatasets\nModule containing Dataset functionality\n\n\nconvert\nModule containing File Reader, File Writer, Json Parser, and Jsonl Serializer classes\n\n\nprompt_tokenizers\nModule containing PromptTokenizingStrategy and Prompter classes\n\n\nlogging_config\nCommon logging module for axolotl\n\n\ncore.builders.base\nBase class for trainer builder\n\n\ncore.builders.causal\nBuilder for causal trainers\n\n\ncore.builders.rl\nBuilder for RLHF trainers\n\n\ncore.training_args\nextra axolotl specific training args\n\n\ncore.chat.messages\ninternal message representations of chat messages\n\n\ncore.chat.format.chatml\nChatML transformation functions for MessageContents\n\n\ncore.chat.format.llama3x\nLlama 3.x chat formatting functions for MessageContents\n\n\ncore.chat.format.shared\nshared functions for format transforms\n\n\ncore.datasets.chat\nchat dataset module\n\n\ncore.datasets.transforms.chat_builder\nThis module contains a function that builds a transform that takes a row from the dataset and converts it to a Chat.\n\n\n\n\n\n\nCommand-line interface\n\n\n\ncli.main\nClick CLI definitions for various axolotl commands.\n\n\ncli.train\nCLI to run training on a model.\n\n\ncli.evaluate\nCLI to run evaluation on a model.\n\n\ncli.args\nModule for axolotl CLI command arguments.\n\n\ncli.art\nAxolotl ASCII logo utils.\n\n\ncli.checks\nVarious checks for Axolotl CLI.\n\n\ncli.config\nConfiguration loading and processing.\n\n\ncli.delinearize_llama4\nCLI tool to delinearize quantized/Linearized Llama-4 models.\n\n\ncli.inference\nCLI to run inference on a trained model.\n\n\ncli.merge_lora\nCLI to merge a trained LoRA into a base model.\n\n\ncli.merge_sharded_fsdp_weights\nCLI to merge sharded FSDP model checkpoints into a single combined checkpoint.\n\n\ncli.preprocess\nCLI to run preprocessing of a dataset.\n\n\ncli.quantize\nCLI to post-training quantize a model using torchao\n\n\ncli.vllm_serve\nCLI to start the vllm server for online RL\n\n\ncli.cloud.base\nbase class for cloud platforms from cli\n\n\ncli.cloud.modal_\nModal Cloud support from CLI\n\n\ncli.utils\nInit for axolotl.cli.utils module.\n\n\ncli.utils.args\nUtilities for axolotl CLI args.\n\n\ncli.utils.fetch\nUtilities for axolotl fetch CLI command.\n\n\ncli.utils.load\nUtilities for model, tokenizer, etc. loading.\n\n\ncli.utils.sweeps\nUtilities for handling sweeps over configs for axolotl train CLI command\n\n\ncli.utils.train\nUtilities for axolotl train CLI command.\n\n\n\n\n\n\nTraining implementations\n\n\n\ncore.trainers.base\nModule for customized trainers\n\n\ncore.trainers.trl\nModule for TRL RL trainers\n\n\ncore.trainers.mamba\nModule for mamba trainer\n\n\ncore.trainers.dpo.trainer\nDPO trainer for axolotl\n\n\ncore.trainers.grpo.trainer\nAxolotl GRPO trainers (with and without sequence parallelism handling)\n\n\ncore.trainers.grpo.sampler\nRepeat random sampler (similar to the one implemented in\n\n\ncore.trainers.utils\nUtils for Axolotl trainers\n\n\n\n\n\n\nFunctionality for loading and patching models, tokenizers, etc.\n\n\n\nloaders.model\nModel loader class implementation for loading, configuring, and patching various\n\n\nloaders.tokenizer\nTokenizer loading functionality and associated utils\n\n\nloaders.processor\nProcessor loading functionality for multi-modal models\n\n\nloaders.adapter\nAdapter loading functionality, including LoRA / QLoRA and associated utils\n\n\nloaders.patch_manager\nPatch manager class implementation to complement axolotl.loaders.ModelLoader.\n\n\nloaders.constants\nShared constants for axolotl.loaders module\n\n\n\n\n\n\nMixin classes for augmenting trainers\n\n\n\ncore.trainers.mixins.optimizer\nModule for Axolotl trainer optimizer mixin\n\n\ncore.trainers.mixins.rng_state_loader\nTemporary fix/override for bug in resume from checkpoint\n\n\ncore.trainers.mixins.scheduler\nModule for Axolotl trainer scheduler mixin\n\n\n\n\n\n\nContext managers for altering trainer behaviors\n\n\n\nutils.ctx_managers.sequence_parallel\nModule for Axolotl trainer sequence parallelism manager and utilities\n\n\n\n\n\n\nPrompt formatting strategies\n\n\n\nprompt_strategies.base\nmodule for base dataset transform strategies\n\n\nprompt_strategies.chat_template\nHF Chat Templates prompt strategy\n\n\nprompt_strategies.alpaca_chat\nModule for Alpaca prompt strategy classes\n\n\nprompt_strategies.alpaca_instruct\nModule loading the AlpacaInstructPromptTokenizingStrategy class\n\n\nprompt_strategies.alpaca_w_system\nPrompt strategies loader for alpaca instruction datasets with system prompts\n\n\nprompt_strategies.user_defined\nUser Defined prompts with configuration from the YML config\n\n\nprompt_strategies.llama2_chat\nPrompt Strategy for finetuning Llama2 chat models\n\n\nprompt_strategies.completion\nBasic completion text\n\n\nprompt_strategies.input_output\nModule for plain input/output prompt pairs\n\n\nprompt_strategies.stepwise_supervised\nModule for stepwise datasets, typically including a prompt and reasoning traces,\n\n\nprompt_strategies.metharme\nModule containing the MetharmenPromptTokenizingStrategy and MetharmePrompter class\n\n\nprompt_strategies.orcamini\nPrompt Strategy for finetuning Orca Mini (v2) models\n\n\nprompt_strategies.pygmalion\nModule containing the PygmalionPromptTokenizingStrategy and PygmalionPrompter class\n\n\nprompt_strategies.messages.chat\nChat dataset wrapping strategy for new internal messages representations\n\n\nprompt_strategies.dpo.chat_template\nDPO prompt strategies for using tokenizer chat templates.\n\n\nprompt_strategies.dpo.llama3\nDPO strategies for llama-3 chat template\n\n\nprompt_strategies.dpo.chatml\nDPO strategies for chatml\n\n\nprompt_strategies.dpo.zephyr\nDPO strategies for zephyr\n\n\nprompt_strategies.dpo.user_defined\nUser-defined DPO strategies\n\n\nprompt_strategies.dpo.passthrough\nDPO prompt strategies passthrough/zero-processing strategy\n\n\nprompt_strategies.kto.llama3\nKTO strategies for llama-3 chat template\n\n\nprompt_strategies.kto.chatml\nKTO strategies for chatml\n\n\nprompt_strategies.kto.user_defined\nUser-defined KTO strategies\n\n\nprompt_strategies.orpo.chat_template\nchatml prompt tokenization strategy for ORPO\n\n\nprompt_strategies.bradley_terry.llama3\nchatml transforms for datasets with system, input, chosen, rejected to match llama3 chat template\n\n\n\n\n\n\nLow-level performance optimizations\n\n\n\nkernels.lora\nModule for definition of Low-Rank Adaptation (LoRA) Triton kernels.\n\n\nkernels.geglu\nModule for definition of GEGLU Triton kernels.\n\n\nkernels.swiglu\nModule for definition of SwiGLU Triton kernels.\n\n\nkernels.quantize\nDequantization utilities for bitsandbytes integration.\n\n\nkernels.utils\nUtilities for axolotl.kernels submodules.\n\n\n\n\n\n\nRuntime patches for model optimizations\n\n\n\nmonkeypatch.llama_attn_hijack_flash\nFlash attention monkey patch for llama model\n\n\nmonkeypatch.llama_attn_hijack_xformers\nDirectly copied the code from https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/modules/llama_attn_hijack.py and made some adjustments\n\n\nmonkeypatch.mistral_attn_hijack_flash\nFlash attention monkey patch for mistral model\n\n\nmonkeypatch.multipack\nmultipack patching for v2 of sample packing\n\n\nmonkeypatch.relora\nImplements the ReLoRA training procedure from https://arxiv.org/abs/2307.05695, minus the initial full fine-tune.\n\n\nmonkeypatch.llama_expand_mask\nexpands the binary attention mask per 3.2.2 of https://arxiv.org/pdf/2107.02027.pdf\n\n\nmonkeypatch.lora_kernels\nModule for patching custom LoRA Triton kernels and torch.autograd functions.\n\n\nmonkeypatch.utils\nShared utils for the monkeypatches\n\n\nmonkeypatch.btlm_attn_hijack_flash\nFlash attention monkey patch for cerebras btlm model\n\n\nmonkeypatch.llama_patch_multipack\nPatched LlamaAttention to use torch.nn.functional.scaled_dot_product_attention\n\n\nmonkeypatch.stablelm_attn_hijack_flash\nPyTorch StableLM Epoch model.\n\n\nmonkeypatch.trainer_fsdp_optim\nfix for FSDP optimizer save in trainer w 4.47.0\n\n\nmonkeypatch.transformers_fa_utils\nsee https://github.com/huggingface/transformers/pull/35834\n\n\nmonkeypatch.unsloth_\nmodule for patching with unsloth optimizations\n\n\nmonkeypatch.data.batch_dataset_fetcher\nmonkey patches for the dataset fetcher to handle batches of packed indexes\n\n\nmonkeypatch.mixtral\nPatches to support multipack for mixtral\n\n\nmonkeypatch.gradient_checkpointing.offload_cpu\nCPU offloaded checkpointing\n\n\nmonkeypatch.gradient_checkpointing.offload_disk\nDISCO - DIsk-based Storage and Checkpointing with Optimized prefetching\n\n\n\n\n\n\nUtility functions\n\n\n\nutils.tokenization\nModule for tokenization utilities\n\n\nutils.chat_templates\nThis module provides functionality for selecting chat templates based on user choices.\n\n\nutils.lora\nmodule to get the state dict of a merged lora model\n\n\nutils.model_shard_quant\nmodule to handle loading model on cpu/meta device for FSDP\n\n\nutils.bench\nBenchmarking and measurement utilities\n\n\nutils.freeze\nmodule to freeze/unfreeze parameters by name\n\n\nutils.trainer\nModule containing the Trainer class and related functions\n\n\nutils.schedulers\nModule for custom LRScheduler class\n\n\nutils.distributed\nUtilities for distributed functionality.\n\n\nutils.dict\nModule containing the DictDefault class\n\n\nutils.optimizers.adopt\nCopied from https://github.com/iShohei220/adopt\n\n\nutils.data.pretraining\ndata handling specific to pretraining\n\n\nutils.data.sft\nData handling specific to SFT.\n\n\nutils.quantization\nUtilities for quantization including QAT and PTQ using torchao.\n\n\n\n\n\n\nPydantic data models for Axolotl config\n\n\n\nutils.schemas.config\nModule with Pydantic models for configuration.\n\n\nutils.schemas.model\nPydantic models for model input / output, etc. configuration\n\n\nutils.schemas.training\nPydantic models for training hyperparameters\n\n\nutils.schemas.datasets\nPydantic models for datasets-related configuration\n\n\nutils.schemas.peft\nPydantic models for PEFT-related configuration\n\n\nutils.schemas.trl\nPydantic models for TRL trainer configuration\n\n\nutils.schemas.multimodal\nPydantic models for multimodal-related configuration\n\n\nutils.schemas.integrations\nPydantic models for Axolotl integrations\n\n\nutils.schemas.enums\nEnums for Axolotl input config\n\n\nutils.schemas.utils\nUtilities for Axolotl Pydantic models\n\n\n\n\n\n\nThird-party integrations and extensions\n\n\n\nintegrations.base\nBase class for all plugins.\n\n\nintegrations.cut_cross_entropy.args\nModule for handling Cut Cross Entropy input arguments.\n\n\nintegrations.grokfast.optimizer\n\n\n\nintegrations.kd.trainer\nKD trainer\n\n\nintegrations.liger.args\nModule for handling LIGER input arguments.\n\n\nintegrations.lm_eval.args\nModule for handling lm eval harness input arguments.\n\n\nintegrations.spectrum.args\nModule for handling Spectrum input arguments.\n\n\n\n\n\n\nCommon utilities and shared functionality\n\n\n\ncommon.architectures\nCommon architecture specific constants\n\n\ncommon.const\nVarious shared constants\n\n\ncommon.datasets\nDataset loading utilities.\n\n\n\n\n\n\nCustom model implementations\n\n\n\nmodels.mamba.modeling_mamba\n\n\n\n\n\n\n\nData processing utilities\n\n\n\nutils.collators.core\nbasic shared collator constants\n\n\nutils.collators.batching\nData collators for axolotl to pad labels and position_ids for packed sequences\n\n\nutils.collators.mamba\ncollators for Mamba\n\n\nutils.collators.mm_chat\nCollators for multi-modal chat messages and packing\n\n\nutils.samplers.multipack\nMultipack Batch Sampler - An efficient batch sampler for packing variable-length sequences\n\n\n\n\n\n\nTraining callbacks\n\n\n\nutils.callbacks.perplexity\ncallback to calculate perplexity as an evaluation metric.\n\n\nutils.callbacks.profiler\nHF Trainer callback for creating pytorch profiling snapshots\n\n\nutils.callbacks.lisa\nmodule for LISA\n\n\nutils.callbacks.mlflow_\nMLFlow module for trainer callbacks\n\n\nutils.callbacks.comet_\nComet module for trainer callbacks\n\n\nutils.callbacks.qat\nQAT Callback for HF Causal Trainer"
+ "text": "Core functionality for training\n\n\n\ntrain\nPrepare and train a model on a dataset. Can also infer from a model or merge lora\n\n\nevaluate\nModule for evaluating models.\n\n\ndatasets\nModule containing Dataset functionality\n\n\nconvert\nModule containing File Reader, File Writer, Json Parser, and Jsonl Serializer classes\n\n\nprompt_tokenizers\nModule containing PromptTokenizingStrategy and Prompter classes\n\n\nlogging_config\nCommon logging module for axolotl\n\n\ncore.builders.base\nBase class for trainer builder\n\n\ncore.builders.causal\nBuilder for causal trainers\n\n\ncore.builders.rl\nBuilder for RLHF trainers\n\n\ncore.training_args\nextra axolotl specific training args\n\n\ncore.chat.messages\ninternal message representations of chat messages\n\n\ncore.chat.format.chatml\nChatML transformation functions for MessageContents\n\n\ncore.chat.format.llama3x\nLlama 3.x chat formatting functions for MessageContents\n\n\ncore.chat.format.shared\nshared functions for format transforms\n\n\ncore.datasets.chat\nchat dataset module\n\n\ncore.datasets.transforms.chat_builder\nThis module contains a function that builds a transform that takes a row from the dataset and converts it to a Chat.\n\n\n\n\n\n\nCommand-line interface\n\n\n\ncli.main\nClick CLI definitions for various axolotl commands.\n\n\ncli.train\nCLI to run training on a model.\n\n\ncli.evaluate\nCLI to run evaluation on a model.\n\n\ncli.args\nModule for axolotl CLI command arguments.\n\n\ncli.art\nAxolotl ASCII logo utils.\n\n\ncli.checks\nVarious checks for Axolotl CLI.\n\n\ncli.config\nConfiguration loading and processing.\n\n\ncli.delinearize_llama4\nCLI tool to delinearize quantized/Linearized Llama-4 models.\n\n\ncli.inference\nCLI to run inference on a trained model.\n\n\ncli.merge_lora\nCLI to merge a trained LoRA into a base model.\n\n\ncli.merge_sharded_fsdp_weights\nCLI to merge sharded FSDP model checkpoints into a single combined checkpoint.\n\n\ncli.preprocess\nCLI to run preprocessing of a dataset.\n\n\ncli.quantize\nCLI to post-training quantize a model using torchao\n\n\ncli.vllm_serve\nCLI to start the vllm server for online RL\n\n\ncli.cloud.base\nbase class for cloud platforms from cli\n\n\ncli.cloud.modal_\nModal Cloud support from CLI\n\n\ncli.utils\nInit for axolotl.cli.utils module.\n\n\ncli.utils.args\nUtilities for axolotl CLI args.\n\n\ncli.utils.fetch\nUtilities for axolotl fetch CLI command.\n\n\ncli.utils.load\nUtilities for model, tokenizer, etc. loading.\n\n\ncli.utils.sweeps\nUtilities for handling sweeps over configs for axolotl train CLI command\n\n\ncli.utils.train\nUtilities for axolotl train CLI command.\n\n\n\n\n\n\nTraining implementations\n\n\n\ncore.trainers.base\nModule for customized trainers\n\n\ncore.trainers.trl\nModule for TRL RL trainers\n\n\ncore.trainers.mamba\nModule for mamba trainer\n\n\ncore.trainers.dpo.trainer\nDPO trainer for axolotl\n\n\ncore.trainers.grpo.trainer\nAxolotl GRPO trainers (with and without sequence parallelism handling)\n\n\ncore.trainers.grpo.sampler\nRepeat random sampler (similar to the one implemented in\n\n\ncore.trainers.utils\nUtils for Axolotl trainers\n\n\n\n\n\n\nFunctionality for loading and patching models, tokenizers, etc.\n\n\n\nloaders.model\nModel loader class implementation for loading, configuring, and patching various models.\n\n\nloaders.tokenizer\nTokenizer loading functionality and associated utils\n\n\nloaders.processor\nProcessor loading functionality for multi-modal models\n\n\nloaders.adapter\nAdapter loading functionality, including LoRA / QLoRA and associated utils\n\n\nloaders.patch_manager\nPatch manager class implementation to complement axolotl.loaders.ModelLoader.\n\n\nloaders.constants\nShared constants for axolotl.loaders module\n\n\n\n\n\n\nMixin classes for augmenting trainers\n\n\n\ncore.trainers.mixins.optimizer\nModule for Axolotl trainer optimizer mixin\n\n\ncore.trainers.mixins.rng_state_loader\nTemporary fix/override for bug in resume from checkpoint\n\n\ncore.trainers.mixins.scheduler\nModule for Axolotl trainer scheduler mixin\n\n\n\n\n\n\nContext managers for altering trainer behaviors\n\n\n\nutils.ctx_managers.sequence_parallel\nModule for Axolotl trainer sequence parallelism manager and utilities\n\n\n\n\n\n\nPrompt formatting strategies\n\n\n\nprompt_strategies.base\nmodule for base dataset transform strategies\n\n\nprompt_strategies.chat_template\nHF Chat Templates prompt strategy\n\n\nprompt_strategies.alpaca_chat\nModule for Alpaca prompt strategy classes\n\n\nprompt_strategies.alpaca_instruct\nModule loading the AlpacaInstructPromptTokenizingStrategy class\n\n\nprompt_strategies.alpaca_w_system\nPrompt strategies loader for alpaca instruction datasets with system prompts\n\n\nprompt_strategies.user_defined\nUser Defined prompts with configuration from the YML config\n\n\nprompt_strategies.llama2_chat\nPrompt Strategy for finetuning Llama2 chat models\n\n\nprompt_strategies.completion\nBasic completion text\n\n\nprompt_strategies.input_output\nModule for plain input/output prompt pairs\n\n\nprompt_strategies.stepwise_supervised\nModule for stepwise datasets, typically including a prompt and reasoning traces,\n\n\nprompt_strategies.metharme\nModule containing the MetharmenPromptTokenizingStrategy and MetharmePrompter class\n\n\nprompt_strategies.orcamini\nPrompt Strategy for finetuning Orca Mini (v2) models\n\n\nprompt_strategies.pygmalion\nModule containing the PygmalionPromptTokenizingStrategy and PygmalionPrompter class\n\n\nprompt_strategies.messages.chat\nChat dataset wrapping strategy for new internal messages representations\n\n\nprompt_strategies.dpo.chat_template\nDPO prompt strategies for using tokenizer chat templates.\n\n\nprompt_strategies.dpo.llama3\nDPO strategies for llama-3 chat template\n\n\nprompt_strategies.dpo.chatml\nDPO strategies for chatml\n\n\nprompt_strategies.dpo.zephyr\nDPO strategies for zephyr\n\n\nprompt_strategies.dpo.user_defined\nUser-defined DPO strategies\n\n\nprompt_strategies.dpo.passthrough\nDPO prompt strategies passthrough/zero-processing strategy\n\n\nprompt_strategies.kto.llama3\nKTO strategies for llama-3 chat template\n\n\nprompt_strategies.kto.chatml\nKTO strategies for chatml\n\n\nprompt_strategies.kto.user_defined\nUser-defined KTO strategies\n\n\nprompt_strategies.orpo.chat_template\nchatml prompt tokenization strategy for ORPO\n\n\nprompt_strategies.bradley_terry.llama3\nchatml transforms for datasets with system, input, chosen, rejected to match llama3 chat template\n\n\n\n\n\n\nLow-level performance optimizations\n\n\n\nkernels.lora\nModule for definition of Low-Rank Adaptation (LoRA) Triton kernels.\n\n\nkernels.geglu\nModule for definition of GEGLU Triton kernels.\n\n\nkernels.swiglu\nModule for definition of SwiGLU Triton kernels.\n\n\nkernels.quantize\nDequantization utilities for bitsandbytes integration.\n\n\nkernels.utils\nUtilities for axolotl.kernels submodules.\n\n\n\n\n\n\nRuntime patches for model optimizations\n\n\n\nmonkeypatch.llama_attn_hijack_flash\nFlash attention monkey patch for llama model\n\n\nmonkeypatch.llama_attn_hijack_xformers\nDirectly copied the code from https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/modules/llama_attn_hijack.py and made some adjustments\n\n\nmonkeypatch.mistral_attn_hijack_flash\nFlash attention monkey patch for mistral model\n\n\nmonkeypatch.multipack\nmultipack patching for v2 of sample packing\n\n\nmonkeypatch.relora\nImplements the ReLoRA training procedure from https://arxiv.org/abs/2307.05695, minus the initial full fine-tune.\n\n\nmonkeypatch.llama_expand_mask\nexpands the binary attention mask per 3.2.2 of https://arxiv.org/pdf/2107.02027.pdf\n\n\nmonkeypatch.lora_kernels\nModule for patching custom LoRA Triton kernels and torch.autograd functions.\n\n\nmonkeypatch.utils\nShared utils for the monkeypatches\n\n\nmonkeypatch.btlm_attn_hijack_flash\nFlash attention monkey patch for cerebras btlm model\n\n\nmonkeypatch.llama_patch_multipack\nPatched LlamaAttention to use torch.nn.functional.scaled_dot_product_attention\n\n\nmonkeypatch.stablelm_attn_hijack_flash\nPyTorch StableLM Epoch model.\n\n\nmonkeypatch.trainer_fsdp_optim\nfix for FSDP optimizer save in trainer w 4.47.0\n\n\nmonkeypatch.transformers_fa_utils\nsee https://github.com/huggingface/transformers/pull/35834\n\n\nmonkeypatch.unsloth_\nmodule for patching with unsloth optimizations\n\n\nmonkeypatch.data.batch_dataset_fetcher\nmonkey patches for the dataset fetcher to handle batches of packed indexes\n\n\nmonkeypatch.mixtral\nPatches to support multipack for mixtral\n\n\nmonkeypatch.gradient_checkpointing.offload_cpu\nCPU offloaded checkpointing\n\n\nmonkeypatch.gradient_checkpointing.offload_disk\nDISCO - DIsk-based Storage and Checkpointing with Optimized prefetching\n\n\n\n\n\n\nUtility functions\n\n\n\nutils.tokenization\nModule for tokenization utilities\n\n\nutils.chat_templates\nThis module provides functionality for selecting chat templates based on user choices.\n\n\nutils.lora\nmodule to get the state dict of a merged lora model\n\n\nutils.model_shard_quant\nmodule to handle loading model on cpu/meta device for FSDP\n\n\nutils.bench\nBenchmarking and measurement utilities\n\n\nutils.freeze\nmodule to freeze/unfreeze parameters by name\n\n\nutils.trainer\nModule containing the Trainer class and related functions\n\n\nutils.schedulers\nModule for custom LRScheduler class\n\n\nutils.distributed\nUtilities for distributed functionality.\n\n\nutils.dict\nModule containing the DictDefault class\n\n\nutils.optimizers.adopt\nCopied from https://github.com/iShohei220/adopt\n\n\nutils.data.pretraining\ndata handling specific to pretraining\n\n\nutils.data.sft\nData handling specific to SFT.\n\n\nutils.quantization\nUtilities for quantization including QAT and PTQ using torchao.\n\n\n\n\n\n\nPydantic data models for Axolotl config\n\n\n\nutils.schemas.config\nModule with Pydantic models for configuration.\n\n\nutils.schemas.model\nPydantic models for model input / output, etc. configuration\n\n\nutils.schemas.training\nPydantic models for training hyperparameters\n\n\nutils.schemas.datasets\nPydantic models for datasets-related configuration\n\n\nutils.schemas.peft\nPydantic models for PEFT-related configuration\n\n\nutils.schemas.trl\nPydantic models for TRL trainer configuration\n\n\nutils.schemas.multimodal\nPydantic models for multimodal-related configuration\n\n\nutils.schemas.integrations\nPydantic models for Axolotl integrations\n\n\nutils.schemas.enums\nEnums for Axolotl input config\n\n\nutils.schemas.utils\nUtilities for Axolotl Pydantic models\n\n\n\n\n\n\nThird-party integrations and extensions\n\n\n\nintegrations.base\nBase class for all plugins.\n\n\nintegrations.cut_cross_entropy.args\nModule for handling Cut Cross Entropy input arguments.\n\n\nintegrations.grokfast.optimizer\n\n\n\nintegrations.kd.trainer\nKD trainer\n\n\nintegrations.liger.args\nModule for handling LIGER input arguments.\n\n\nintegrations.lm_eval.args\nModule for handling lm eval harness input arguments.\n\n\nintegrations.spectrum.args\nModule for handling Spectrum input arguments.\n\n\n\n\n\n\nCommon utilities and shared functionality\n\n\n\ncommon.architectures\nCommon architecture specific constants\n\n\ncommon.const\nVarious shared constants\n\n\ncommon.datasets\nDataset loading utilities.\n\n\n\n\n\n\nCustom model implementations\n\n\n\nmodels.mamba.modeling_mamba\n\n\n\n\n\n\n\nData processing utilities\n\n\n\nutils.collators.core\nbasic shared collator constants\n\n\nutils.collators.batching\nData collators for axolotl to pad labels and position_ids for packed sequences\n\n\nutils.collators.mamba\ncollators for Mamba\n\n\nutils.collators.mm_chat\nCollators for multi-modal chat messages and packing\n\n\nutils.samplers.multipack\nMultipack Batch Sampler - An efficient batch sampler for packing variable-length sequences\n\n\n\n\n\n\nTraining callbacks\n\n\n\nutils.callbacks.perplexity\ncallback to calculate perplexity as an evaluation metric.\n\n\nutils.callbacks.profiler\nHF Trainer callback for creating pytorch profiling snapshots\n\n\nutils.callbacks.lisa\nmodule for LISA\n\n\nutils.callbacks.mlflow_\nMLFlow module for trainer callbacks\n\n\nutils.callbacks.comet_\nComet module for trainer callbacks\n\n\nutils.callbacks.qat\nQAT Callback for HF Causal Trainer"
},
{
"objectID": "docs/api/index.html#core",
@@ -879,7 +879,7 @@
"href": "docs/api/index.html#model-loading",
"title": "API Reference",
"section": "",
- "text": "Functionality for loading and patching models, tokenizers, etc.\n\n\n\nloaders.model\nModel loader class implementation for loading, configuring, and patching various\n\n\nloaders.tokenizer\nTokenizer loading functionality and associated utils\n\n\nloaders.processor\nProcessor loading functionality for multi-modal models\n\n\nloaders.adapter\nAdapter loading functionality, including LoRA / QLoRA and associated utils\n\n\nloaders.patch_manager\nPatch manager class implementation to complement axolotl.loaders.ModelLoader.\n\n\nloaders.constants\nShared constants for axolotl.loaders module"
+ "text": "Functionality for loading and patching models, tokenizers, etc.\n\n\n\nloaders.model\nModel loader class implementation for loading, configuring, and patching various models.\n\n\nloaders.tokenizer\nTokenizer loading functionality and associated utils\n\n\nloaders.processor\nProcessor loading functionality for multi-modal models\n\n\nloaders.adapter\nAdapter loading functionality, including LoRA / QLoRA and associated utils\n\n\nloaders.patch_manager\nPatch manager class implementation to complement axolotl.loaders.ModelLoader.\n\n\nloaders.constants\nShared constants for axolotl.loaders module"
},
{
"objectID": "docs/api/index.html#mixins",
@@ -3081,7 +3081,7 @@
"href": "docs/api/loaders.model.html",
"title": "loaders.model",
"section": "",
- "text": "loaders.model\nModel loader class implementation for loading, configuring, and patching various\nmodels.\n\n\n\n\n\nName\nDescription\n\n\n\n\nModelLoader\nManages model configuration, initialization and application of patches during\n\n\n\n\n\nloaders.model.ModelLoader(\n cfg,\n tokenizer,\n *,\n inference=False,\n reference_model=False,\n **kwargs,\n)\nManages model configuration, initialization and application of patches during\nmodel loading.\nThis class orchestrates the entire process of loading a model from configuration to\nfinal preparation. It handles device mapping, quantization, attention mechanisms,\nadapter integration, and various optimizations.\n\n\n\nLoading and validating model configuration\nApplying monkey patches for optimizations / fixes\nSetting up device mapping (including multi-GPU configurations)\nConfiguring quantization\nSetting attention mechanisms (Flash Attention, SDPA, etc.)\nLoading and initializing the model\nApplying adapters (LoRA, QLoRA, etc.)\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\nmodel\nPreTrainedModel | PeftModel | PeftMixedModel\nThe loaded model instance (available after load() is called).\n\n\nmodel_kwargs\ndict[str, Any]\nDictionary of keyword arguments passed to model initialization.\n\n\nbase_model\n\nName or path of the base model to load.\n\n\nmodel_type\n\nType of model to load (e.g., AutoModelForCausalLM).\n\n\nmodel_config\n\nConfiguration object for the model.\n\n\nauto_model_loader\n\nclass used for loading the model (default: AutoModelForCausalLM).\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nload\nLoad and prepare the model with all configurations and patches.\n\n\n\n\n\nloaders.model.ModelLoader.load()\nLoad and prepare the model with all configurations and patches.\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntuple[PreTrainedModel | PeftModelForCausalLM, PeftConfig | None]\nA tuple with the loaded model and its LoRA configuration (if applicable)."
+ "text": "loaders.model\nModel loader class implementation for loading, configuring, and patching various models.\n\n\n\n\n\nName\nDescription\n\n\n\n\nModelLoader\nManages model configuration, initialization and application of patches during\n\n\n\n\n\nloaders.model.ModelLoader(\n cfg,\n tokenizer,\n *,\n inference=False,\n reference_model=False,\n **kwargs,\n)\nManages model configuration, initialization and application of patches during\nmodel loading.\nThis class orchestrates the entire process of loading a model from configuration to\nfinal preparation. It handles device mapping, quantization, attention mechanisms,\nadapter integration, and various optimizations.\n\n\n\nLoading and validating model configuration\nApplying monkey patches for optimizations / fixes\nSetting up device mapping (including multi-GPU configurations)\nConfiguring quantization\nSetting attention mechanisms (Flash Attention, SDPA, etc.)\nLoading and initializing the model\nApplying adapters (LoRA, QLoRA, etc.)\n\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\nmodel\nPreTrainedModel | PeftModel | PeftMixedModel\nThe loaded model instance (available after load() is called).\n\n\nmodel_kwargs\ndict[str, Any]\nDictionary of keyword arguments passed to model initialization.\n\n\nbase_model\n\nName or path of the base model to load.\n\n\nmodel_type\n\nType of model to load (e.g., AutoModelForCausalLM).\n\n\nmodel_config\n\nConfiguration object for the model.\n\n\nauto_model_loader\n\nclass used for loading the model (default: AutoModelForCausalLM).\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nload\nLoad and prepare the model with all configurations and patches.\n\n\n\n\n\nloaders.model.ModelLoader.load()\nLoad and prepare the model with all configurations and patches.\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntuple[PreTrainedModel | PeftModelForCausalLM, PeftConfig | None]\nA tuple with the loaded model and its LoRA configuration (if applicable)."
},
{
"objectID": "docs/api/loaders.model.html#classes",
diff --git a/sitemap.xml b/sitemap.xml
index 374183701..3401980d2 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,794 +2,794 @@
https://docs.axolotl.ai/TODO.html
- 2025-08-08T12:09:22.035Z
+ 2025-08-08T12:15:25.300Z
https://docs.axolotl.ai/index.html
- 2025-08-08T12:09:22.056Z
+ 2025-08-08T12:15:25.321Z
https://docs.axolotl.ai/docs/debugging.html
- 2025-08-08T12:09:22.037Z
+ 2025-08-08T12:15:25.302Z
https://docs.axolotl.ai/docs/amd_hpc.html
- 2025-08-08T12:09:22.036Z
+ 2025-08-08T12:15:25.301Z
https://docs.axolotl.ai/docs/api/utils.callbacks.mlflow_.html
- 2025-08-08T12:12:40.369Z
+ 2025-08-08T12:18:41.102Z
https://docs.axolotl.ai/docs/api/monkeypatch.llama_expand_mask.html
- 2025-08-08T12:12:39.785Z
+ 2025-08-08T12:18:40.512Z
https://docs.axolotl.ai/docs/api/loaders.patch_manager.html
- 2025-08-08T12:12:39.421Z
+ 2025-08-08T12:18:40.141Z
https://docs.axolotl.ai/docs/api/core.chat.format.llama3x.html
- 2025-08-08T12:12:39.096Z
+ 2025-08-08T12:18:39.817Z
https://docs.axolotl.ai/docs/api/cli.train.html
- 2025-08-08T12:12:39.154Z
+ 2025-08-08T12:18:39.874Z
https://docs.axolotl.ai/docs/api/utils.callbacks.perplexity.html
- 2025-08-08T12:12:40.360Z
+ 2025-08-08T12:18:41.093Z
https://docs.axolotl.ai/docs/api/core.chat.messages.html
- 2025-08-08T12:12:39.093Z
+ 2025-08-08T12:18:39.814Z
https://docs.axolotl.ai/docs/api/utils.callbacks.lisa.html
- 2025-08-08T12:12:40.365Z
+ 2025-08-08T12:18:41.098Z
https://docs.axolotl.ai/docs/api/cli.merge_sharded_fsdp_weights.html
- 2025-08-08T12:12:39.252Z
+ 2025-08-08T12:18:39.972Z
https://docs.axolotl.ai/docs/api/monkeypatch.mixtral.html
- 2025-08-08T12:12:39.845Z
+ 2025-08-08T12:18:40.572Z
https://docs.axolotl.ai/docs/api/utils.chat_templates.html
- 2025-08-08T12:12:39.883Z
+ 2025-08-08T12:18:40.610Z
https://docs.axolotl.ai/docs/api/core.chat.format.shared.html
- 2025-08-08T12:12:39.098Z
+ 2025-08-08T12:18:39.819Z
https://docs.axolotl.ai/docs/api/core.trainers.mixins.optimizer.html
- 2025-08-08T12:12:39.428Z
+ 2025-08-08T12:18:40.149Z
https://docs.axolotl.ai/docs/api/utils.collators.mamba.html
- 2025-08-08T12:12:40.308Z
+ 2025-08-08T12:18:41.040Z
https://docs.axolotl.ai/docs/api/logging_config.html
- 2025-08-08T12:12:39.042Z
+ 2025-08-08T12:18:39.763Z
https://docs.axolotl.ai/docs/api/utils.collators.mm_chat.html
- 2025-08-08T12:12:40.313Z
+ 2025-08-08T12:18:41.045Z
https://docs.axolotl.ai/docs/api/prompt_strategies.completion.html
- 2025-08-08T12:12:39.551Z
+ 2025-08-08T12:18:40.274Z
https://docs.axolotl.ai/docs/api/kernels.utils.html
- 2025-08-08T12:12:39.770Z
+ 2025-08-08T12:18:40.496Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chat_template.html
- 2025-08-08T12:12:39.584Z
+ 2025-08-08T12:18:40.308Z
https://docs.axolotl.ai/docs/api/kernels.swiglu.html
- 2025-08-08T12:12:39.761Z
+ 2025-08-08T12:18:40.487Z
https://docs.axolotl.ai/docs/api/common.const.html
- 2025-08-08T12:12:40.268Z
+ 2025-08-08T12:18:40.999Z
https://docs.axolotl.ai/docs/api/cli.cloud.base.html
- 2025-08-08T12:12:39.275Z
+ 2025-08-08T12:18:39.995Z
https://docs.axolotl.ai/docs/api/prompt_strategies.orpo.chat_template.html
- 2025-08-08T12:12:39.647Z
+ 2025-08-08T12:18:40.372Z
https://docs.axolotl.ai/docs/api/core.builders.rl.html
- 2025-08-08T12:12:39.058Z
+ 2025-08-08T12:18:39.778Z
https://docs.axolotl.ai/docs/api/utils.dict.html
- 2025-08-08T12:12:39.975Z
+ 2025-08-08T12:18:40.703Z
https://docs.axolotl.ai/docs/api/utils.schemas.integrations.html
- 2025-08-08T12:12:40.090Z
+ 2025-08-08T12:18:40.818Z
https://docs.axolotl.ai/docs/api/core.trainers.utils.html
- 2025-08-08T12:12:39.385Z
+ 2025-08-08T12:18:40.106Z
https://docs.axolotl.ai/docs/api/monkeypatch.trainer_fsdp_optim.html
- 2025-08-08T12:12:39.834Z
+ 2025-08-08T12:18:40.561Z
https://docs.axolotl.ai/docs/api/cli.evaluate.html
- 2025-08-08T12:12:39.163Z
+ 2025-08-08T12:18:39.883Z
https://docs.axolotl.ai/docs/api/core.builders.causal.html
- 2025-08-08T12:12:39.053Z
+ 2025-08-08T12:18:39.774Z
https://docs.axolotl.ai/docs/api/monkeypatch.multipack.html
- 2025-08-08T12:12:39.780Z
+ 2025-08-08T12:18:40.506Z
https://docs.axolotl.ai/docs/api/monkeypatch.llama_patch_multipack.html
- 2025-08-08T12:12:39.825Z
+ 2025-08-08T12:18:40.552Z
https://docs.axolotl.ai/docs/api/cli.delinearize_llama4.html
- 2025-08-08T12:12:39.216Z
+ 2025-08-08T12:18:39.936Z
https://docs.axolotl.ai/docs/api/utils.schemas.trl.html
- 2025-08-08T12:12:40.072Z
+ 2025-08-08T12:18:40.800Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.zephyr.html
- 2025-08-08T12:12:39.606Z
+ 2025-08-08T12:18:40.330Z
https://docs.axolotl.ai/docs/api/integrations.kd.trainer.html
- 2025-08-08T12:12:40.255Z
+ 2025-08-08T12:18:40.986Z
https://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_disk.html
- 2025-08-08T12:12:39.874Z
+ 2025-08-08T12:18:40.601Z
https://docs.axolotl.ai/docs/api/utils.optimizers.adopt.html
- 2025-08-08T12:12:39.983Z
+ 2025-08-08T12:18:40.711Z
https://docs.axolotl.ai/docs/api/monkeypatch.data.batch_dataset_fetcher.html
- 2025-08-08T12:12:39.843Z
+ 2025-08-08T12:18:40.571Z
https://docs.axolotl.ai/docs/api/cli.cloud.modal_.html
- 2025-08-08T12:12:39.282Z
+ 2025-08-08T12:18:40.002Z
https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_chat.html
- 2025-08-08T12:12:39.511Z
+ 2025-08-08T12:18:40.232Z
https://docs.axolotl.ai/docs/api/utils.freeze.html
- 2025-08-08T12:12:39.904Z
+ 2025-08-08T12:18:40.632Z
https://docs.axolotl.ai/docs/api/prompt_strategies.bradley_terry.llama3.html
- 2025-08-08T12:12:39.651Z
+ 2025-08-08T12:18:40.376Z
https://docs.axolotl.ai/docs/api/integrations.base.html
- 2025-08-08T12:12:40.243Z
+ 2025-08-08T12:18:40.973Z
https://docs.axolotl.ai/docs/api/monkeypatch.unsloth_.html
- 2025-08-08T12:12:39.842Z
+ 2025-08-08T12:18:40.569Z
https://docs.axolotl.ai/docs/api/prompt_strategies.kto.chatml.html
- 2025-08-08T12:12:39.626Z
+ 2025-08-08T12:18:40.350Z
https://docs.axolotl.ai/docs/api/cli.main.html
- 2025-08-08T12:12:39.146Z
+ 2025-08-08T12:18:39.866Z
https://docs.axolotl.ai/docs/api/common.datasets.html
- 2025-08-08T12:12:40.283Z
+ 2025-08-08T12:18:41.015Z
https://docs.axolotl.ai/docs/api/train.html
- 2025-08-08T12:12:38.957Z
+ 2025-08-08T12:18:39.676Z
https://docs.axolotl.ai/docs/api/utils.trainer.html
- 2025-08-08T12:12:39.922Z
+ 2025-08-08T12:18:40.650Z
https://docs.axolotl.ai/docs/api/prompt_strategies.llama2_chat.html
- 2025-08-08T12:12:39.545Z
+ 2025-08-08T12:18:40.267Z
https://docs.axolotl.ai/docs/api/index.html
- 2025-08-08T12:12:38.894Z
+ 2025-08-08T12:18:39.614Z
https://docs.axolotl.ai/docs/api/prompt_strategies.chat_template.html
- 2025-08-08T12:12:39.497Z
+ 2025-08-08T12:18:40.218Z
https://docs.axolotl.ai/docs/api/core.training_args.html
- 2025-08-08T12:12:39.070Z
+ 2025-08-08T12:18:39.791Z
https://docs.axolotl.ai/docs/api/kernels.quantize.html
- 2025-08-08T12:12:39.768Z
+ 2025-08-08T12:18:40.495Z
https://docs.axolotl.ai/docs/api/convert.html
- 2025-08-08T12:12:38.992Z
+ 2025-08-08T12:18:39.711Z
https://docs.axolotl.ai/docs/api/integrations.grokfast.optimizer.html
- 2025-08-08T12:12:40.247Z
+ 2025-08-08T12:18:40.978Z
https://docs.axolotl.ai/docs/api/prompt_strategies.stepwise_supervised.html
- 2025-08-08T12:12:39.561Z
+ 2025-08-08T12:18:40.284Z
https://docs.axolotl.ai/docs/api/utils.schemas.model.html
- 2025-08-08T12:12:40.034Z
+ 2025-08-08T12:18:40.763Z
https://docs.axolotl.ai/docs/api/utils.callbacks.qat.html
- 2025-08-08T12:12:40.379Z
+ 2025-08-08T12:18:41.112Z
https://docs.axolotl.ai/docs/api/loaders.constants.html
- 2025-08-08T12:12:39.422Z
+ 2025-08-08T12:18:40.143Z
https://docs.axolotl.ai/docs/api/cli.utils.sweeps.html
- 2025-08-08T12:12:39.311Z
+ 2025-08-08T12:18:40.032Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.llama3.html
- 2025-08-08T12:12:39.595Z
+ 2025-08-08T12:18:40.318Z
https://docs.axolotl.ai/docs/api/core.datasets.transforms.chat_builder.html
- 2025-08-08T12:12:39.111Z
+ 2025-08-08T12:18:39.832Z
https://docs.axolotl.ai/docs/api/cli.utils.fetch.html
- 2025-08-08T12:12:39.300Z
+ 2025-08-08T12:18:40.021Z
https://docs.axolotl.ai/docs/api/core.trainers.mamba.html
- 2025-08-08T12:12:39.353Z
+ 2025-08-08T12:18:40.074Z
https://docs.axolotl.ai/docs/api/utils.schemas.enums.html
- 2025-08-08T12:12:40.101Z
+ 2025-08-08T12:18:40.829Z
https://docs.axolotl.ai/docs/api/utils.callbacks.profiler.html
- 2025-08-08T12:12:40.364Z
+ 2025-08-08T12:18:41.097Z
https://docs.axolotl.ai/docs/api/prompt_strategies.metharme.html
- 2025-08-08T12:12:39.568Z
+ 2025-08-08T12:18:40.291Z
https://docs.axolotl.ai/docs/api/core.trainers.trl.html
- 2025-08-08T12:12:39.348Z
+ 2025-08-08T12:18:40.069Z
https://docs.axolotl.ai/docs/api/prompt_strategies.orcamini.html
- 2025-08-08T12:12:39.572Z
+ 2025-08-08T12:18:40.295Z
https://docs.axolotl.ai/docs/api/utils.samplers.multipack.html
- 2025-08-08T12:12:40.354Z
+ 2025-08-08T12:18:41.086Z
https://docs.axolotl.ai/docs/api/utils.schedulers.html
- 2025-08-08T12:12:39.950Z
+ 2025-08-08T12:18:40.678Z
https://docs.axolotl.ai/docs/api/core.trainers.grpo.trainer.html
- 2025-08-08T12:12:39.371Z
+ 2025-08-08T12:18:40.092Z
https://docs.axolotl.ai/docs/api/prompt_tokenizers.html
- 2025-08-08T12:12:39.033Z
+ 2025-08-08T12:18:39.753Z
https://docs.axolotl.ai/docs/config-reference.html
- 2025-08-08T12:12:54.720Z
+ 2025-08-08T12:18:55.501Z
https://docs.axolotl.ai/docs/multimodal.html
- 2025-08-08T12:09:22.040Z
+ 2025-08-08T12:15:25.305Z
https://docs.axolotl.ai/docs/mixed_precision.html
- 2025-08-08T12:09:22.040Z
+ 2025-08-08T12:15:25.305Z
https://docs.axolotl.ai/docs/unsloth.html
- 2025-08-08T12:09:22.041Z
+ 2025-08-08T12:15:25.305Z
https://docs.axolotl.ai/docs/ray-integration.html
- 2025-08-08T12:09:22.041Z
+ 2025-08-08T12:15:25.305Z
https://docs.axolotl.ai/docs/dataset-formats/stepwise_supervised.html
- 2025-08-08T12:09:22.037Z
+ 2025-08-08T12:15:25.301Z
https://docs.axolotl.ai/docs/dataset-formats/template_free.html
- 2025-08-08T12:09:22.037Z
+ 2025-08-08T12:15:25.301Z
https://docs.axolotl.ai/docs/dataset-formats/index.html
- 2025-08-08T12:09:22.037Z
+ 2025-08-08T12:15:25.301Z
https://docs.axolotl.ai/docs/dataset-formats/pretraining.html
- 2025-08-08T12:09:22.037Z
+ 2025-08-08T12:15:25.301Z
https://docs.axolotl.ai/docs/nd_parallelism.html
- 2025-08-08T12:09:22.040Z
+ 2025-08-08T12:15:25.305Z
https://docs.axolotl.ai/docs/sequence_parallelism.html
- 2025-08-08T12:09:22.041Z
+ 2025-08-08T12:15:25.305Z
https://docs.axolotl.ai/docs/inference.html
- 2025-08-08T12:09:22.040Z
+ 2025-08-08T12:15:25.304Z
https://docs.axolotl.ai/docs/fsdp_qlora.html
- 2025-08-08T12:09:22.037Z
+ 2025-08-08T12:15:25.302Z
https://docs.axolotl.ai/docs/multi-node.html
- 2025-08-08T12:09:22.040Z
+ 2025-08-08T12:15:25.305Z
https://docs.axolotl.ai/docs/lora_optims.html
- 2025-08-08T12:09:22.040Z
+ 2025-08-08T12:15:25.304Z
https://docs.axolotl.ai/docs/getting-started.html
- 2025-08-08T12:09:22.037Z
+ 2025-08-08T12:15:25.302Z
https://docs.axolotl.ai/docs/dataset_loading.html
- 2025-08-08T12:09:22.037Z
+ 2025-08-08T12:15:25.301Z
https://docs.axolotl.ai/docs/lr_groups.html
- 2025-08-08T12:09:22.040Z
+ 2025-08-08T12:15:25.304Z
https://docs.axolotl.ai/docs/input_output.html
- 2025-08-08T12:09:22.040Z
+ 2025-08-08T12:15:25.304Z
https://docs.axolotl.ai/src/axolotl/integrations/LICENSE.html
- 2025-08-08T12:09:22.060Z
+ 2025-08-08T12:15:25.325Z
https://docs.axolotl.ai/src/axolotl/integrations/cut_cross_entropy/ACKNOWLEDGEMENTS.html
- 2025-08-08T12:09:22.061Z
+ 2025-08-08T12:15:25.325Z
https://docs.axolotl.ai/docs/mac.html
- 2025-08-08T12:09:22.040Z
+ 2025-08-08T12:15:25.305Z
https://docs.axolotl.ai/docs/optimizers.html
- 2025-08-08T12:09:22.040Z
+ 2025-08-08T12:15:25.305Z
https://docs.axolotl.ai/docs/gradient_checkpointing.html
- 2025-08-08T12:09:22.037Z
+ 2025-08-08T12:15:25.302Z
https://docs.axolotl.ai/docs/qat.html
- 2025-08-08T12:09:22.040Z
+ 2025-08-08T12:15:25.305Z
https://docs.axolotl.ai/docs/faq.html
- 2025-08-08T12:09:22.037Z
+ 2025-08-08T12:15:25.302Z
https://docs.axolotl.ai/docs/dataset_preprocessing.html
- 2025-08-08T12:09:22.037Z
+ 2025-08-08T12:15:25.301Z
https://docs.axolotl.ai/docs/nccl.html
- 2025-08-08T12:09:22.040Z
+ 2025-08-08T12:15:25.305Z
https://docs.axolotl.ai/docs/cli.html
- 2025-08-08T12:09:22.036Z
+ 2025-08-08T12:15:25.301Z
https://docs.axolotl.ai/docs/torchao.html
- 2025-08-08T12:09:22.041Z
+ 2025-08-08T12:15:25.305Z
https://docs.axolotl.ai/docs/multi-gpu.html
- 2025-08-08T12:09:22.040Z
+ 2025-08-08T12:15:25.305Z
https://docs.axolotl.ai/docs/rlhf.html
- 2025-08-08T12:09:22.041Z
+ 2025-08-08T12:15:25.305Z
https://docs.axolotl.ai/docs/dataset-formats/tokenized.html
- 2025-08-08T12:09:22.037Z
+ 2025-08-08T12:15:25.301Z
https://docs.axolotl.ai/docs/dataset-formats/conversation.html
- 2025-08-08T12:09:22.036Z
+ 2025-08-08T12:15:25.301Z
https://docs.axolotl.ai/docs/dataset-formats/inst_tune.html
- 2025-08-08T12:09:22.037Z
+ 2025-08-08T12:15:25.301Z
https://docs.axolotl.ai/docs/reward_modelling.html
- 2025-08-08T12:09:22.041Z
+ 2025-08-08T12:15:25.305Z
https://docs.axolotl.ai/docs/docker.html
- 2025-08-08T12:09:22.037Z
+ 2025-08-08T12:15:25.302Z
https://docs.axolotl.ai/docs/installation.html
- 2025-08-08T12:09:22.040Z
+ 2025-08-08T12:15:25.304Z
https://docs.axolotl.ai/docs/quantize.html
- 2025-08-08T12:09:22.040Z
+ 2025-08-08T12:15:25.305Z
https://docs.axolotl.ai/docs/custom_integrations.html
- 2025-08-08T12:09:22.036Z
+ 2025-08-08T12:15:25.301Z
https://docs.axolotl.ai/docs/batch_vs_grad.html
- 2025-08-08T12:09:22.036Z
+ 2025-08-08T12:15:25.301Z
https://docs.axolotl.ai/docs/api/cli.utils.train.html
- 2025-08-08T12:12:39.322Z
+ 2025-08-08T12:18:40.043Z
https://docs.axolotl.ai/docs/api/cli.art.html
- 2025-08-08T12:12:39.186Z
+ 2025-08-08T12:18:39.906Z
https://docs.axolotl.ai/docs/api/core.trainers.grpo.sampler.html
- 2025-08-08T12:12:39.383Z
+ 2025-08-08T12:18:40.105Z
https://docs.axolotl.ai/docs/api/loaders.model.html
- 2025-08-08T12:12:39.395Z
+ 2025-08-08T12:18:40.116Z
https://docs.axolotl.ai/docs/api/cli.preprocess.html
- 2025-08-08T12:12:39.260Z
+ 2025-08-08T12:18:39.980Z
https://docs.axolotl.ai/docs/api/cli.utils.html
- 2025-08-08T12:12:39.283Z
+ 2025-08-08T12:18:40.003Z
https://docs.axolotl.ai/docs/api/cli.inference.html
- 2025-08-08T12:12:39.231Z
+ 2025-08-08T12:18:39.950Z
https://docs.axolotl.ai/docs/api/monkeypatch.btlm_attn_hijack_flash.html
- 2025-08-08T12:12:39.823Z
+ 2025-08-08T12:18:40.550Z
https://docs.axolotl.ai/docs/api/datasets.html
- 2025-08-08T12:12:38.978Z
+ 2025-08-08T12:18:39.698Z
https://docs.axolotl.ai/docs/api/monkeypatch.transformers_fa_utils.html
- 2025-08-08T12:12:39.840Z
+ 2025-08-08T12:18:40.567Z
https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_flash.html
- 2025-08-08T12:12:39.776Z
+ 2025-08-08T12:18:40.502Z
https://docs.axolotl.ai/docs/api/monkeypatch.relora.html
- 2025-08-08T12:12:39.784Z
+ 2025-08-08T12:18:40.510Z
https://docs.axolotl.ai/docs/api/monkeypatch.stablelm_attn_hijack_flash.html
- 2025-08-08T12:12:39.831Z
+ 2025-08-08T12:18:40.558Z
https://docs.axolotl.ai/docs/api/loaders.adapter.html
- 2025-08-08T12:12:39.411Z
+ 2025-08-08T12:18:40.132Z
https://docs.axolotl.ai/docs/api/core.trainers.dpo.trainer.html
- 2025-08-08T12:12:39.360Z
+ 2025-08-08T12:18:40.081Z
https://docs.axolotl.ai/docs/api/integrations.cut_cross_entropy.args.html
- 2025-08-08T12:12:40.246Z
+ 2025-08-08T12:18:40.977Z
https://docs.axolotl.ai/docs/api/monkeypatch.utils.html
- 2025-08-08T12:12:39.822Z
+ 2025-08-08T12:18:40.549Z
https://docs.axolotl.ai/docs/api/loaders.processor.html
- 2025-08-08T12:12:39.405Z
+ 2025-08-08T12:18:40.126Z
https://docs.axolotl.ai/docs/api/cli.config.html
- 2025-08-08T12:12:39.211Z
+ 2025-08-08T12:18:39.931Z
https://docs.axolotl.ai/docs/api/integrations.liger.args.html
- 2025-08-08T12:12:40.258Z
+ 2025-08-08T12:18:40.989Z
https://docs.axolotl.ai/docs/api/loaders.tokenizer.html
- 2025-08-08T12:12:39.404Z
+ 2025-08-08T12:18:40.124Z
https://docs.axolotl.ai/docs/api/utils.schemas.config.html
- 2025-08-08T12:12:40.027Z
+ 2025-08-08T12:18:40.756Z
https://docs.axolotl.ai/docs/api/utils.ctx_managers.sequence_parallel.html
- 2025-08-08T12:12:39.462Z
+ 2025-08-08T12:18:40.183Z
https://docs.axolotl.ai/docs/api/core.trainers.mixins.scheduler.html
- 2025-08-08T12:12:39.438Z
+ 2025-08-08T12:18:40.159Z
https://docs.axolotl.ai/docs/api/core.trainers.base.html
- 2025-08-08T12:12:39.333Z
+ 2025-08-08T12:18:40.054Z
https://docs.axolotl.ai/docs/api/cli.utils.args.html
- 2025-08-08T12:12:39.295Z
+ 2025-08-08T12:18:40.015Z
https://docs.axolotl.ai/docs/api/prompt_strategies.messages.chat.html
- 2025-08-08T12:12:39.583Z
+ 2025-08-08T12:18:40.306Z
https://docs.axolotl.ai/docs/api/monkeypatch.lora_kernels.html
- 2025-08-08T12:12:39.814Z
+ 2025-08-08T12:18:40.541Z
https://docs.axolotl.ai/docs/api/kernels.lora.html
- 2025-08-08T12:12:39.740Z
+ 2025-08-08T12:18:40.466Z
https://docs.axolotl.ai/docs/api/cli.vllm_serve.html
- 2025-08-08T12:12:39.272Z
+ 2025-08-08T12:18:39.992Z
https://docs.axolotl.ai/docs/api/utils.schemas.multimodal.html
- 2025-08-08T12:12:40.077Z
+ 2025-08-08T12:18:40.806Z
https://docs.axolotl.ai/docs/api/utils.schemas.utils.html
- 2025-08-08T12:12:40.106Z
+ 2025-08-08T12:18:40.835Z
https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_xformers.html
- 2025-08-08T12:12:39.777Z
+ 2025-08-08T12:18:40.503Z
https://docs.axolotl.ai/docs/api/integrations.lm_eval.args.html
- 2025-08-08T12:12:40.261Z
+ 2025-08-08T12:18:40.992Z
https://docs.axolotl.ai/docs/api/monkeypatch.mistral_attn_hijack_flash.html
- 2025-08-08T12:12:39.779Z
+ 2025-08-08T12:18:40.505Z
https://docs.axolotl.ai/docs/api/utils.collators.core.html
- 2025-08-08T12:12:40.286Z
+ 2025-08-08T12:18:41.017Z
https://docs.axolotl.ai/docs/api/core.chat.format.chatml.html
- 2025-08-08T12:12:39.095Z
+ 2025-08-08T12:18:39.816Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.passthrough.html
- 2025-08-08T12:12:39.609Z
+ 2025-08-08T12:18:40.333Z
https://docs.axolotl.ai/docs/api/core.datasets.chat.html
- 2025-08-08T12:12:39.103Z
+ 2025-08-08T12:18:39.824Z
https://docs.axolotl.ai/docs/api/utils.bench.html
- 2025-08-08T12:12:39.897Z
+ 2025-08-08T12:18:40.624Z
https://docs.axolotl.ai/docs/api/utils.schemas.training.html
- 2025-08-08T12:12:40.041Z
+ 2025-08-08T12:18:40.770Z
https://docs.axolotl.ai/docs/api/utils.collators.batching.html
- 2025-08-08T12:12:40.304Z
+ 2025-08-08T12:18:41.036Z
https://docs.axolotl.ai/docs/api/prompt_strategies.input_output.html
- 2025-08-08T12:12:39.557Z
+ 2025-08-08T12:18:40.280Z
https://docs.axolotl.ai/docs/api/utils.lora.html
- 2025-08-08T12:12:39.888Z
+ 2025-08-08T12:18:40.615Z
https://docs.axolotl.ai/docs/api/prompt_strategies.base.html
- 2025-08-08T12:12:39.463Z
+ 2025-08-08T12:18:40.185Z
https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_w_system.html
- 2025-08-08T12:12:39.525Z
+ 2025-08-08T12:18:40.246Z
https://docs.axolotl.ai/docs/api/utils.schemas.datasets.html
- 2025-08-08T12:12:40.060Z
+ 2025-08-08T12:18:40.788Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.user_defined.html
- 2025-08-08T12:12:39.608Z
+ 2025-08-08T12:18:40.332Z
https://docs.axolotl.ai/docs/api/utils.schemas.peft.html
- 2025-08-08T12:12:40.068Z
+ 2025-08-08T12:18:40.797Z
https://docs.axolotl.ai/docs/api/prompt_strategies.pygmalion.html
- 2025-08-08T12:12:39.579Z
+ 2025-08-08T12:18:40.302Z
https://docs.axolotl.ai/docs/api/common.architectures.html
- 2025-08-08T12:12:40.266Z
+ 2025-08-08T12:18:40.998Z
https://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_cpu.html
- 2025-08-08T12:12:39.848Z
+ 2025-08-08T12:18:40.575Z
https://docs.axolotl.ai/docs/api/utils.callbacks.comet_.html
- 2025-08-08T12:12:40.373Z
+ 2025-08-08T12:18:41.105Z
https://docs.axolotl.ai/docs/api/integrations.spectrum.args.html
- 2025-08-08T12:12:40.265Z
+ 2025-08-08T12:18:40.996Z
https://docs.axolotl.ai/docs/api/cli.quantize.html
- 2025-08-08T12:12:39.265Z
+ 2025-08-08T12:18:39.985Z
https://docs.axolotl.ai/docs/api/cli.checks.html
- 2025-08-08T12:12:39.193Z
+ 2025-08-08T12:18:39.913Z
https://docs.axolotl.ai/docs/api/prompt_strategies.kto.llama3.html
- 2025-08-08T12:12:39.617Z
+ 2025-08-08T12:18:40.342Z
https://docs.axolotl.ai/docs/api/utils.model_shard_quant.html
- 2025-08-08T12:12:39.893Z
+ 2025-08-08T12:18:40.621Z
https://docs.axolotl.ai/docs/api/utils.quantization.html
- 2025-08-08T12:12:40.013Z
+ 2025-08-08T12:18:40.741Z
https://docs.axolotl.ai/docs/api/core.trainers.mixins.rng_state_loader.html
- 2025-08-08T12:12:39.431Z
+ 2025-08-08T12:18:40.152Z
https://docs.axolotl.ai/docs/api/kernels.geglu.html
- 2025-08-08T12:12:39.751Z
+ 2025-08-08T12:18:40.477Z
https://docs.axolotl.ai/docs/api/utils.data.pretraining.html
- 2025-08-08T12:12:39.985Z
+ 2025-08-08T12:18:40.713Z
https://docs.axolotl.ai/docs/api/prompt_strategies.kto.user_defined.html
- 2025-08-08T12:12:39.627Z
+ 2025-08-08T12:18:40.351Z
https://docs.axolotl.ai/docs/api/core.builders.base.html
- 2025-08-08T12:12:39.049Z
+ 2025-08-08T12:18:39.769Z
https://docs.axolotl.ai/docs/api/cli.merge_lora.html
- 2025-08-08T12:12:39.239Z
+ 2025-08-08T12:18:39.959Z
https://docs.axolotl.ai/docs/api/cli.utils.load.html
- 2025-08-08T12:12:39.305Z
+ 2025-08-08T12:18:40.026Z
https://docs.axolotl.ai/docs/api/utils.data.sft.html
- 2025-08-08T12:12:39.992Z
+ 2025-08-08T12:18:40.720Z
https://docs.axolotl.ai/docs/api/prompt_strategies.user_defined.html
- 2025-08-08T12:12:39.533Z
+ 2025-08-08T12:18:40.254Z
https://docs.axolotl.ai/docs/api/utils.tokenization.html
- 2025-08-08T12:12:39.881Z
+ 2025-08-08T12:18:40.608Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chatml.html
- 2025-08-08T12:12:39.605Z
+ 2025-08-08T12:18:40.329Z
https://docs.axolotl.ai/docs/api/models.mamba.modeling_mamba.html
- 2025-08-08T12:12:40.284Z
+ 2025-08-08T12:18:41.016Z
https://docs.axolotl.ai/docs/api/cli.args.html
- 2025-08-08T12:12:39.183Z
+ 2025-08-08T12:18:39.903Z
https://docs.axolotl.ai/docs/api/evaluate.html
- 2025-08-08T12:12:38.967Z
+ 2025-08-08T12:18:39.687Z
https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_instruct.html
- 2025-08-08T12:12:39.513Z
+ 2025-08-08T12:18:40.234Z
https://docs.axolotl.ai/docs/api/utils.distributed.html
- 2025-08-08T12:12:39.970Z
+ 2025-08-08T12:18:40.698Z
https://docs.axolotl.ai/docs/multipack.html
- 2025-08-08T12:09:22.040Z
+ 2025-08-08T12:15:25.305Z
https://docs.axolotl.ai/examples/colab-notebooks/colab-axolotl-example.html
- 2025-08-08T12:09:22.045Z
+ 2025-08-08T12:15:25.309Z
https://docs.axolotl.ai/FAQS.html
- 2025-08-08T12:09:22.035Z
+ 2025-08-08T12:15:25.299Z