diff --git a/.nojekyll b/.nojekyll
index bdfec518d..50b586081 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-f1035500
\ No newline at end of file
+999b5f89
\ No newline at end of file
diff --git a/docs/api/index.html b/docs/api/index.html
index 38995ee00..6f5c6b098 100644
--- a/docs/api/index.html
+++ b/docs/api/index.html
@@ -965,7 +965,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});

utils.distributed

-utility helpers for distributed checks
+Utilities for distributed functionality.

utils.dict

diff --git a/docs/api/utils.distributed.html b/docs/api/utils.distributed.html
index 1792b4a3c..738690cf2 100644
--- a/docs/api/utils.distributed.html
+++ b/docs/api/utils.distributed.html
@@ -494,7 +494,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});

utils.distributed

-utility helpers for distributed checks
+Utilities for distributed functionality.

Functions
@@ -595,13 +595,35 @@ The value is then broadcasted to all other ranks.

is_main_process

-utils.distributed.is_main_process(use_environ=False)
+utils.distributed.is_main_process()

Check if the current process is the main process. If not in distributed mode, always return True.

-Args:
-- use_environ (bool, optional): Use environment variable to determine main process.
-Returns:
-- bool: True if the current process is the main process, False otherwise.
+We use a simpler logic when the distributed state is not initialized: we just log
+on the 0-th local rank.
+
+Returns
+
+Name | Type | Description
+     | bool | True if the current process is the main process, False otherwise.
reduce_and_broadcast

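The new docstring says that when the distributed state is not initialized, only the 0-th local rank counts as the main process. A minimal sketch of that fallback logic (an assumption for illustration, not axolotl's actual implementation, which would also consult torch.distributed when a process group is initialized) could look like:

```python
import os

def is_main_process() -> bool:
    """Hypothetical sketch of the documented fallback: when no
    distributed process group is initialized, treat the 0-th local
    rank as the main process, reading the LOCAL_RANK environment
    variable set by launchers such as torchrun. A plain
    single-process run (no LOCAL_RANK set) defaults to rank 0,
    so this always returns True outside distributed mode."""
    return int(os.environ.get("LOCAL_RANK", "0")) == 0
```

Note that keying on the *local* rank means one "main" process per node rather than one globally, which matches the stated intent of only logging once per machine.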
diff --git a/search.json b/search.json index 1ca60bc9d..39c2ded53 100644 --- a/search.json +++ b/search.json @@ -1181,14 +1181,14 @@ "href": "docs/api/utils.distributed.html", "title": "utils.distributed", "section": "", - "text": "utils.distributed\nutility helpers for distributed checks\n\n\n\n\n\nName\nDescription\n\n\n\n\nbarrier\nActs as a barrier to wait for all processes. This ensures that all processes\n\n\ncleanup_distributed\nDestroy process group if torch distributed is initialized. Called in training early\n\n\ncompute_and_broadcast\nCompute a value using the function ‘fn’ only on the specified rank (default is 0).\n\n\ngather_from_all_ranks\nRun a callable ‘fn’ on all ranks and gather the results on the specified rank.\n\n\ngather_scalar_from_all_ranks\nRun a callable ‘fn’ on all ranks and gather the results on the specified rank.\n\n\nis_distributed\nCheck if distributed training is initialized.\n\n\nis_main_process\nCheck if the current process is the main process. If not in distributed mode,\n\n\nreduce_and_broadcast\nRun a callable ‘fn1’ on all ranks, gather the results, reduce them using ‘fn2’,\n\n\nzero_first\nruns the wrapped context so that rank 0 runs first before other ranks\n\n\n\n\n\nutils.distributed.barrier()\nActs as a barrier to wait for all processes. This ensures that all processes\nreach the barrier before proceeding further.\n\n\n\nutils.distributed.cleanup_distributed()\nDestroy process group if torch distributed is initialized. Called in training early\ntermination or when training successfully completes.\n\n\n\nutils.distributed.compute_and_broadcast(fn)\nCompute a value using the function ‘fn’ only on the specified rank (default is 0).\nThe value is then broadcasted to all other ranks.\nArgs:\n- fn (callable): A function that computes the value. This should not have any side effects.\n- rank (int, optional): The rank that computes the value. 
Default is 0.\nReturns:\n- The computed value (int or float).\n\n\n\nutils.distributed.gather_from_all_ranks(fn, world_size=1)\nRun a callable ‘fn’ on all ranks and gather the results on the specified rank.\nArgs:\n- fn (callable): A function that computes the value. This should not have any side effects.\n- rank (int, optional): The rank that gathers the values. Default is 0.\n- world_size (int, optional): Total number of processes in the current distributed setup.\nReturns:\n- A list of computed values from all ranks if on the gathering rank, otherwise None.\n\n\n\nutils.distributed.gather_scalar_from_all_ranks(fn, world_size=1)\nRun a callable ‘fn’ on all ranks and gather the results on the specified rank.\nArgs:\n- fn (callable): A function that computes the value. This should not have any side effects.\n- rank (int, optional): The rank that gathers the values. Default is 0.\n- world_size (int, optional): Total number of processes in the current distributed setup.\nReturns:\n- A list of computed values from all ranks if on the gathering rank, otherwise None.\n\n\n\nutils.distributed.is_distributed()\nCheck if distributed training is initialized.\n\n\n\nutils.distributed.is_main_process(use_environ=False)\nCheck if the current process is the main process. 
If not in distributed mode,\nalways return True.\nArgs:\n- use_environ (bool, optional): Use environment variable to determine main process.\nReturns:\n- bool: True if the current process is the main process, False otherwise.\n\n\n\nutils.distributed.reduce_and_broadcast(fn1, fn2)\nRun a callable ‘fn1’ on all ranks, gather the results, reduce them using ‘fn2’,\nand then broadcast the reduced result to all ranks.\nArgs:\n- fn1 (callable): A function that computes the value on each rank.\n- fn2 (callable): A reduction function that takes a list of values and returns a single value.\n- world_size (int, optional): Total number of processes in the current distributed setup.\nReturns:\n- The reduced and broadcasted value.\n\n\n\nutils.distributed.zero_first(is_main)\nruns the wrapped context so that rank 0 runs first before other ranks" + "text": "utils.distributed\nUtilities for distributed functionality.\n\n\n\n\n\nName\nDescription\n\n\n\n\nbarrier\nActs as a barrier to wait for all processes. This ensures that all processes\n\n\ncleanup_distributed\nDestroy process group if torch distributed is initialized. Called in training early\n\n\ncompute_and_broadcast\nCompute a value using the function ‘fn’ only on the specified rank (default is 0).\n\n\ngather_from_all_ranks\nRun a callable ‘fn’ on all ranks and gather the results on the specified rank.\n\n\ngather_scalar_from_all_ranks\nRun a callable ‘fn’ on all ranks and gather the results on the specified rank.\n\n\nis_distributed\nCheck if distributed training is initialized.\n\n\nis_main_process\nCheck if the current process is the main process. If not in distributed mode,\n\n\nreduce_and_broadcast\nRun a callable ‘fn1’ on all ranks, gather the results, reduce them using ‘fn2’,\n\n\nzero_first\nruns the wrapped context so that rank 0 runs first before other ranks\n\n\n\n\n\nutils.distributed.barrier()\nActs as a barrier to wait for all processes. 
This ensures that all processes\nreach the barrier before proceeding further.\n\n\n\nutils.distributed.cleanup_distributed()\nDestroy process group if torch distributed is initialized. Called in training early\ntermination or when training successfully completes.\n\n\n\nutils.distributed.compute_and_broadcast(fn)\nCompute a value using the function ‘fn’ only on the specified rank (default is 0).\nThe value is then broadcasted to all other ranks.\nArgs:\n- fn (callable): A function that computes the value. This should not have any side effects.\n- rank (int, optional): The rank that computes the value. Default is 0.\nReturns:\n- The computed value (int or float).\n\n\n\nutils.distributed.gather_from_all_ranks(fn, world_size=1)\nRun a callable ‘fn’ on all ranks and gather the results on the specified rank.\nArgs:\n- fn (callable): A function that computes the value. This should not have any side effects.\n- rank (int, optional): The rank that gathers the values. Default is 0.\n- world_size (int, optional): Total number of processes in the current distributed setup.\nReturns:\n- A list of computed values from all ranks if on the gathering rank, otherwise None.\n\n\n\nutils.distributed.gather_scalar_from_all_ranks(fn, world_size=1)\nRun a callable ‘fn’ on all ranks and gather the results on the specified rank.\nArgs:\n- fn (callable): A function that computes the value. This should not have any side effects.\n- rank (int, optional): The rank that gathers the values. Default is 0.\n- world_size (int, optional): Total number of processes in the current distributed setup.\nReturns:\n- A list of computed values from all ranks if on the gathering rank, otherwise None.\n\n\n\nutils.distributed.is_distributed()\nCheck if distributed training is initialized.\n\n\n\nutils.distributed.is_main_process()\nCheck if the current process is the main process. 
If not in distributed mode,\nalways return True.\nWe use a simpler logic when the distributed state is not initialized: we just log\non the 0-th local rank.\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nbool\nTrue if the current process is the main process, False otherwise.\n\n\n\n\n\n\n\nutils.distributed.reduce_and_broadcast(fn1, fn2)\nRun a callable ‘fn1’ on all ranks, gather the results, reduce them using ‘fn2’,\nand then broadcast the reduced result to all ranks.\nArgs:\n- fn1 (callable): A function that computes the value on each rank.\n- fn2 (callable): A reduction function that takes a list of values and returns a single value.\n- world_size (int, optional): Total number of processes in the current distributed setup.\nReturns:\n- The reduced and broadcasted value.\n\n\n\nutils.distributed.zero_first(is_main)\nruns the wrapped context so that rank 0 runs first before other ranks" }, { "objectID": "docs/api/utils.distributed.html#functions", "href": "docs/api/utils.distributed.html#functions", "title": "utils.distributed", "section": "", - "text": "Name\nDescription\n\n\n\n\nbarrier\nActs as a barrier to wait for all processes. This ensures that all processes\n\n\ncleanup_distributed\nDestroy process group if torch distributed is initialized. Called in training early\n\n\ncompute_and_broadcast\nCompute a value using the function ‘fn’ only on the specified rank (default is 0).\n\n\ngather_from_all_ranks\nRun a callable ‘fn’ on all ranks and gather the results on the specified rank.\n\n\ngather_scalar_from_all_ranks\nRun a callable ‘fn’ on all ranks and gather the results on the specified rank.\n\n\nis_distributed\nCheck if distributed training is initialized.\n\n\nis_main_process\nCheck if the current process is the main process. 
If not in distributed mode,\n\n\nreduce_and_broadcast\nRun a callable ‘fn1’ on all ranks, gather the results, reduce them using ‘fn2’,\n\n\nzero_first\nruns the wrapped context so that rank 0 runs first before other ranks\n\n\n\n\n\nutils.distributed.barrier()\nActs as a barrier to wait for all processes. This ensures that all processes\nreach the barrier before proceeding further.\n\n\n\nutils.distributed.cleanup_distributed()\nDestroy process group if torch distributed is initialized. Called in training early\ntermination or when training successfully completes.\n\n\n\nutils.distributed.compute_and_broadcast(fn)\nCompute a value using the function ‘fn’ only on the specified rank (default is 0).\nThe value is then broadcasted to all other ranks.\nArgs:\n- fn (callable): A function that computes the value. This should not have any side effects.\n- rank (int, optional): The rank that computes the value. Default is 0.\nReturns:\n- The computed value (int or float).\n\n\n\nutils.distributed.gather_from_all_ranks(fn, world_size=1)\nRun a callable ‘fn’ on all ranks and gather the results on the specified rank.\nArgs:\n- fn (callable): A function that computes the value. This should not have any side effects.\n- rank (int, optional): The rank that gathers the values. Default is 0.\n- world_size (int, optional): Total number of processes in the current distributed setup.\nReturns:\n- A list of computed values from all ranks if on the gathering rank, otherwise None.\n\n\n\nutils.distributed.gather_scalar_from_all_ranks(fn, world_size=1)\nRun a callable ‘fn’ on all ranks and gather the results on the specified rank.\nArgs:\n- fn (callable): A function that computes the value. This should not have any side effects.\n- rank (int, optional): The rank that gathers the values. 
Default is 0.\n- world_size (int, optional): Total number of processes in the current distributed setup.\nReturns:\n- A list of computed values from all ranks if on the gathering rank, otherwise None.\n\n\n\nutils.distributed.is_distributed()\nCheck if distributed training is initialized.\n\n\n\nutils.distributed.is_main_process(use_environ=False)\nCheck if the current process is the main process. If not in distributed mode,\nalways return True.\nArgs:\n- use_environ (bool, optional): Use environment variable to determine main process.\nReturns:\n- bool: True if the current process is the main process, False otherwise.\n\n\n\nutils.distributed.reduce_and_broadcast(fn1, fn2)\nRun a callable ‘fn1’ on all ranks, gather the results, reduce them using ‘fn2’,\nand then broadcast the reduced result to all ranks.\nArgs:\n- fn1 (callable): A function that computes the value on each rank.\n- fn2 (callable): A reduction function that takes a list of values and returns a single value.\n- world_size (int, optional): Total number of processes in the current distributed setup.\nReturns:\n- The reduced and broadcasted value.\n\n\n\nutils.distributed.zero_first(is_main)\nruns the wrapped context so that rank 0 runs first before other ranks" + "text": "Name\nDescription\n\n\n\n\nbarrier\nActs as a barrier to wait for all processes. This ensures that all processes\n\n\ncleanup_distributed\nDestroy process group if torch distributed is initialized. Called in training early\n\n\ncompute_and_broadcast\nCompute a value using the function ‘fn’ only on the specified rank (default is 0).\n\n\ngather_from_all_ranks\nRun a callable ‘fn’ on all ranks and gather the results on the specified rank.\n\n\ngather_scalar_from_all_ranks\nRun a callable ‘fn’ on all ranks and gather the results on the specified rank.\n\n\nis_distributed\nCheck if distributed training is initialized.\n\n\nis_main_process\nCheck if the current process is the main process. 
If not in distributed mode,\n\n\nreduce_and_broadcast\nRun a callable ‘fn1’ on all ranks, gather the results, reduce them using ‘fn2’,\n\n\nzero_first\nruns the wrapped context so that rank 0 runs first before other ranks\n\n\n\n\n\nutils.distributed.barrier()\nActs as a barrier to wait for all processes. This ensures that all processes\nreach the barrier before proceeding further.\n\n\n\nutils.distributed.cleanup_distributed()\nDestroy process group if torch distributed is initialized. Called in training early\ntermination or when training successfully completes.\n\n\n\nutils.distributed.compute_and_broadcast(fn)\nCompute a value using the function ‘fn’ only on the specified rank (default is 0).\nThe value is then broadcasted to all other ranks.\nArgs:\n- fn (callable): A function that computes the value. This should not have any side effects.\n- rank (int, optional): The rank that computes the value. Default is 0.\nReturns:\n- The computed value (int or float).\n\n\n\nutils.distributed.gather_from_all_ranks(fn, world_size=1)\nRun a callable ‘fn’ on all ranks and gather the results on the specified rank.\nArgs:\n- fn (callable): A function that computes the value. This should not have any side effects.\n- rank (int, optional): The rank that gathers the values. Default is 0.\n- world_size (int, optional): Total number of processes in the current distributed setup.\nReturns:\n- A list of computed values from all ranks if on the gathering rank, otherwise None.\n\n\n\nutils.distributed.gather_scalar_from_all_ranks(fn, world_size=1)\nRun a callable ‘fn’ on all ranks and gather the results on the specified rank.\nArgs:\n- fn (callable): A function that computes the value. This should not have any side effects.\n- rank (int, optional): The rank that gathers the values. 
Default is 0.\n- world_size (int, optional): Total number of processes in the current distributed setup.\nReturns:\n- A list of computed values from all ranks if on the gathering rank, otherwise None.\n\n\n\nutils.distributed.is_distributed()\nCheck if distributed training is initialized.\n\n\n\nutils.distributed.is_main_process()\nCheck if the current process is the main process. If not in distributed mode,\nalways return True.\nWe use a simpler logic when the distributed state is not initialized: we just log\non the 0-th local rank.\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nbool\nTrue if the current process is the main process, False otherwise.\n\n\n\n\n\n\n\nutils.distributed.reduce_and_broadcast(fn1, fn2)\nRun a callable ‘fn1’ on all ranks, gather the results, reduce them using ‘fn2’,\nand then broadcast the reduced result to all ranks.\nArgs:\n- fn1 (callable): A function that computes the value on each rank.\n- fn2 (callable): A reduction function that takes a list of values and returns a single value.\n- world_size (int, optional): Total number of processes in the current distributed setup.\nReturns:\n- The reduced and broadcasted value.\n\n\n\nutils.distributed.zero_first(is_main)\nruns the wrapped context so that rank 0 runs first before other ranks" }, { "objectID": "docs/api/monkeypatch.gradient_checkpointing.offload_disk.html", @@ -2560,7 +2560,7 @@ "href": "docs/api/index.html", "title": "API Reference", "section": "", - "text": "Core functionality for training\n\n\n\ntrain\nPrepare and train a model on a dataset. 
Can also infer from a model or merge lora\n\n\nevaluate\nModule for evaluating models.\n\n\ndatasets\nModule containing Dataset functionality\n\n\nconvert\nModule containing File Reader, File Writer, Json Parser, and Jsonl Serializer classes\n\n\nprompt_tokenizers\nModule containing PromptTokenizingStrategy and Prompter classes\n\n\nlogging_config\nCommon logging module for axolotl\n\n\ncore.builders.base\nBase class for trainer builder\n\n\ncore.builders.causal\nBuilder for causal trainers\n\n\ncore.builders.rl\nBuilder for RLHF trainers\n\n\ncore.training_args\nextra axolotl specific training args\n\n\ncore.chat.messages\ninternal message representations of chat messages\n\n\ncore.chat.format.chatml\nChatML transformation functions for MessageContents\n\n\ncore.chat.format.llama3x\nLlama 3.x chat formatting functions for MessageContents\n\n\ncore.chat.format.shared\nshared functions for format transforms\n\n\ncore.datasets.chat\nchat dataset module\n\n\ncore.datasets.transforms.chat_builder\nThis module contains a function that builds a transform that takes a row from the dataset and converts it to a Chat.\n\n\n\n\n\n\nCommand-line interface\n\n\n\ncli.main\nClick CLI definitions for various axolotl commands.\n\n\ncli.train\nCLI to run training on a model.\n\n\ncli.evaluate\nCLI to run evaluation on a model.\n\n\ncli.args\nModule for axolotl CLI command arguments.\n\n\ncli.checks\nVarious checks for Axolotl CLI.\n\n\ncli.config\nConfiguration loading and processing.\n\n\ncli.inference\nCLI to run inference on a trained model.\n\n\ncli.merge_lora\nCLI to merge a trained LoRA into a base model.\n\n\ncli.merge_sharded_fsdp_weights\nCLI to merge sharded FSDP model checkpoints into a single combined checkpoint.\n\n\ncli.preprocess\nCLI to run preprocessing of a dataset.\n\n\ncli.sweeps\nUtilities for handling sweeps over configs for axolotl train CLI command\n\n\ncli.utils\nUtility methods for axolotl CLI.\n\n\ncli.vllm_serve\nCLI to start the vllm server for online 
RL\n\n\ncli.cloud.base\nbase class for cloud platforms from cli\n\n\ncli.cloud.modal_\nModal Cloud support from CLI\n\n\ncli.quantize\nCLI to post-training quantize a model using torchao\n\n\n\n\n\n\nTraining implementations\n\n\n\ncore.trainers.base\nModule for customized trainers\n\n\ncore.trainers.trl\nModule for TRL PPO trainer\n\n\ncore.trainers.mamba\nModule for mamba trainer\n\n\ncore.trainers.relora\nModule for ReLoRA trainer\n\n\ncore.trainers.dpo.trainer\nDPO trainer for axolotl\n\n\ncore.trainers.grpo.trainer\nAxolotl GRPO trainers (with and without sequence parallelism handling)\n\n\ncore.trainers.grpo.sampler\nRepeat random sampler (similar to the one implemented in\n\n\ncore.trainers.utils\nUtils for Axolotl trainers\n\n\n\n\n\n\nFunctionality for loading and patching models, tokenizers, etc.\n\n\n\nloaders.model\nModel loader class implementation for loading, configuring, and patching various\n\n\nloaders.tokenizer\nTokenizer loading functionality and associated utils\n\n\nloaders.processor\nProcessor loading functionality for multi-modal models\n\n\nloaders.adapter\nAdapter loading functionality, including LoRA / QLoRA and associated utils\n\n\nloaders.patch_manager\nPatch manager class implementation to complement axolotl.loaders.ModelLoader.\n\n\nloaders.constants\nShared constants for axolotl.loaders module\n\n\n\n\n\n\nMixin classes for augmenting trainers\n\n\n\ncore.trainers.mixins.optimizer\nModule for Axolotl trainer optimizer mixin\n\n\ncore.trainers.mixins.rng_state_loader\nTemporary fix/override for bug in resume from checkpoint\n\n\ncore.trainers.mixins.scheduler\nModule for Axolotl trainer scheduler mixin\n\n\n\n\n\n\nContext managers for altering trainer behaviors\n\n\n\nutils.ctx_managers.sequence_parallel\nModule for Axolotl trainer sequence parallelism manager and utilities\n\n\n\n\n\n\nPrompt formatting strategies\n\n\n\nprompt_strategies.base\nmodule for base dataset transform strategies\n\n\nprompt_strategies.chat_template\nHF 
Chat Templates prompt strategy\n\n\nprompt_strategies.alpaca_chat\nModule for Alpaca prompt strategy classes\n\n\nprompt_strategies.alpaca_instruct\nModule loading the AlpacaInstructPromptTokenizingStrategy class\n\n\nprompt_strategies.alpaca_w_system\nPrompt strategies loader for alpaca instruction datasets with system prompts\n\n\nprompt_strategies.user_defined\nUser Defined prompts with configuration from the YML config\n\n\nprompt_strategies.llama2_chat\nPrompt Strategy for finetuning Llama2 chat models\n\n\nprompt_strategies.completion\nBasic completion text\n\n\nprompt_strategies.input_output\nModule for plain input/output prompt pairs\n\n\nprompt_strategies.stepwise_supervised\nModule for stepwise datasets, typically including a prompt and reasoning traces,\n\n\nprompt_strategies.metharme\nModule containing the MetharmenPromptTokenizingStrategy and MetharmePrompter class\n\n\nprompt_strategies.orcamini\nPrompt Strategy for finetuning Orca Mini (v2) models\n\n\nprompt_strategies.pygmalion\nModule containing the PygmalionPromptTokenizingStrategy and PygmalionPrompter class\n\n\nprompt_strategies.messages.chat\nChat dataset wrapping strategy for new internal messages representations\n\n\nprompt_strategies.dpo.chat_template\nDPO prompt strategies for using tokenizer chat templates.\n\n\nprompt_strategies.dpo.llama3\nDPO strategies for llama-3 chat template\n\n\nprompt_strategies.dpo.chatml\nDPO strategies for chatml\n\n\nprompt_strategies.dpo.zephyr\nDPO strategies for zephyr\n\n\nprompt_strategies.dpo.user_defined\nUser-defined DPO strategies\n\n\nprompt_strategies.dpo.passthrough\nDPO prompt strategies passthrough/zero-processing strategy\n\n\nprompt_strategies.kto.llama3\nKTO strategies for llama-3 chat template\n\n\nprompt_strategies.kto.chatml\nKTO strategies for chatml\n\n\nprompt_strategies.kto.user_defined\nUser-defined KTO strategies\n\n\nprompt_strategies.orpo.chat_template\nchatml prompt tokenization strategy for 
ORPO\n\n\nprompt_strategies.bradley_terry.llama3\nchatml transforms for datasets with system, input, chosen, rejected to match llama3 chat template\n\n\n\n\n\n\nLow-level performance optimizations\n\n\n\nkernels.lora\nModule for definition of Low-Rank Adaptation (LoRA) Triton kernels.\n\n\nkernels.geglu\nModule for definition of GEGLU Triton kernels.\n\n\nkernels.swiglu\nModule for definition of SwiGLU Triton kernels.\n\n\nkernels.quantize\nDequantization utilities for bitsandbytes integration.\n\n\nkernels.utils\nUtilities for axolotl.kernels submodules.\n\n\n\n\n\n\nRuntime patches for model optimizations\n\n\n\nmonkeypatch.llama_attn_hijack_flash\nFlash attention monkey patch for llama model\n\n\nmonkeypatch.llama_attn_hijack_xformers\nDirectly copied the code from https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/modules/llama_attn_hijack.py and made some adjustments\n\n\nmonkeypatch.mistral_attn_hijack_flash\nFlash attention monkey patch for mistral model\n\n\nmonkeypatch.multipack\nmultipack patching for v2 of sample packing\n\n\nmonkeypatch.relora\nImplements the ReLoRA training procedure from https://arxiv.org/abs/2307.05695, minus the initial full fine-tune.\n\n\nmonkeypatch.llama_expand_mask\nexpands the binary attention mask per 3.2.2 of https://arxiv.org/pdf/2107.02027.pdf\n\n\nmonkeypatch.lora_kernels\nModule for patching custom LoRA Triton kernels and torch.autograd functions.\n\n\nmonkeypatch.utils\nShared utils for the monkeypatches\n\n\nmonkeypatch.btlm_attn_hijack_flash\nFlash attention monkey patch for cerebras btlm model\n\n\nmonkeypatch.llama_patch_multipack\nPatched LlamaAttention to use torch.nn.functional.scaled_dot_product_attention\n\n\nmonkeypatch.stablelm_attn_hijack_flash\nPyTorch StableLM Epoch model.\n\n\nmonkeypatch.trainer_fsdp_optim\nfix for FSDP optimizer save in trainer w 4.47.0\n\n\nmonkeypatch.transformers_fa_utils\nsee https://github.com/huggingface/transformers/pull/35834\n\n\nmonkeypatch.unsloth_\nmodule 
for patching with unsloth optimizations\n\n\nmonkeypatch.data.batch_dataset_fetcher\nmonkey patches for the dataset fetcher to handle batches of packed indexes\n\n\nmonkeypatch.mixtral\nPatches to support multipack for mixtral\n\n\nmonkeypatch.gradient_checkpointing.offload_cpu\nCPU offloaded checkpointing\n\n\nmonkeypatch.gradient_checkpointing.offload_disk\nDISCO - DIsk-based Storage and Checkpointing with Optimized prefetching\n\n\n\n\n\n\nUtility functions\n\n\n\nutils.tokenization\nModule for tokenization utilities\n\n\nutils.chat_templates\nThis module provides functionality for selecting chat templates based on user choices.\n\n\nutils.lora\nmodule to get the state dict of a merged lora model\n\n\nutils.model_shard_quant\nmodule to handle loading model on cpu/meta device for FSDP\n\n\nutils.bench\nBenchmarking and measurement utilities\n\n\nutils.freeze\nmodule to freeze/unfreeze parameters by name\n\n\nutils.trainer\nModule containing the Trainer class and related functions\n\n\nutils.schedulers\nModule for custom LRScheduler class\n\n\nutils.distributed\nutility helpers for distributed checks\n\n\nutils.dict\nModule containing the DictDefault class\n\n\nutils.optimizers.adopt\nCopied from https://github.com/iShohei220/adopt\n\n\nutils.data.pretraining\ndata handling specific to pretraining\n\n\nutils.data.sft\nData handling specific to SFT.\n\n\nutils.quantization\nUtilities for quantization including QAT and PTQ using torchao.\n\n\n\n\n\n\nPydantic data models for Axolotl config\n\n\n\nutils.schemas.config\nModule with Pydantic models for configuration.\n\n\nutils.schemas.model\nPydantic models for model input / output, etc. 
configuration\n\n\nutils.schemas.training\nPydantic models for training hyperparameters\n\n\nutils.schemas.datasets\nPydantic models for datasets-related configuration\n\n\nutils.schemas.peft\nPydantic models for PEFT-related configuration\n\n\nutils.schemas.trl\nPydantic models for TRL trainer configuration\n\n\nutils.schemas.multimodal\nPydantic models for multimodal-related configuration\n\n\nutils.schemas.integrations\nPydantic models for Axolotl integrations\n\n\nutils.schemas.enums\nEnums for Axolotl input config\n\n\nutils.schemas.utils\nUtilities for Axolotl Pydantic models\n\n\n\n\n\n\nThird-party integrations and extensions\n\n\n\nintegrations.base\nBase class for all plugins.\n\n\nintegrations.cut_cross_entropy.args\nModule for handling Cut Cross Entropy input arguments.\n\n\nintegrations.grokfast.optimizer\n\n\n\nintegrations.kd.trainer\nKD trainer\n\n\nintegrations.liger.args\nModule for handling LIGER input arguments.\n\n\nintegrations.lm_eval.args\nModule for handling lm eval harness input arguments.\n\n\nintegrations.spectrum.args\nModule for handling Spectrum input arguments.\n\n\n\n\n\n\nCommon utilities and shared functionality\n\n\n\ncommon.architectures\nCommon architecture specific constants\n\n\ncommon.const\nVarious shared constants\n\n\ncommon.datasets\nDataset loading utilities.\n\n\n\n\n\n\nCustom model implementations\n\n\n\nmodels.mamba.modeling_mamba\n\n\n\n\n\n\n\nData processing utilities\n\n\n\nutils.collators.core\nbasic shared collator constants\n\n\nutils.collators.batching\nData collators for axolotl to pad labels and position_ids for packed sequences\n\n\nutils.collators.mamba\ncollators for Mamba\n\n\nutils.collators.mm_chat\nCollators for multi-modal chat messages and packing\n\n\nutils.samplers.multipack\nMultipack Batch Sampler - An efficient batch sampler for packing variable-length sequences\n\n\n\n\n\n\nTraining callbacks\n\n\n\nutils.callbacks.perplexity\ncallback to calculate perplexity as an evaluation 
metric.\n\n\nutils.callbacks.profiler\nHF Trainer callback for creating pytorch profiling snapshots\n\n\nutils.callbacks.lisa\nmodule for LISA\n\n\nutils.callbacks.mlflow_\nMLFlow module for trainer callbacks\n\n\nutils.callbacks.comet_\nComet module for trainer callbacks\n\n\nutils.callbacks.qat\nQAT Callback for HF Causal Trainer" + "text": "Core functionality for training\n\n\n\ntrain\nPrepare and train a model on a dataset. Can also infer from a model or merge lora\n\n\nevaluate\nModule for evaluating models.\n\n\ndatasets\nModule containing Dataset functionality\n\n\nconvert\nModule containing File Reader, File Writer, Json Parser, and Jsonl Serializer classes\n\n\nprompt_tokenizers\nModule containing PromptTokenizingStrategy and Prompter classes\n\n\nlogging_config\nCommon logging module for axolotl\n\n\ncore.builders.base\nBase class for trainer builder\n\n\ncore.builders.causal\nBuilder for causal trainers\n\n\ncore.builders.rl\nBuilder for RLHF trainers\n\n\ncore.training_args\nextra axolotl specific training args\n\n\ncore.chat.messages\ninternal message representations of chat messages\n\n\ncore.chat.format.chatml\nChatML transformation functions for MessageContents\n\n\ncore.chat.format.llama3x\nLlama 3.x chat formatting functions for MessageContents\n\n\ncore.chat.format.shared\nshared functions for format transforms\n\n\ncore.datasets.chat\nchat dataset module\n\n\ncore.datasets.transforms.chat_builder\nThis module contains a function that builds a transform that takes a row from the dataset and converts it to a Chat.\n\n\n\n\n\n\nCommand-line interface\n\n\n\ncli.main\nClick CLI definitions for various axolotl commands.\n\n\ncli.train\nCLI to run training on a model.\n\n\ncli.evaluate\nCLI to run evaluation on a model.\n\n\ncli.args\nModule for axolotl CLI command arguments.\n\n\ncli.checks\nVarious checks for Axolotl CLI.\n\n\ncli.config\nConfiguration loading and processing.\n\n\ncli.inference\nCLI to run inference on a trained 
model.\n\n\ncli.merge_lora\nCLI to merge a trained LoRA into a base model.\n\n\ncli.merge_sharded_fsdp_weights\nCLI to merge sharded FSDP model checkpoints into a single combined checkpoint.\n\n\ncli.preprocess\nCLI to run preprocessing of a dataset.\n\n\ncli.sweeps\nUtilities for handling sweeps over configs for axolotl train CLI command\n\n\ncli.utils\nUtility methods for axolotl CLI.\n\n\ncli.vllm_serve\nCLI to start the vllm server for online RL\n\n\ncli.cloud.base\nbase class for cloud platforms from cli\n\n\ncli.cloud.modal_\nModal Cloud support from CLI\n\n\ncli.quantize\nCLI to post-training quantize a model using torchao\n\n\n\n\n\n\nTraining implementations\n\n\n\ncore.trainers.base\nModule for customized trainers\n\n\ncore.trainers.trl\nModule for TRL PPO trainer\n\n\ncore.trainers.mamba\nModule for mamba trainer\n\n\ncore.trainers.relora\nModule for ReLoRA trainer\n\n\ncore.trainers.dpo.trainer\nDPO trainer for axolotl\n\n\ncore.trainers.grpo.trainer\nAxolotl GRPO trainers (with and without sequence parallelism handling)\n\n\ncore.trainers.grpo.sampler\nRepeat random sampler (similar to the one implemented in\n\n\ncore.trainers.utils\nUtils for Axolotl trainers\n\n\n\n\n\n\nFunctionality for loading and patching models, tokenizers, etc.\n\n\n\nloaders.model\nModel loader class implementation for loading, configuring, and patching various\n\n\nloaders.tokenizer\nTokenizer loading functionality and associated utils\n\n\nloaders.processor\nProcessor loading functionality for multi-modal models\n\n\nloaders.adapter\nAdapter loading functionality, including LoRA / QLoRA and associated utils\n\n\nloaders.patch_manager\nPatch manager class implementation to complement axolotl.loaders.ModelLoader.\n\n\nloaders.constants\nShared constants for axolotl.loaders module\n\n\n\n\n\n\nMixin classes for augmenting trainers\n\n\n\ncore.trainers.mixins.optimizer\nModule for Axolotl trainer optimizer mixin\n\n\ncore.trainers.mixins.rng_state_loader\nTemporary fix/override 
for bug in resume from checkpoint\n\n\ncore.trainers.mixins.scheduler\nModule for Axolotl trainer scheduler mixin\n\n\n\n\n\n\nContext managers for altering trainer behaviors\n\n\n\nutils.ctx_managers.sequence_parallel\nModule for Axolotl trainer sequence parallelism manager and utilities\n\n\n\n\n\n\nPrompt formatting strategies\n\n\n\nprompt_strategies.base\nmodule for base dataset transform strategies\n\n\nprompt_strategies.chat_template\nHF Chat Templates prompt strategy\n\n\nprompt_strategies.alpaca_chat\nModule for Alpaca prompt strategy classes\n\n\nprompt_strategies.alpaca_instruct\nModule loading the AlpacaInstructPromptTokenizingStrategy class\n\n\nprompt_strategies.alpaca_w_system\nPrompt strategies loader for alpaca instruction datasets with system prompts\n\n\nprompt_strategies.user_defined\nUser Defined prompts with configuration from the YML config\n\n\nprompt_strategies.llama2_chat\nPrompt Strategy for finetuning Llama2 chat models\n\n\nprompt_strategies.completion\nBasic completion text\n\n\nprompt_strategies.input_output\nModule for plain input/output prompt pairs\n\n\nprompt_strategies.stepwise_supervised\nModule for stepwise datasets, typically including a prompt and reasoning traces,\n\n\nprompt_strategies.metharme\nModule containing the MetharmenPromptTokenizingStrategy and MetharmePrompter class\n\n\nprompt_strategies.orcamini\nPrompt Strategy for finetuning Orca Mini (v2) models\n\n\nprompt_strategies.pygmalion\nModule containing the PygmalionPromptTokenizingStrategy and PygmalionPrompter class\n\n\nprompt_strategies.messages.chat\nChat dataset wrapping strategy for new internal messages representations\n\n\nprompt_strategies.dpo.chat_template\nDPO prompt strategies for using tokenizer chat templates.\n\n\nprompt_strategies.dpo.llama3\nDPO strategies for llama-3 chat template\n\n\nprompt_strategies.dpo.chatml\nDPO strategies for chatml\n\n\nprompt_strategies.dpo.zephyr\nDPO strategies for 
zephyr\n\n\nprompt_strategies.dpo.user_defined\nUser-defined DPO strategies\n\n\nprompt_strategies.dpo.passthrough\nDPO prompt strategies passthrough/zero-processing strategy\n\n\nprompt_strategies.kto.llama3\nKTO strategies for llama-3 chat template\n\n\nprompt_strategies.kto.chatml\nKTO strategies for chatml\n\n\nprompt_strategies.kto.user_defined\nUser-defined KTO strategies\n\n\nprompt_strategies.orpo.chat_template\nchatml prompt tokenization strategy for ORPO\n\n\nprompt_strategies.bradley_terry.llama3\nchatml transforms for datasets with system, input, chosen, rejected to match llama3 chat template\n\n\n\n\n\n\nLow-level performance optimizations\n\n\n\nkernels.lora\nModule for definition of Low-Rank Adaptation (LoRA) Triton kernels.\n\n\nkernels.geglu\nModule for definition of GEGLU Triton kernels.\n\n\nkernels.swiglu\nModule for definition of SwiGLU Triton kernels.\n\n\nkernels.quantize\nDequantization utilities for bitsandbytes integration.\n\n\nkernels.utils\nUtilities for axolotl.kernels submodules.\n\n\n\n\n\n\nRuntime patches for model optimizations\n\n\n\nmonkeypatch.llama_attn_hijack_flash\nFlash attention monkey patch for llama model\n\n\nmonkeypatch.llama_attn_hijack_xformers\nDirectly copied the code from https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/modules/llama_attn_hijack.py and made some adjustments\n\n\nmonkeypatch.mistral_attn_hijack_flash\nFlash attention monkey patch for mistral model\n\n\nmonkeypatch.multipack\nmultipack patching for v2 of sample packing\n\n\nmonkeypatch.relora\nImplements the ReLoRA training procedure from https://arxiv.org/abs/2307.05695, minus the initial full fine-tune.\n\n\nmonkeypatch.llama_expand_mask\nexpands the binary attention mask per 3.2.2 of https://arxiv.org/pdf/2107.02027.pdf\n\n\nmonkeypatch.lora_kernels\nModule for patching custom LoRA Triton kernels and torch.autograd functions.\n\n\nmonkeypatch.utils\nShared utils for the 
monkeypatches\n\n\nmonkeypatch.btlm_attn_hijack_flash\nFlash attention monkey patch for cerebras btlm model\n\n\nmonkeypatch.llama_patch_multipack\nPatched LlamaAttention to use torch.nn.functional.scaled_dot_product_attention\n\n\nmonkeypatch.stablelm_attn_hijack_flash\nPyTorch StableLM Epoch model.\n\n\nmonkeypatch.trainer_fsdp_optim\nfix for FSDP optimizer save in trainer w 4.47.0\n\n\nmonkeypatch.transformers_fa_utils\nsee https://github.com/huggingface/transformers/pull/35834\n\n\nmonkeypatch.unsloth_\nmodule for patching with unsloth optimizations\n\n\nmonkeypatch.data.batch_dataset_fetcher\nmonkey patches for the dataset fetcher to handle batches of packed indexes\n\n\nmonkeypatch.mixtral\nPatches to support multipack for mixtral\n\n\nmonkeypatch.gradient_checkpointing.offload_cpu\nCPU offloaded checkpointing\n\n\nmonkeypatch.gradient_checkpointing.offload_disk\nDISCO - DIsk-based Storage and Checkpointing with Optimized prefetching\n\n\n\n\n\n\nUtility functions\n\n\n\nutils.tokenization\nModule for tokenization utilities\n\n\nutils.chat_templates\nThis module provides functionality for selecting chat templates based on user choices.\n\n\nutils.lora\nmodule to get the state dict of a merged lora model\n\n\nutils.model_shard_quant\nmodule to handle loading model on cpu/meta device for FSDP\n\n\nutils.bench\nBenchmarking and measurement utilities\n\n\nutils.freeze\nmodule to freeze/unfreeze parameters by name\n\n\nutils.trainer\nModule containing the Trainer class and related functions\n\n\nutils.schedulers\nModule for custom LRScheduler class\n\n\nutils.distributed\nUtilities for distributed functionality.\n\n\nutils.dict\nModule containing the DictDefault class\n\n\nutils.optimizers.adopt\nCopied from https://github.com/iShohei220/adopt\n\n\nutils.data.pretraining\ndata handling specific to pretraining\n\n\nutils.data.sft\nData handling specific to SFT.\n\n\nutils.quantization\nUtilities for quantization including QAT and PTQ using 
torchao.\n\n\n\n\n\n\nPydantic data models for Axolotl config\n\n\n\nutils.schemas.config\nModule with Pydantic models for configuration.\n\n\nutils.schemas.model\nPydantic models for model input / output, etc. configuration\n\n\nutils.schemas.training\nPydantic models for training hyperparameters\n\n\nutils.schemas.datasets\nPydantic models for datasets-related configuration\n\n\nutils.schemas.peft\nPydantic models for PEFT-related configuration\n\n\nutils.schemas.trl\nPydantic models for TRL trainer configuration\n\n\nutils.schemas.multimodal\nPydantic models for multimodal-related configuration\n\n\nutils.schemas.integrations\nPydantic models for Axolotl integrations\n\n\nutils.schemas.enums\nEnums for Axolotl input config\n\n\nutils.schemas.utils\nUtilities for Axolotl Pydantic models\n\n\n\n\n\n\nThird-party integrations and extensions\n\n\n\nintegrations.base\nBase class for all plugins.\n\n\nintegrations.cut_cross_entropy.args\nModule for handling Cut Cross Entropy input arguments.\n\n\nintegrations.grokfast.optimizer\n\n\n\nintegrations.kd.trainer\nKD trainer\n\n\nintegrations.liger.args\nModule for handling LIGER input arguments.\n\n\nintegrations.lm_eval.args\nModule for handling lm eval harness input arguments.\n\n\nintegrations.spectrum.args\nModule for handling Spectrum input arguments.\n\n\n\n\n\n\nCommon utilities and shared functionality\n\n\n\ncommon.architectures\nCommon architecture specific constants\n\n\ncommon.const\nVarious shared constants\n\n\ncommon.datasets\nDataset loading utilities.\n\n\n\n\n\n\nCustom model implementations\n\n\n\nmodels.mamba.modeling_mamba\n\n\n\n\n\n\n\nData processing utilities\n\n\n\nutils.collators.core\nbasic shared collator constants\n\n\nutils.collators.batching\nData collators for axolotl to pad labels and position_ids for packed sequences\n\n\nutils.collators.mamba\ncollators for Mamba\n\n\nutils.collators.mm_chat\nCollators for multi-modal chat messages and packing\n\n\nutils.samplers.multipack\nMultipack 
Batch Sampler - An efficient batch sampler for packing variable-length sequences\n\n\n\n\n\n\nTraining callbacks\n\n\n\nutils.callbacks.perplexity\ncallback to calculate perplexity as an evaluation metric.\n\n\nutils.callbacks.profiler\nHF Trainer callback for creating pytorch profiling snapshots\n\n\nutils.callbacks.lisa\nmodule for LISA\n\n\nutils.callbacks.mlflow_\nMLFlow module for trainer callbacks\n\n\nutils.callbacks.comet_\nComet module for trainer callbacks\n\n\nutils.callbacks.qat\nQAT Callback for HF Causal Trainer" }, { "objectID": "docs/api/index.html#core", @@ -2630,7 +2630,7 @@ "href": "docs/api/index.html#utils", "title": "API Reference", "section": "", - "text": "Utility functions\n\n\n\nutils.tokenization\nModule for tokenization utilities\n\n\nutils.chat_templates\nThis module provides functionality for selecting chat templates based on user choices.\n\n\nutils.lora\nmodule to get the state dict of a merged lora model\n\n\nutils.model_shard_quant\nmodule to handle loading model on cpu/meta device for FSDP\n\n\nutils.bench\nBenchmarking and measurement utilities\n\n\nutils.freeze\nmodule to freeze/unfreeze parameters by name\n\n\nutils.trainer\nModule containing the Trainer class and related functions\n\n\nutils.schedulers\nModule for custom LRScheduler class\n\n\nutils.distributed\nutility helpers for distributed checks\n\n\nutils.dict\nModule containing the DictDefault class\n\n\nutils.optimizers.adopt\nCopied from https://github.com/iShohei220/adopt\n\n\nutils.data.pretraining\ndata handling specific to pretraining\n\n\nutils.data.sft\nData handling specific to SFT.\n\n\nutils.quantization\nUtilities for quantization including QAT and PTQ using torchao." 
+ "text": "Utility functions\n\n\n\nutils.tokenization\nModule for tokenization utilities\n\n\nutils.chat_templates\nThis module provides functionality for selecting chat templates based on user choices.\n\n\nutils.lora\nmodule to get the state dict of a merged lora model\n\n\nutils.model_shard_quant\nmodule to handle loading model on cpu/meta device for FSDP\n\n\nutils.bench\nBenchmarking and measurement utilities\n\n\nutils.freeze\nmodule to freeze/unfreeze parameters by name\n\n\nutils.trainer\nModule containing the Trainer class and related functions\n\n\nutils.schedulers\nModule for custom LRScheduler class\n\n\nutils.distributed\nUtilities for distributed functionality.\n\n\nutils.dict\nModule containing the DictDefault class\n\n\nutils.optimizers.adopt\nCopied from https://github.com/iShohei220/adopt\n\n\nutils.data.pretraining\ndata handling specific to pretraining\n\n\nutils.data.sft\nData handling specific to SFT.\n\n\nutils.quantization\nUtilities for quantization including QAT and PTQ using torchao." 
}, { "objectID": "docs/api/index.html#schemas", diff --git a/sitemap.xml b/sitemap.xml index c0194c890..932cdaedd 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,758 +2,758 @@ https://docs.axolotl.ai/docs/unsloth.html - 2025-06-18T19:59:20.592Z + 2025-06-19T15:17:03.063Z https://docs.axolotl.ai/docs/dataset-formats/conversation.html - 2025-06-18T19:59:20.588Z + 2025-06-19T15:17:03.059Z https://docs.axolotl.ai/docs/dataset-formats/stepwise_supervised.html - 2025-06-18T19:59:20.588Z + 2025-06-19T15:17:03.059Z https://docs.axolotl.ai/docs/dataset-formats/tokenized.html - 2025-06-18T19:59:20.588Z + 2025-06-19T15:17:03.059Z https://docs.axolotl.ai/docs/mac.html - 2025-06-18T19:59:20.591Z + 2025-06-19T15:17:03.062Z https://docs.axolotl.ai/docs/nccl.html - 2025-06-18T19:59:20.592Z + 2025-06-19T15:17:03.063Z https://docs.axolotl.ai/docs/multi-node.html - 2025-06-18T19:59:20.591Z + 2025-06-19T15:17:03.063Z https://docs.axolotl.ai/docs/docker.html - 2025-06-18T19:59:20.588Z + 2025-06-19T15:17:03.059Z https://docs.axolotl.ai/docs/lr_groups.html - 2025-06-18T19:59:20.591Z + 2025-06-19T15:17:03.062Z https://docs.axolotl.ai/docs/inference.html - 2025-06-18T19:59:20.591Z + 2025-06-19T15:17:03.062Z https://docs.axolotl.ai/docs/cli.html - 2025-06-18T19:59:20.588Z + 2025-06-19T15:17:03.059Z https://docs.axolotl.ai/docs/config-reference.html - 2025-06-18T20:02:45.606Z + 2025-06-19T15:20:20.894Z https://docs.axolotl.ai/docs/multi-gpu.html - 2025-06-18T19:59:20.591Z + 2025-06-19T15:17:03.062Z https://docs.axolotl.ai/docs/debugging.html - 2025-06-18T19:59:20.588Z + 2025-06-19T15:17:03.059Z https://docs.axolotl.ai/docs/multimodal.html - 2025-06-18T19:59:20.591Z + 2025-06-19T15:17:03.063Z https://docs.axolotl.ai/docs/api/cli.sweeps.html - 2025-06-18T20:02:31.898Z + 2025-06-19T15:20:07.487Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.llama3.html - 2025-06-18T20:02:32.225Z + 2025-06-19T15:20:07.810Z https://docs.axolotl.ai/docs/api/utils.schedulers.html - 
2025-06-18T20:02:32.622Z + 2025-06-19T15:20:08.198Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_xformers.html - 2025-06-18T20:02:32.428Z + 2025-06-19T15:20:08.006Z https://docs.axolotl.ai/docs/api/cli.cloud.modal_.html - 2025-06-18T20:02:31.946Z + 2025-06-19T15:20:07.535Z https://docs.axolotl.ai/docs/api/kernels.geglu.html - 2025-06-18T20:02:32.380Z + 2025-06-19T15:20:07.960Z https://docs.axolotl.ai/docs/api/core.trainers.utils.html - 2025-06-18T20:02:32.018Z + 2025-06-19T15:20:07.607Z https://docs.axolotl.ai/docs/api/core.datasets.chat.html - 2025-06-18T20:02:31.748Z + 2025-06-19T15:20:07.339Z https://docs.axolotl.ai/docs/api/utils.schemas.peft.html - 2025-06-18T20:02:32.734Z + 2025-06-19T15:20:08.309Z https://docs.axolotl.ai/docs/api/monkeypatch.btlm_attn_hijack_flash.html - 2025-06-18T20:02:32.490Z + 2025-06-19T15:20:08.068Z https://docs.axolotl.ai/docs/api/prompt_strategies.chat_template.html - 2025-06-18T20:02:32.127Z + 2025-06-19T15:20:07.714Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.user_defined.html - 2025-06-18T20:02:32.259Z + 2025-06-19T15:20:07.842Z https://docs.axolotl.ai/docs/api/cli.cloud.base.html - 2025-06-18T20:02:31.940Z + 2025-06-19T15:20:07.528Z https://docs.axolotl.ai/docs/api/kernels.swiglu.html - 2025-06-18T20:02:32.390Z + 2025-06-19T15:20:07.970Z https://docs.axolotl.ai/docs/api/prompt_strategies.stepwise_supervised.html - 2025-06-18T20:02:32.191Z + 2025-06-19T15:20:07.777Z https://docs.axolotl.ai/docs/api/prompt_strategies.bradley_terry.llama3.html - 2025-06-18T20:02:32.283Z + 2025-06-19T15:20:07.866Z https://docs.axolotl.ai/docs/api/prompt_strategies.completion.html - 2025-06-18T20:02:32.181Z + 2025-06-19T15:20:07.767Z https://docs.axolotl.ai/docs/api/kernels.utils.html - 2025-06-18T20:02:32.399Z + 2025-06-19T15:20:07.979Z https://docs.axolotl.ai/docs/api/common.datasets.html - 2025-06-18T20:02:32.947Z + 2025-06-19T15:20:08.518Z https://docs.axolotl.ai/docs/api/utils.schemas.datasets.html - 
2025-06-18T20:02:32.725Z + 2025-06-19T15:20:08.300Z https://docs.axolotl.ai/docs/api/core.builders.rl.html - 2025-06-18T20:02:31.703Z + 2025-06-19T15:20:07.294Z https://docs.axolotl.ai/docs/api/evaluate.html - 2025-06-18T20:02:31.607Z + 2025-06-19T15:20:07.201Z https://docs.axolotl.ai/docs/api/kernels.quantize.html - 2025-06-18T20:02:32.398Z + 2025-06-19T15:20:07.978Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_flash.html - 2025-06-18T20:02:32.426Z + 2025-06-19T15:20:08.005Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.rng_state_loader.html - 2025-06-18T20:02:32.061Z + 2025-06-19T15:20:07.650Z https://docs.axolotl.ai/docs/api/integrations.base.html - 2025-06-18T20:02:32.906Z + 2025-06-19T15:20:08.479Z https://docs.axolotl.ai/docs/api/cli.merge_lora.html - 2025-06-18T20:02:31.872Z + 2025-06-19T15:20:07.460Z https://docs.axolotl.ai/docs/api/cli.merge_sharded_fsdp_weights.html - 2025-06-18T20:02:31.884Z + 2025-06-19T15:20:07.472Z https://docs.axolotl.ai/docs/api/monkeypatch.transformers_fa_utils.html - 2025-06-18T20:02:32.507Z + 2025-06-19T15:20:08.085Z https://docs.axolotl.ai/docs/api/prompt_strategies.llama2_chat.html - 2025-06-18T20:02:32.175Z + 2025-06-19T15:20:07.761Z https://docs.axolotl.ai/docs/api/utils.collators.mm_chat.html - 2025-06-18T20:02:32.978Z + 2025-06-19T15:20:08.548Z https://docs.axolotl.ai/docs/api/utils.data.sft.html - 2025-06-18T20:02:32.661Z + 2025-06-19T15:20:08.237Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_instruct.html - 2025-06-18T20:02:32.142Z + 2025-06-19T15:20:07.728Z https://docs.axolotl.ai/docs/api/integrations.liger.args.html - 2025-06-18T20:02:32.922Z + 2025-06-19T15:20:08.494Z https://docs.axolotl.ai/docs/api/monkeypatch.mistral_attn_hijack_flash.html - 2025-06-18T20:02:32.443Z + 2025-06-19T15:20:08.021Z https://docs.axolotl.ai/docs/api/cli.vllm_serve.html - 2025-06-18T20:02:31.937Z + 2025-06-19T15:20:07.525Z https://docs.axolotl.ai/docs/api/monkeypatch.utils.html - 
2025-06-18T20:02:32.489Z + 2025-06-19T15:20:08.067Z https://docs.axolotl.ai/docs/api/loaders.patch_manager.html - 2025-06-18T20:02:32.051Z + 2025-06-19T15:20:07.640Z https://docs.axolotl.ai/docs/api/utils.schemas.integrations.html - 2025-06-18T20:02:32.755Z + 2025-06-19T15:20:08.329Z https://docs.axolotl.ai/docs/api/utils.callbacks.perplexity.html - 2025-06-18T20:02:33.025Z + 2025-06-19T15:20:08.594Z https://docs.axolotl.ai/docs/api/cli.utils.html - 2025-06-18T20:02:31.930Z + 2025-06-19T15:20:07.518Z https://docs.axolotl.ai/docs/api/utils.schemas.config.html - 2025-06-18T20:02:32.696Z + 2025-06-19T15:20:08.271Z https://docs.axolotl.ai/docs/api/prompt_strategies.input_output.html - 2025-06-18T20:02:32.187Z + 2025-06-19T15:20:07.772Z https://docs.axolotl.ai/docs/api/utils.distributed.html - 2025-06-18T20:02:32.642Z + 2025-06-19T15:20:08.218Z https://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_disk.html - 2025-06-18T20:02:32.541Z + 2025-06-19T15:20:08.118Z https://docs.axolotl.ai/docs/api/monkeypatch.trainer_fsdp_optim.html - 2025-06-18T20:02:32.501Z + 2025-06-19T15:20:08.079Z https://docs.axolotl.ai/docs/api/core.builders.base.html - 2025-06-18T20:02:31.690Z + 2025-06-19T15:20:07.282Z https://docs.axolotl.ai/docs/api/core.trainers.trl.html - 2025-06-18T20:02:31.977Z + 2025-06-19T15:20:07.566Z https://docs.axolotl.ai/docs/api/cli.evaluate.html - 2025-06-18T20:02:31.804Z + 2025-06-19T15:20:07.394Z https://docs.axolotl.ai/docs/api/utils.optimizers.adopt.html - 2025-06-18T20:02:32.653Z + 2025-06-19T15:20:08.229Z https://docs.axolotl.ai/docs/api/utils.callbacks.qat.html - 2025-06-18T20:02:33.044Z + 2025-06-19T15:20:08.613Z https://docs.axolotl.ai/docs/api/core.trainers.dpo.trainer.html - 2025-06-18T20:02:31.994Z + 2025-06-19T15:20:07.583Z https://docs.axolotl.ai/docs/api/core.chat.format.shared.html - 2025-06-18T20:02:31.743Z + 2025-06-19T15:20:07.334Z https://docs.axolotl.ai/docs/api/monkeypatch.relora.html - 2025-06-18T20:02:32.451Z + 
2025-06-19T15:20:08.029Z https://docs.axolotl.ai/docs/api/cli.config.html - 2025-06-18T20:02:31.849Z + 2025-06-19T15:20:07.438Z https://docs.axolotl.ai/docs/api/cli.preprocess.html - 2025-06-18T20:02:31.892Z + 2025-06-19T15:20:07.481Z https://docs.axolotl.ai/docs/api/core.trainers.base.html - 2025-06-18T20:02:31.961Z + 2025-06-19T15:20:07.550Z https://docs.axolotl.ai/docs/api/convert.html - 2025-06-18T20:02:31.631Z + 2025-06-19T15:20:07.225Z https://docs.axolotl.ai/docs/api/prompt_strategies.pygmalion.html - 2025-06-18T20:02:32.208Z + 2025-06-19T15:20:07.794Z https://docs.axolotl.ai/docs/api/utils.schemas.trl.html - 2025-06-18T20:02:32.738Z + 2025-06-19T15:20:08.312Z https://docs.axolotl.ai/docs/api/cli.args.html - 2025-06-18T20:02:31.824Z + 2025-06-19T15:20:07.413Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chat_template.html - 2025-06-18T20:02:32.214Z + 2025-06-19T15:20:07.800Z https://docs.axolotl.ai/docs/api/loaders.constants.html - 2025-06-18T20:02:32.053Z + 2025-06-19T15:20:07.642Z https://docs.axolotl.ai/docs/api/logging_config.html - 2025-06-18T20:02:31.683Z + 2025-06-19T15:20:07.275Z https://docs.axolotl.ai/docs/api/cli.inference.html - 2025-06-18T20:02:31.863Z + 2025-06-19T15:20:07.452Z https://docs.axolotl.ai/docs/api/utils.ctx_managers.sequence_parallel.html - 2025-06-18T20:02:32.092Z + 2025-06-19T15:20:07.680Z https://docs.axolotl.ai/docs/api/integrations.spectrum.args.html - 2025-06-18T20:02:32.929Z + 2025-06-19T15:20:08.501Z https://docs.axolotl.ai/docs/api/utils.schemas.training.html - 2025-06-18T20:02:32.708Z + 2025-06-19T15:20:08.283Z https://docs.axolotl.ai/docs/api/prompt_strategies.orcamini.html - 2025-06-18T20:02:32.202Z + 2025-06-19T15:20:07.787Z https://docs.axolotl.ai/docs/api/utils.freeze.html - 2025-06-18T20:02:32.580Z + 2025-06-19T15:20:08.157Z https://docs.axolotl.ai/docs/api/loaders.tokenizer.html - 2025-06-18T20:02:32.036Z + 2025-06-19T15:20:07.625Z https://docs.axolotl.ai/docs/api/utils.bench.html - 
2025-06-18T20:02:32.572Z + 2025-06-19T15:20:08.149Z https://docs.axolotl.ai/docs/api/utils.quantization.html - 2025-06-18T20:02:32.682Z + 2025-06-19T15:20:08.258Z https://docs.axolotl.ai/docs/batch_vs_grad.html - 2025-06-18T19:59:20.588Z + 2025-06-19T15:17:03.059Z https://docs.axolotl.ai/docs/input_output.html - 2025-06-18T19:59:20.591Z + 2025-06-19T15:17:03.062Z https://docs.axolotl.ai/docs/sequence_parallelism.html - 2025-06-18T19:59:20.592Z + 2025-06-19T15:17:03.063Z https://docs.axolotl.ai/docs/reward_modelling.html - 2025-06-18T19:59:20.592Z + 2025-06-19T15:17:03.063Z https://docs.axolotl.ai/index.html - 2025-06-18T19:59:20.605Z + 2025-06-19T15:17:03.076Z https://docs.axolotl.ai/src/axolotl/integrations/LICENSE.html - 2025-06-18T19:59:20.609Z + 2025-06-19T15:17:03.080Z https://docs.axolotl.ai/FAQS.html - 2025-06-18T19:59:20.586Z + 2025-06-19T15:17:03.057Z https://docs.axolotl.ai/src/axolotl/integrations/cut_cross_entropy/ACKNOWLEDGEMENTS.html - 2025-06-18T19:59:20.610Z + 2025-06-19T15:17:03.080Z https://docs.axolotl.ai/TODO.html - 2025-06-18T19:59:20.586Z + 2025-06-19T15:17:03.057Z https://docs.axolotl.ai/examples/colab-notebooks/colab-axolotl-example.html - 2025-06-18T19:59:20.593Z + 2025-06-19T15:17:03.064Z https://docs.axolotl.ai/docs/torchao.html - 2025-06-18T19:59:20.592Z + 2025-06-19T15:17:03.063Z https://docs.axolotl.ai/docs/ray-integration.html - 2025-06-18T19:59:20.592Z + 2025-06-19T15:17:03.063Z https://docs.axolotl.ai/docs/quantize.html - 2025-06-18T19:59:20.592Z + 2025-06-19T15:17:03.063Z https://docs.axolotl.ai/docs/qat.html - 2025-06-18T19:59:20.592Z + 2025-06-19T15:17:03.063Z https://docs.axolotl.ai/docs/api/utils.lora.html - 2025-06-18T20:02:32.563Z + 2025-06-19T15:20:08.140Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_w_system.html - 2025-06-18T20:02:32.154Z + 2025-06-19T15:20:07.740Z https://docs.axolotl.ai/docs/api/monkeypatch.stablelm_attn_hijack_flash.html - 2025-06-18T20:02:32.498Z + 2025-06-19T15:20:08.075Z 
https://docs.axolotl.ai/docs/api/utils.collators.core.html - 2025-06-18T20:02:32.950Z + 2025-06-19T15:20:08.521Z https://docs.axolotl.ai/docs/api/prompt_strategies.metharme.html - 2025-06-18T20:02:32.198Z + 2025-06-19T15:20:07.784Z https://docs.axolotl.ai/docs/api/utils.callbacks.profiler.html - 2025-06-18T20:02:33.028Z + 2025-06-19T15:20:08.597Z https://docs.axolotl.ai/docs/api/utils.data.pretraining.html - 2025-06-18T20:02:32.654Z + 2025-06-19T15:20:08.230Z https://docs.axolotl.ai/docs/api/utils.callbacks.lisa.html - 2025-06-18T20:02:33.030Z + 2025-06-19T15:20:08.599Z https://docs.axolotl.ai/docs/api/utils.trainer.html - 2025-06-18T20:02:32.597Z + 2025-06-19T15:20:08.174Z https://docs.axolotl.ai/docs/api/integrations.cut_cross_entropy.args.html - 2025-06-18T20:02:32.910Z + 2025-06-19T15:20:08.482Z https://docs.axolotl.ai/docs/api/utils.schemas.model.html - 2025-06-18T20:02:32.703Z + 2025-06-19T15:20:08.278Z https://docs.axolotl.ai/docs/api/monkeypatch.data.batch_dataset_fetcher.html - 2025-06-18T20:02:32.510Z + 2025-06-19T15:20:08.088Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.zephyr.html - 2025-06-18T20:02:32.237Z + 2025-06-19T15:20:07.821Z https://docs.axolotl.ai/docs/api/datasets.html - 2025-06-18T20:02:31.618Z + 2025-06-19T15:20:07.211Z https://docs.axolotl.ai/docs/api/utils.schemas.enums.html - 2025-06-18T20:02:32.765Z + 2025-06-19T15:20:08.340Z https://docs.axolotl.ai/docs/api/integrations.kd.trainer.html - 2025-06-18T20:02:32.918Z + 2025-06-19T15:20:08.491Z https://docs.axolotl.ai/docs/api/monkeypatch.lora_kernels.html - 2025-06-18T20:02:32.481Z + 2025-06-19T15:20:08.059Z https://docs.axolotl.ai/docs/api/utils.collators.batching.html - 2025-06-18T20:02:32.969Z + 2025-06-19T15:20:08.540Z https://docs.axolotl.ai/docs/api/core.trainers.grpo.sampler.html - 2025-06-18T20:02:32.016Z + 2025-06-19T15:20:07.606Z https://docs.axolotl.ai/docs/api/prompt_strategies.base.html - 2025-06-18T20:02:32.094Z + 2025-06-19T15:20:07.681Z 
https://docs.axolotl.ai/docs/api/monkeypatch.multipack.html - 2025-06-18T20:02:32.444Z + 2025-06-19T15:20:08.022Z https://docs.axolotl.ai/docs/api/prompt_strategies.orpo.chat_template.html - 2025-06-18T20:02:32.280Z + 2025-06-19T15:20:07.862Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.passthrough.html - 2025-06-18T20:02:32.240Z + 2025-06-19T15:20:07.824Z https://docs.axolotl.ai/docs/api/core.chat.format.chatml.html - 2025-06-18T20:02:31.740Z + 2025-06-19T15:20:07.331Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.scheduler.html - 2025-06-18T20:02:32.069Z + 2025-06-19T15:20:07.657Z https://docs.axolotl.ai/docs/api/utils.model_shard_quant.html - 2025-06-18T20:02:32.568Z + 2025-06-19T15:20:08.145Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.chatml.html - 2025-06-18T20:02:32.257Z + 2025-06-19T15:20:07.840Z https://docs.axolotl.ai/docs/api/utils.tokenization.html - 2025-06-18T20:02:32.548Z + 2025-06-19T15:20:08.125Z https://docs.axolotl.ai/docs/api/loaders.model.html - 2025-06-18T20:02:32.028Z + 2025-06-19T15:20:07.617Z https://docs.axolotl.ai/docs/api/utils.callbacks.mlflow_.html - 2025-06-18T20:02:33.033Z + 2025-06-19T15:20:08.602Z https://docs.axolotl.ai/docs/api/core.trainers.grpo.trainer.html - 2025-06-18T20:02:32.004Z + 2025-06-19T15:20:07.594Z https://docs.axolotl.ai/docs/api/cli.main.html - 2025-06-18T20:02:31.788Z + 2025-06-19T15:20:07.377Z https://docs.axolotl.ai/docs/api/utils.callbacks.comet_.html - 2025-06-18T20:02:33.037Z + 2025-06-19T15:20:08.606Z https://docs.axolotl.ai/docs/api/utils.chat_templates.html - 2025-06-18T20:02:32.558Z + 2025-06-19T15:20:08.135Z https://docs.axolotl.ai/docs/api/utils.schemas.utils.html - 2025-06-18T20:02:32.771Z + 2025-06-19T15:20:08.345Z https://docs.axolotl.ai/docs/api/common.architectures.html - 2025-06-18T20:02:32.930Z + 2025-06-19T15:20:08.502Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_expand_mask.html - 2025-06-18T20:02:32.452Z + 2025-06-19T15:20:08.030Z 
https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_chat.html - 2025-06-18T20:02:32.140Z + 2025-06-19T15:20:07.727Z https://docs.axolotl.ai/docs/api/utils.samplers.multipack.html - 2025-06-18T20:02:33.018Z + 2025-06-19T15:20:08.588Z https://docs.axolotl.ai/docs/api/integrations.grokfast.optimizer.html - 2025-06-18T20:02:32.911Z + 2025-06-19T15:20:08.483Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chatml.html - 2025-06-18T20:02:32.236Z + 2025-06-19T15:20:07.820Z https://docs.axolotl.ai/docs/api/monkeypatch.mixtral.html - 2025-06-18T20:02:32.512Z + 2025-06-19T15:20:08.089Z https://docs.axolotl.ai/docs/api/train.html - 2025-06-18T20:02:31.597Z + 2025-06-19T15:20:07.190Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_patch_multipack.html - 2025-06-18T20:02:32.492Z + 2025-06-19T15:20:08.070Z https://docs.axolotl.ai/docs/api/index.html - 2025-06-18T20:02:31.534Z + 2025-06-19T15:20:07.129Z https://docs.axolotl.ai/docs/api/loaders.adapter.html - 2025-06-18T20:02:32.043Z + 2025-06-19T15:20:07.632Z https://docs.axolotl.ai/docs/api/utils.schemas.multimodal.html - 2025-06-18T20:02:32.743Z + 2025-06-19T15:20:08.317Z https://docs.axolotl.ai/docs/api/kernels.lora.html - 2025-06-18T20:02:32.369Z + 2025-06-19T15:20:07.950Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.llama3.html - 2025-06-18T20:02:32.249Z + 2025-06-19T15:20:07.832Z https://docs.axolotl.ai/docs/api/cli.checks.html - 2025-06-18T20:02:31.831Z + 2025-06-19T15:20:07.420Z https://docs.axolotl.ai/docs/api/cli.quantize.html - 2025-06-18T20:02:31.951Z + 2025-06-19T15:20:07.540Z https://docs.axolotl.ai/docs/api/integrations.lm_eval.args.html - 2025-06-18T20:02:32.925Z + 2025-06-19T15:20:08.497Z https://docs.axolotl.ai/docs/api/core.chat.messages.html - 2025-06-18T20:02:31.738Z + 2025-06-19T15:20:07.329Z https://docs.axolotl.ai/docs/api/core.builders.causal.html - 2025-06-18T20:02:31.695Z + 2025-06-19T15:20:07.286Z https://docs.axolotl.ai/docs/api/core.trainers.relora.html - 
2025-06-18T20:02:31.987Z + 2025-06-19T15:20:07.577Z https://docs.axolotl.ai/docs/api/models.mamba.modeling_mamba.html - 2025-06-18T20:02:32.949Z + 2025-06-19T15:20:08.519Z https://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_cpu.html - 2025-06-18T20:02:32.515Z + 2025-06-19T15:20:08.093Z https://docs.axolotl.ai/docs/api/core.trainers.mamba.html - 2025-06-18T20:02:31.983Z + 2025-06-19T15:20:07.572Z https://docs.axolotl.ai/docs/api/core.datasets.transforms.chat_builder.html - 2025-06-18T20:02:31.756Z + 2025-06-19T15:20:07.346Z https://docs.axolotl.ai/docs/api/loaders.processor.html - 2025-06-18T20:02:32.037Z + 2025-06-19T15:20:07.627Z https://docs.axolotl.ai/docs/api/core.chat.format.llama3x.html - 2025-06-18T20:02:31.741Z + 2025-06-19T15:20:07.332Z https://docs.axolotl.ai/docs/api/prompt_strategies.messages.chat.html - 2025-06-18T20:02:32.213Z + 2025-06-19T15:20:07.798Z https://docs.axolotl.ai/docs/api/cli.train.html - 2025-06-18T20:02:31.796Z + 2025-06-19T15:20:07.385Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.optimizer.html - 2025-06-18T20:02:32.058Z + 2025-06-19T15:20:07.647Z https://docs.axolotl.ai/docs/api/utils.collators.mamba.html - 2025-06-18T20:02:32.973Z + 2025-06-19T15:20:08.543Z https://docs.axolotl.ai/docs/api/monkeypatch.unsloth_.html - 2025-06-18T20:02:32.509Z + 2025-06-19T15:20:08.087Z https://docs.axolotl.ai/docs/api/utils.dict.html - 2025-06-18T20:02:32.645Z + 2025-06-19T15:20:08.221Z https://docs.axolotl.ai/docs/api/prompt_strategies.user_defined.html - 2025-06-18T20:02:32.162Z + 2025-06-19T15:20:07.748Z https://docs.axolotl.ai/docs/api/core.training_args.html - 2025-06-18T20:02:31.715Z + 2025-06-19T15:20:07.306Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.user_defined.html - 2025-06-18T20:02:32.239Z + 2025-06-19T15:20:07.823Z https://docs.axolotl.ai/docs/api/prompt_tokenizers.html - 2025-06-18T20:02:31.674Z + 2025-06-19T15:20:07.266Z https://docs.axolotl.ai/docs/api/common.const.html - 
2025-06-18T20:02:32.932Z + 2025-06-19T15:20:08.503Z https://docs.axolotl.ai/docs/fsdp_qlora.html - 2025-06-18T19:59:20.588Z + 2025-06-19T15:17:03.060Z https://docs.axolotl.ai/docs/custom_integrations.html - 2025-06-18T19:59:20.588Z + 2025-06-19T15:17:03.059Z https://docs.axolotl.ai/docs/getting-started.html - 2025-06-18T19:59:20.589Z + 2025-06-19T15:17:03.060Z https://docs.axolotl.ai/docs/faq.html - 2025-06-18T19:59:20.588Z + 2025-06-19T15:17:03.060Z https://docs.axolotl.ai/docs/lora_optims.html - 2025-06-18T19:59:20.591Z + 2025-06-19T15:17:03.062Z https://docs.axolotl.ai/docs/rlhf.html - 2025-06-18T19:59:20.592Z + 2025-06-19T15:17:03.063Z https://docs.axolotl.ai/docs/amd_hpc.html - 2025-06-18T19:59:20.588Z + 2025-06-19T15:17:03.059Z https://docs.axolotl.ai/docs/installation.html - 2025-06-18T19:59:20.591Z + 2025-06-19T15:17:03.062Z https://docs.axolotl.ai/docs/multipack.html - 2025-06-18T19:59:20.592Z + 2025-06-19T15:17:03.063Z https://docs.axolotl.ai/docs/dataset_preprocessing.html - 2025-06-18T19:59:20.588Z + 2025-06-19T15:17:03.059Z https://docs.axolotl.ai/docs/dataset_loading.html - 2025-06-18T19:59:20.588Z + 2025-06-19T15:17:03.059Z https://docs.axolotl.ai/docs/dataset-formats/inst_tune.html - 2025-06-18T19:59:20.588Z + 2025-06-19T15:17:03.059Z https://docs.axolotl.ai/docs/dataset-formats/template_free.html - 2025-06-18T19:59:20.588Z + 2025-06-19T15:17:03.059Z https://docs.axolotl.ai/docs/dataset-formats/index.html - 2025-06-18T19:59:20.588Z + 2025-06-19T15:17:03.059Z https://docs.axolotl.ai/docs/dataset-formats/pretraining.html - 2025-06-18T19:59:20.588Z + 2025-06-19T15:17:03.059Z