diff --git a/.nojekyll b/.nojekyll index 84a10c498..5673794d1 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -f92aa588 \ No newline at end of file +92924141 \ No newline at end of file diff --git a/docs/faq.html b/docs/faq.html index cd5cf252f..acc545f2b 100644 --- a/docs/faq.html +++ b/docs/faq.html @@ -601,6 +601,14 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});

A: There seems to be a wheel issue with FA2 2.8.0 on CUDA 12.4. Try CUDA 12.6 instead or downgrade to FA2 2.7.4. Please refer to the upstream issue: https://github.com/Dao-AILab/flash-attention/issues/1717.

+

Q: Can we mix text and text+image datasets for VLM training?

+
+

A: Yes, you can for newer VLM architectures. The ones that do not work are the LLaVA / Pixtral architectures. If you notice another one not working, please let us know!

+
+
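As a concrete illustration, mixing a text+image dataset with a text-only one is just a matter of listing both under `datasets:` (the second path below is a hypothetical placeholder, not a real dataset):

```yaml
datasets:
  # text + image conversations (example dataset from the usage docs)
  - path: HuggingFaceH4/llava-instruct-mix-vsft
    type: chat_template
  # text-only conversations (hypothetical placeholder path)
  - path: your-org/text-only-chat
    type: chat_template
```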

Q: Why is memory/max_* different from nvidia-smi?

+
+

A: We retrieve these metrics via torch's CUDA memory APIs, which only track torch's own caching allocator, so they can read lower than what nvidia-smi reports for the whole process. See https://docs.pytorch.org/docs/stable/notes/cuda.html#cuda-memory-management for more information.

+
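As a rough sketch of where the gap comes from (the numbers and the CUDA-context size below are illustrative assumptions, not measurements): torch's counters cover only its caching allocator, while nvidia-smi also sees the CUDA context and any non-torch allocations in the process.

```python
# Illustrative arithmetic only: why torch's memory/max_* counters read
# lower than nvidia-smi. All values in MiB; the CUDA context size is a
# rough assumption and varies by driver/GPU.

def explain_memory_gap(max_allocated, max_reserved, cuda_context=300):
    """torch reports `max_allocated` (live tensors) and `max_reserved`
    (the allocator's cached pool, >= allocated). nvidia-smi reports the
    whole process: reserved pool + CUDA context + non-torch allocations."""
    return {
        "torch/max_allocated": max_allocated,
        "torch/max_reserved": max_reserved,
        "nvidia-smi (at least)": max_reserved + cuda_context,
    }

stats = explain_memory_gap(max_allocated=9_000, max_reserved=11_000)
```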

Chat templates

diff --git a/docs/lr_groups.html b/docs/lr_groups.html index 9cc9c3bbd..baf02d267 100644 --- a/docs/lr_groups.html +++ b/docs/lr_groups.html @@ -563,6 +563,19 @@ modules in a model.

In this example, we have a default learning rate of 2e-5 across the entire model, but we have a separate learning rate of 1e-6 for all the self attention o_proj modules across all layers, and a learning rate of 1e-5 for the 3rd layer’s self attention q_proj module.

+
+
+
+ +
+
+Note +
+
+
+

We currently only support varying the learning rate. If you’re interested in adding support for other hyperparameters (e.g. weight_decay), we welcome PRs. See https://github.com/axolotl-ai-cloud/axolotl/blob/613bcf90e58f3ab81d3827e7fc572319908db9fb/src/axolotl/core/trainers/mixins/optimizer.py#L17

+
+
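To illustrate what `lr_groups` resolves to under the hood, here is a minimal, hypothetical sketch (not Axolotl's actual implementation) that sorts named parameters into torch-optimizer-style param groups by module-name suffix, with unmatched parameters falling back to the default learning rate:

```python
# Hypothetical sketch of lr-group resolution. It mirrors the shape of
# torch optimizer param groups ([{"params": [...], "lr": ...}, ...]),
# not Axolotl's real optimizer mixin.

def build_param_groups(named_params, lr_groups, default_lr):
    groups = [{"params": [], "lr": g["lr"], "name": g["name"]} for g in lr_groups]
    default = {"params": [], "lr": default_lr, "name": "default"}
    for name, param in named_params:
        for spec, group in zip(lr_groups, groups):
            # match the way lr_groups modules are written: by name suffix
            if any(name.endswith(m) for m in spec["modules"]):
                group["params"].append(param)
                break
        else:
            default["params"].append(param)
    return groups + [default]

# Mirrors the example config above
lr_groups = [
    {"name": "o_proj", "modules": ["self_attn.o_proj.weight"], "lr": 1e-6},
    {"name": "q_proj", "modules": ["model.layers.2.self_attn.q_proj.weight"], "lr": 1e-5},
]
# Stand-in parameter names/tensors for illustration
named_params = [
    ("model.layers.0.self_attn.o_proj.weight", "w0"),
    ("model.layers.2.self_attn.q_proj.weight", "w1"),
    ("model.layers.2.mlp.up_proj.weight", "w2"),
]
groups = build_param_groups(named_params, lr_groups, default_lr=2e-5)
```

The resulting list can be passed straight to a torch optimizer, which is why only `lr` (a native param-group key) is supported today.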
diff --git a/docs/multimodal.html b/docs/multimodal.html index 11ccd0f96..15a731053 100644 --- a/docs/multimodal.html +++ b/docs/multimodal.html @@ -523,6 +523,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
  • Gemma-3n
  • Qwen2-VL
  • Qwen2.5-VL
  • +
  • Qwen3-VL
  • SmolVLM2
  • LFM2-VL
  • @@ -605,19 +606,32 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true}); image_size: 512 image_resize_algorithm: bilinear

    Please see examples folder for full configs.

    -
    +
    -Warning +Tip

    Some of our chat_templates have been extended to support broader dataset types. This should not break any existing configs.

    +
    +
    +
    + +
    +
    +Note +
    +
    +
    +

    As of now, we do not truncate or drop samples based on sequence_len, as each architecture processes non-text tokens differently. We are looking for help on this.

    +
    +

    Mllama

    base_model: meta-llama/Llama-3.2-11B-Vision-Instruct
    @@ -757,6 +771,12 @@ Tip
     
     chat_template: qwen2_vl  # same as qwen2-vl
    +
    +

    Qwen3-VL

    +
    base_model: Qwen/Qwen3-VL-4B-Instruct
    +
    +chat_template: qwen2_vl  # same as qwen2-vl
    +

    SmolVLM2

    @@ -772,7 +792,7 @@ Tip

    Please make sure to install num2words via pip3 install num2words==0.5.14

    -
    base_model: HuggingFaceTB/SmolVLM2-500M-Video-Instruct
    +
    base_model: HuggingFaceTB/SmolVLM2-500M-Video-Instruct

    LFM2-VL

    @@ -789,7 +809,7 @@ Warning

    Please uninstall causal-conv1d via pip3 uninstall -y causal-conv1d

    -
    base_model: LiquidAI/LFM2-VL-450M
    +
    base_model: LiquidAI/LFM2-VL-450M
    @@ -874,31 +894,31 @@ Warning

    Example

    Here is an example of a multi-modal dataset:

    -
    [
    -  {
    -    "messages": [
    -        {
    -            "role": "system",
    -            "content": [
    -              {"type": "text", "text": "You are a helpful assistant."}
    -              ]
    -        },
    -        {
    -            "role": "user",
    -            "content": [
    -                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
    -                {"type": "text", "text": "Describe this image in detail."}
    -            ]
    -        },
    -        {
    -            "role": "assistant",
    -            "content": [
    -              {"type": "text", "text": "The image is a bee."}
    -            ]
    -        }
    -    ]
    -  }
    -]
    +
    [
    +  {
    +    "messages": [
    +        {
    +            "role": "system",
    +            "content": [
    +              {"type": "text", "text": "You are a helpful assistant."}
    +              ]
    +        },
    +        {
    +            "role": "user",
    +            "content": [
    +                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
    +                {"type": "text", "text": "Describe this image in detail."}
    +            ]
    +        },
    +        {
    +            "role": "assistant",
    +            "content": [
    +              {"type": "text", "text": "The image is a bee."}
    +            ]
    +        }
    +    ]
    +  }
    +]
    diff --git a/search.json b/search.json index d5ee176e5..4e3147950 100644 --- a/search.json +++ b/search.json @@ -176,7 +176,7 @@ "href": "docs/lr_groups.html#example", "title": "Learning Rate Groups", "section": "Example", - "text": "Example\nlr_groups:\n - name: o_proj\n modules:\n - self_attn.o_proj.weight\n lr: 1e-6\n - name: q_proj\n modules:\n - model.layers.2.self_attn.q_proj.weight\n lr: 1e-5\n\nlearning_rate: 2e-5\nIn this example, we have a default learning rate of 2e-5 across the entire model, but we have a separate learning rate\nof 1e-6 for all the self attention o_proj modules across all layers, and a learning are of 1e-5 to the 3rd layer’s\nself attention q_proj module.", + "text": "Example\nlr_groups:\n - name: o_proj\n modules:\n - self_attn.o_proj.weight\n lr: 1e-6\n - name: q_proj\n modules:\n - model.layers.2.self_attn.q_proj.weight\n lr: 1e-5\n\nlearning_rate: 2e-5\nIn this example, we have a default learning rate of 2e-5 across the entire model, but we have a separate learning rate\nof 1e-6 for all the self attention o_proj modules across all layers, and a learning are of 1e-5 to the 3rd layer’s\nself attention q_proj module.\n\n\n\n\n\n\nNote\n\n\n\nWe currently only support varying lr for now. If you’re interested in adding support for others (weight_decay), we welcome PRs. 
See https://github.com/axolotl-ai-cloud/axolotl/blob/613bcf90e58f3ab81d3827e7fc572319908db9fb/src/axolotl/core/trainers/mixins/optimizer.py#L17", "crumbs": [ "How To Guides", "Learning Rate Groups" @@ -821,7 +821,7 @@ "href": "docs/multimodal.html#usage", "title": "MultiModal / Vision Language Models (BETA)", "section": "Usage", - "text": "Usage\nMultimodal support is limited and doesn’t have full feature parity.\nHere are the hyperparams you’ll need to use to finetune a multimodal model.\nprocessor_type: AutoProcessor\n\nskip_prepare_dataset: true\nremove_unused_columns: false # leave columns in place as they are needed to handle image embeddings during training\nsample_packing: false # not yet supported with multimodal\n\nchat_template: # see in next section if specified\n\n# example dataset\ndatasets:\n - path: HuggingFaceH4/llava-instruct-mix-vsft\n type: chat_template\n split: train[:1%]\n\n# (optional) if doing lora, only finetune the Language model,\n# leave the vision model and vision tower frozen\n# load_in_8bit: true\nadapter: lora\nlora_target_modules: 'model.language_model.layers.[\\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'\n\n# (optional) if you want to resize images to a set size\nimage_size: 512\nimage_resize_algorithm: bilinear\nPlease see examples folder for full configs.\n\n\n\n\n\n\nWarning\n\n\n\nSome of our chat_templates have been extended to support broader dataset types. 
This should not break any existing configs.\n\n\n\nMllama\nbase_model: meta-llama/Llama-3.2-11B-Vision-Instruct\n\nchat_template: llama3_2_vision\n\n\nLlama4\nbase_model: meta-llama/Llama-4-Scout-17B-16E-Instruct\n\nchat_template: llama4\n\n\nPixtral\nbase_model: mistralai/Pixtral-12B-2409\n\nchat_template: pixtral\n\n\nLlava-1.5\nbase_model: llava-hf/llava-1.5-7b-hf\n\nchat_template: llava\n\n\nMistral-Small-3.1\n\n\n\n\n\n\nTip\n\n\n\nPlease make sure to install vision lib via pip install 'mistral-common[opencv]==1.8.5'\n\n\nbase_model: mistralai/Mistral-Small-3.1-24B-Instruct-2503\n\n\nMagistral-Small-2509\n\n\n\n\n\n\nTip\n\n\n\nPlease make sure to install vision lib via pip install 'mistral-common[opencv]==1.8.5'\n\n\nbase_model: mistralai/Magistral-Small-2509\n\n\nVoxtral\n\n\n\n\n\n\nTip\n\n\n\nPlease make sure to install audio lib via pip3 install librosa==0.11.0 'mistral_common[audio]==1.8.3'\n\n\nbase_model: mistralai/Voxtral-Mini-3B-2507\n\n\nGemma-3\n\n\n\n\n\n\nTip\n\n\n\nThe Gemma3-1B model is a text-only model, so please train as regular text model.\n\n\nFor multi-modal 4B/12B/27B models, use the following config:\nbase_model: google/gemma-3-4b-it\n\nchat_template: gemma3\n\n\nGemma-3n\n\n\n\n\n\n\nWarning\n\n\n\nThe model’s initial loss and grad norm will be very high. 
We suspect this to be due to the Conv in the vision layers.\n\n\n\n\n\n\n\n\nTip\n\n\n\nPlease make sure to install timm via pip3 install timm==1.0.17\n\n\nbase_model: google/gemma-3n-E2B-it\n\nchat_template: gemma3n\n\n\nQwen2-VL\nbase_model: Qwen/Qwen2-VL-7B-Instruct\n\nchat_template: qwen2_vl\n\n\nQwen2.5-VL\nbase_model: Qwen/Qwen2.5-VL-7B-Instruct\n\nchat_template: qwen2_vl # same as qwen2-vl\n\n\nSmolVLM2\n\n\n\n\n\n\nTip\n\n\n\nPlease make sure to install num2words via pip3 install num2words==0.5.14\n\n\nbase_model: HuggingFaceTB/SmolVLM2-500M-Video-Instruct\n\n\nLFM2-VL\n\n\n\n\n\n\nWarning\n\n\n\nPlease uninstall causal-conv1d via pip3 uninstall -y causal-conv1d\n\n\nbase_model: LiquidAI/LFM2-VL-450M", + "text": "Usage\nMultimodal support is limited and doesn’t have full feature parity.\nHere are the hyperparams you’ll need to use to finetune a multimodal model.\nprocessor_type: AutoProcessor\n\nskip_prepare_dataset: true\nremove_unused_columns: false # leave columns in place as they are needed to handle image embeddings during training\nsample_packing: false # not yet supported with multimodal\n\nchat_template: # see in next section if specified\n\n# example dataset\ndatasets:\n - path: HuggingFaceH4/llava-instruct-mix-vsft\n type: chat_template\n split: train[:1%]\n\n# (optional) if doing lora, only finetune the Language model,\n# leave the vision model and vision tower frozen\n# load_in_8bit: true\nadapter: lora\nlora_target_modules: 'model.language_model.layers.[\\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'\n\n# (optional) if you want to resize images to a set size\nimage_size: 512\nimage_resize_algorithm: bilinear\nPlease see examples folder for full configs.\n\n\n\n\n\n\nTip\n\n\n\nSome of our chat_templates have been extended to support broader dataset types. 
This should not break any existing configs.\n\n\n\n\n\n\n\n\nNote\n\n\n\nAs of now, we do not truncate nor drop samples based on sequence_len as each arch has different ways to process non-text tokens. We are looking for help on this.\n\n\n\nMllama\nbase_model: meta-llama/Llama-3.2-11B-Vision-Instruct\n\nchat_template: llama3_2_vision\n\n\nLlama4\nbase_model: meta-llama/Llama-4-Scout-17B-16E-Instruct\n\nchat_template: llama4\n\n\nPixtral\nbase_model: mistralai/Pixtral-12B-2409\n\nchat_template: pixtral\n\n\nLlava-1.5\nbase_model: llava-hf/llava-1.5-7b-hf\n\nchat_template: llava\n\n\nMistral-Small-3.1\n\n\n\n\n\n\nTip\n\n\n\nPlease make sure to install vision lib via pip install 'mistral-common[opencv]==1.8.5'\n\n\nbase_model: mistralai/Mistral-Small-3.1-24B-Instruct-2503\n\n\nMagistral-Small-2509\n\n\n\n\n\n\nTip\n\n\n\nPlease make sure to install vision lib via pip install 'mistral-common[opencv]==1.8.5'\n\n\nbase_model: mistralai/Magistral-Small-2509\n\n\nVoxtral\n\n\n\n\n\n\nTip\n\n\n\nPlease make sure to install audio lib via pip3 install librosa==0.11.0 'mistral_common[audio]==1.8.3'\n\n\nbase_model: mistralai/Voxtral-Mini-3B-2507\n\n\nGemma-3\n\n\n\n\n\n\nTip\n\n\n\nThe Gemma3-1B model is a text-only model, so please train as regular text model.\n\n\nFor multi-modal 4B/12B/27B models, use the following config:\nbase_model: google/gemma-3-4b-it\n\nchat_template: gemma3\n\n\nGemma-3n\n\n\n\n\n\n\nWarning\n\n\n\nThe model’s initial loss and grad norm will be very high. 
We suspect this to be due to the Conv in the vision layers.\n\n\n\n\n\n\n\n\nTip\n\n\n\nPlease make sure to install timm via pip3 install timm==1.0.17\n\n\nbase_model: google/gemma-3n-E2B-it\n\nchat_template: gemma3n\n\n\nQwen2-VL\nbase_model: Qwen/Qwen2-VL-7B-Instruct\n\nchat_template: qwen2_vl\n\n\nQwen2.5-VL\nbase_model: Qwen/Qwen2.5-VL-7B-Instruct\n\nchat_template: qwen2_vl # same as qwen2-vl\n\n\nQwen3-VL\nbase_model: Qwen/Qwen3-VL-4B-Instruct\n\nchat_template: qwen2_vl # same as qwen2-vl\n\n\nSmolVLM2\n\n\n\n\n\n\nTip\n\n\n\nPlease make sure to install num2words via pip3 install num2words==0.5.14\n\n\nbase_model: HuggingFaceTB/SmolVLM2-500M-Video-Instruct\n\n\nLFM2-VL\n\n\n\n\n\n\nWarning\n\n\n\nPlease uninstall causal-conv1d via pip3 uninstall -y causal-conv1d\n\n\nbase_model: LiquidAI/LFM2-VL-450M", "crumbs": [ "How To Guides", "MultiModal / Vision Language Models (BETA)" @@ -909,7 +909,7 @@ "href": "docs/faq.html", "title": "FAQ", "section": "", - "text": "General\nQ: The trainer stopped and hasn’t progressed in several minutes.\n\nA: Usually an issue with the GPUs communicating with each other. See the NCCL doc\n\nQ: exitcode: -9\n\nA: This usually happens when you run out of system RAM.\n\nQ: exitcode: -7 while using deepspeed\n\nA: Try upgrading deepspeed w: pip install -U deepspeed\n\nQ: AttributeError: ‘DummyOptim’ object has no attribute ‘step’\nQ: ModuleNotFoundError: No module named ‘mpi4py’ using single GPU with deepspeed\n\nA: You may be using deepspeed with single gpu. Please remove the deepspeed: section in the yaml file or --deepspeed CLI flag.\n\nQ: The codes is stuck on saving preprocessed datasets.\n\nA: This is usually an issue with the GPU. This can be resolved through setting the os environment variable CUDA_VISIBLE_DEVICES=0. If you are on runpod, this is usually a pod issue. 
Starting a new pod should take care of it.\n\nQ: Received mismatch error on merge adapters / loading adapters between torch.Size of checkpoint and model.\n\nA: This is likely due to vocab size mismatch. By default, Axolotl expands the model’s embeddings if the tokenizer has more tokens than the model. Please use the axolotl merge-lora command to merge the adapters instead of using your own scripts.\n\n\nOn the other hand, if the model has more tokens than the tokenizer, Axolotl does not shrink the model’s embeddings unless shrink_embeddings: true is set in the config.\n\nQ: How to call Axolotl via custom python scripts?\n\nA: Since Axolotl is just Python, please see src/axolotl/cli/main.py on how each command is called.\n\nQ: How to know the value to use for fsdp_transformer_layer_cls_to_wrap?\n\nA: This is the class name of the transformer layer to wrap with FSDP. For example, for LlamaForCausalLM, the value is LlamaDecoderLayer. To find this for a specific model, check the model’s PreTrainedModel definition and look for _no_split_modules variable in the modeling_<model_name>.py file within transformers library.\n\nQ: ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token\n\nA: This is because the tokenizer does not have a padding token. Please add a padding token to the tokenizer via:\n\n\nspecial_tokens:\n # str. If you're not sure, set to same as `eos_token`.\n pad_token: \"...\"\n\nQ: IterableDataset error or KeyError: 'input_ids' when using preprocess CLI\n\nA: This is because you may be using preprocess CLI with pretraining_dataset: or skip_prepare_dataset: true respectively. Please use axolotl train CLI directly instead as these datasets are prepared on demand.\n\nQ: vLLM is not working with Axolotl\n\nA: We currently recommend torch 2.6.0 for use with vllm. Please ensure you use the right version. 
For Docker, please use the main-py3.11-cu124-2.6.0 tag.\n\nQ: FA2 2.8.0 undefined symbol runtime error on CUDA 12.4\n\nA: There seems to be a wheel issue with FA2 2.8.0 on CUDA 12.4. Try CUDA 12.6 instead or downgrade to FA2 2.7.4. Please refer to the upstream issue: https://github.com/Dao-AILab/flash-attention/issues/1717.\n\n\n\nChat templates\nQ: jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content' / 'role' / ____\n\nA: This means that the property mapping for the stated attribute does not exist when building chat_template prompt. For example, if no attribute 'content', please check you have added the correct mapping for content under message_property_mappings.\n\nQ: Empty template generated for turn ___\n\nA: The content is empty for that turn.\n\nQ: Could not find content start/end boundary for turn __\n\nA: The specific turn’s start/end could not be detected. Please ensure you have set the eos_token following your chat_template. Otherwise, this could be a chat_template which doesn’t use proper boundaries for each turn (like system). On the rare occurrence, make sure your content is not [[dummy_message]]. Please let us know about this.\n\nQ: Content end boundary is before start boundary for turn ___\n\nA: This is an edge case which should not occur. Please create an Issue if this happens.\n\nQ: Content end boundary is the same as start boundary for turn ___. This is likely an empty turn.\n\nA: This is likely an empty turn.\n\nQ: The EOS token is incorrectly being masked or not being masked / EOS token __ not found in chat template.\n\nA: There can be two reasons:\n\n\n\nThis is because of the mismatch between tokenizer.eos_token and EOS token in template. Please make sure to set eos_token: under special_tokens: to the same EOS token as in template.\n\n\n\n\nThe EOS token is not in the template. Please check if your template is correct. 
As an example, phi_35 template does not use its dedicated EOS token <|endoftext|> at the end.\n\n\nQ: “chat_template choice is tokenizer_default but tokenizer’s chat_template is null. Please add a chat_template in tokenizer config”\n\nA: This is because the tokenizer does not have a chat template. Please add a chat template in the tokenizer config. See chat_template for more details.\n\nQ: The EOT token(s) are incorrectly being masked or not being masked / EOT token __ not found in chat template.\n\nA: There can be two reasons:\n\n\n\nThe EOT token is different from the EOS token and was not specified under eot_tokens:. Please set eot_tokens: to the same EOT token(s) as in template.\n\n\n\n\nThere is more than one EOT token per turn in the template. Please raise an issue with examples as we recognize this as an edge case.\n\n\nQ: EOT token encoding failed. Please check if the token is valid and can be encoded.\n\nA: There could be some issue with the tokenizer or unicode encoding. Please raise an issue with examples with the EOT token & tokenizer causing the issue.\n\nQ: EOT token __ is encoded as multiple tokens.\n\nA: This is because the EOT token is encoded as multiple tokens which can cause unexpected behavior. Please add it under tokens: or (recommended) override unused added_tokens via added_tokens_overrides:.\n\nQ: Conflict between train_on_eos and train_on_eot. eos_token is in eot_tokens and train_on_eos != train_on_eot\n\nA: This is because the EOS token is in the eot_tokens: while mismatch between train_on_eos: and train_on_eot:. This will cause one to override the other. Please ensure that train_on_eos: and train_on_eot: are the same or remove the EOS token from eot_tokens:.\n\nQ: If eot_tokens: is not provided, what happens?\n\nA: If eot_tokens: is not provided, the default behavior is the same as before. 
EOS tokens used to delimit turns are masked/unmasked depending on whether the turn is trainable.\n\n\nInternally, eot_tokens: tokenizer.eos_token and train_on_eot: train_on_eos (which defaults to turn). This transition helps clarify the naming and behavior of EOT/EOS tokens.\n\nQ: Data processing error: CAS service error\n\nA: Try disabling XET with export HF_HUB_DISABLE_XET=1\n\nQ: torch._inductor.exc.LoweringException: NoValidChoicesError: No choices to select, please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice.\n\nA: Depending on the version of torch, you may need to include this in your YAML:\n\n\nflex_attn_compile_kwargs:\n dynamic: false\n mode: max-autotune-no-cudagraphs\n\n**Q: ValueError(\"Backward pass should have cleared tracker of all tensors\")\n\nA: This may happen due to edge cases in using the modern OffloadActivations context manager for CUDA streams. If you encounter this error, you may have success using the naive implementation with offload_activations: legacy in your YAML.\n\n**Q: Error parsing tool_calls arguments as JSON.\n\nA: There is an error parsing string arguments to a dict. Please check your dataset and the error message for more details.", + "text": "General\nQ: The trainer stopped and hasn’t progressed in several minutes.\n\nA: Usually an issue with the GPUs communicating with each other. See the NCCL doc\n\nQ: exitcode: -9\n\nA: This usually happens when you run out of system RAM.\n\nQ: exitcode: -7 while using deepspeed\n\nA: Try upgrading deepspeed w: pip install -U deepspeed\n\nQ: AttributeError: ‘DummyOptim’ object has no attribute ‘step’\nQ: ModuleNotFoundError: No module named ‘mpi4py’ using single GPU with deepspeed\n\nA: You may be using deepspeed with single gpu. Please remove the deepspeed: section in the yaml file or --deepspeed CLI flag.\n\nQ: The codes is stuck on saving preprocessed datasets.\n\nA: This is usually an issue with the GPU. 
This can be resolved through setting the os environment variable CUDA_VISIBLE_DEVICES=0. If you are on runpod, this is usually a pod issue. Starting a new pod should take care of it.\n\nQ: Received mismatch error on merge adapters / loading adapters between torch.Size of checkpoint and model.\n\nA: This is likely due to vocab size mismatch. By default, Axolotl expands the model’s embeddings if the tokenizer has more tokens than the model. Please use the axolotl merge-lora command to merge the adapters instead of using your own scripts.\n\n\nOn the other hand, if the model has more tokens than the tokenizer, Axolotl does not shrink the model’s embeddings unless shrink_embeddings: true is set in the config.\n\nQ: How to call Axolotl via custom python scripts?\n\nA: Since Axolotl is just Python, please see src/axolotl/cli/main.py on how each command is called.\n\nQ: How to know the value to use for fsdp_transformer_layer_cls_to_wrap?\n\nA: This is the class name of the transformer layer to wrap with FSDP. For example, for LlamaForCausalLM, the value is LlamaDecoderLayer. To find this for a specific model, check the model’s PreTrainedModel definition and look for _no_split_modules variable in the modeling_<model_name>.py file within transformers library.\n\nQ: ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token\n\nA: This is because the tokenizer does not have a padding token. Please add a padding token to the tokenizer via:\n\n\nspecial_tokens:\n # str. If you're not sure, set to same as `eos_token`.\n pad_token: \"...\"\n\nQ: IterableDataset error or KeyError: 'input_ids' when using preprocess CLI\n\nA: This is because you may be using preprocess CLI with pretraining_dataset: or skip_prepare_dataset: true respectively. Please use axolotl train CLI directly instead as these datasets are prepared on demand.\n\nQ: vLLM is not working with Axolotl\n\nA: We currently recommend torch 2.6.0 for use with vllm. 
Please ensure you use the right version. For Docker, please use the main-py3.11-cu124-2.6.0 tag.\n\nQ: FA2 2.8.0 undefined symbol runtime error on CUDA 12.4\n\nA: There seems to be a wheel issue with FA2 2.8.0 on CUDA 12.4. Try CUDA 12.6 instead or downgrade to FA2 2.7.4. Please refer to the upstream issue: https://github.com/Dao-AILab/flash-attention/issues/1717.\n\nQ: Can we mix text and text+image datasets for VLM training?\n\nA: Yes, you can for newer VLM arch. The ones that would not work are LLaVA / Pixtral arch. If you notice one not working, please let us know!\n\nQ: Why is memory/max_* different from nvidia-smi?\n\nA: We use torch APIs to retrieve this information. You can see https://docs.pytorch.org/docs/stable/notes/cuda.html#cuda-memory-management for more information.\n\n\n\nChat templates\nQ: jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content' / 'role' / ____\n\nA: This means that the property mapping for the stated attribute does not exist when building chat_template prompt. For example, if no attribute 'content', please check you have added the correct mapping for content under message_property_mappings.\n\nQ: Empty template generated for turn ___\n\nA: The content is empty for that turn.\n\nQ: Could not find content start/end boundary for turn __\n\nA: The specific turn’s start/end could not be detected. Please ensure you have set the eos_token following your chat_template. Otherwise, this could be a chat_template which doesn’t use proper boundaries for each turn (like system). On the rare occurrence, make sure your content is not [[dummy_message]]. Please let us know about this.\n\nQ: Content end boundary is before start boundary for turn ___\n\nA: This is an edge case which should not occur. Please create an Issue if this happens.\n\nQ: Content end boundary is the same as start boundary for turn ___. 
This is likely an empty turn.\n\nA: This is likely an empty turn.\n\nQ: The EOS token is incorrectly being masked or not being masked / EOS token __ not found in chat template.\n\nA: There can be two reasons:\n\n\n\nThis is because of the mismatch between tokenizer.eos_token and EOS token in template. Please make sure to set eos_token: under special_tokens: to the same EOS token as in template.\n\n\n\n\nThe EOS token is not in the template. Please check if your template is correct. As an example, phi_35 template does not use its dedicated EOS token <|endoftext|> at the end.\n\n\nQ: “chat_template choice is tokenizer_default but tokenizer’s chat_template is null. Please add a chat_template in tokenizer config”\n\nA: This is because the tokenizer does not have a chat template. Please add a chat template in the tokenizer config. See chat_template for more details.\n\nQ: The EOT token(s) are incorrectly being masked or not being masked / EOT token __ not found in chat template.\n\nA: There can be two reasons:\n\n\n\nThe EOT token is different from the EOS token and was not specified under eot_tokens:. Please set eot_tokens: to the same EOT token(s) as in template.\n\n\n\n\nThere is more than one EOT token per turn in the template. Please raise an issue with examples as we recognize this as an edge case.\n\n\nQ: EOT token encoding failed. Please check if the token is valid and can be encoded.\n\nA: There could be some issue with the tokenizer or unicode encoding. Please raise an issue with examples with the EOT token & tokenizer causing the issue.\n\nQ: EOT token __ is encoded as multiple tokens.\n\nA: This is because the EOT token is encoded as multiple tokens which can cause unexpected behavior. Please add it under tokens: or (recommended) override unused added_tokens via added_tokens_overrides:.\n\nQ: Conflict between train_on_eos and train_on_eot. 
eos_token is in eot_tokens and train_on_eos != train_on_eot\n\nA: This is because the EOS token is in the eot_tokens: while mismatch between train_on_eos: and train_on_eot:. This will cause one to override the other. Please ensure that train_on_eos: and train_on_eot: are the same or remove the EOS token from eot_tokens:.\n\nQ: If eot_tokens: is not provided, what happens?\n\nA: If eot_tokens: is not provided, the default behavior is the same as before. EOS tokens used to delimit turns are masked/unmasked depending on whether the turn is trainable.\n\n\nInternally, eot_tokens: tokenizer.eos_token and train_on_eot: train_on_eos (which defaults to turn). This transition helps clarify the naming and behavior of EOT/EOS tokens.\n\nQ: Data processing error: CAS service error\n\nA: Try disabling XET with export HF_HUB_DISABLE_XET=1\n\nQ: torch._inductor.exc.LoweringException: NoValidChoicesError: No choices to select, please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice.\n\nA: Depending on the version of torch, you may need to include this in your YAML:\n\n\nflex_attn_compile_kwargs:\n dynamic: false\n mode: max-autotune-no-cudagraphs\n\n**Q: ValueError(\"Backward pass should have cleared tracker of all tensors\")\n\nA: This may happen due to edge cases in using the modern OffloadActivations context manager for CUDA streams. If you encounter this error, you may have success using the naive implementation with offload_activations: legacy in your YAML.\n\n**Q: Error parsing tool_calls arguments as JSON.\n\nA: There is an error parsing string arguments to a dict. 
Please check your dataset and the error message for more details.", "crumbs": [ "Troubleshooting", "FAQ" diff --git a/sitemap.xml b/sitemap.xml index 2f7ecd95b..96bc6eb5f 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,798 +2,798 @@ https://docs.axolotl.ai/src/axolotl/integrations/cut_cross_entropy/ACKNOWLEDGEMENTS.html - 2025-10-22T14:22:30.578Z + 2025-10-22T22:23:34.158Z https://docs.axolotl.ai/docs/mac.html - 2025-10-22T14:22:30.555Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/cli.html - 2025-10-22T14:22:30.552Z + 2025-10-22T22:23:34.138Z https://docs.axolotl.ai/docs/nccl.html - 2025-10-22T14:22:30.556Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/getting-started.html - 2025-10-22T14:22:30.553Z + 2025-10-22T22:23:34.139Z https://docs.axolotl.ai/docs/lr_groups.html - 2025-10-22T14:22:30.555Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/qat.html - 2025-10-22T14:22:30.556Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/multipack.html - 2025-10-22T14:22:30.556Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/streaming.html - 2025-10-22T14:22:30.556Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/lora_optims.html - 2025-10-22T14:22:30.555Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/amd_hpc.html - 2025-10-22T14:22:30.552Z + 2025-10-22T22:23:34.138Z https://docs.axolotl.ai/docs/debugging.html - 2025-10-22T14:22:30.552Z + 2025-10-22T22:23:34.139Z https://docs.axolotl.ai/docs/dataset-formats/conversation.html - 2025-10-22T14:22:30.552Z + 2025-10-22T22:23:34.138Z https://docs.axolotl.ai/docs/dataset-formats/inst_tune.html - 2025-10-22T14:22:30.552Z + 2025-10-22T22:23:34.139Z https://docs.axolotl.ai/docs/dataset-formats/index.html - 2025-10-22T14:22:30.552Z + 2025-10-22T22:23:34.139Z https://docs.axolotl.ai/docs/config-reference.html - 2025-10-22T14:26:31.001Z + 2025-10-22T22:27:34.555Z https://docs.axolotl.ai/docs/multimodal.html - 2025-10-22T14:22:30.556Z + 2025-10-22T22:23:34.142Z 
https://docs.axolotl.ai/docs/ray-integration.html - 2025-10-22T14:22:30.556Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/faq.html - 2025-10-22T14:22:30.552Z + 2025-10-22T22:23:34.139Z https://docs.axolotl.ai/docs/dataset_preprocessing.html - 2025-10-22T14:22:30.552Z + 2025-10-22T22:23:34.139Z https://docs.axolotl.ai/docs/torchao.html - 2025-10-22T14:22:30.556Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/optimizers.html - 2025-10-22T14:22:30.556Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/api/utils.schedulers.html - 2025-10-22T14:26:14.504Z + 2025-10-22T22:27:18.175Z https://docs.axolotl.ai/docs/api/cli.utils.sweeps.html - 2025-10-22T14:26:13.707Z + 2025-10-22T22:27:17.472Z https://docs.axolotl.ai/docs/api/datasets.html - 2025-10-22T14:26:13.302Z + 2025-10-22T22:27:17.115Z https://docs.axolotl.ai/docs/api/utils.tokenization.html - 2025-10-22T14:26:14.421Z + 2025-10-22T22:27:18.102Z https://docs.axolotl.ai/docs/api/loaders.tokenizer.html - 2025-10-22T14:26:13.825Z + 2025-10-22T22:27:17.576Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_expand_mask.html - 2025-10-22T14:26:14.295Z + 2025-10-22T22:27:17.990Z https://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_cpu.html - 2025-10-22T14:26:14.381Z + 2025-10-22T22:27:18.066Z https://docs.axolotl.ai/docs/api/utils.data.sft.html - 2025-10-22T14:26:14.554Z + 2025-10-22T22:27:18.220Z https://docs.axolotl.ai/docs/api/monkeypatch.transformers_fa_utils.html - 2025-10-22T14:26:14.362Z + 2025-10-22T22:27:18.049Z https://docs.axolotl.ai/docs/api/loaders.patch_manager.html - 2025-10-22T14:26:13.846Z + 2025-10-22T22:27:17.594Z https://docs.axolotl.ai/docs/api/integrations.liger.args.html - 2025-10-22T14:26:14.875Z + 2025-10-22T22:27:18.501Z https://docs.axolotl.ai/docs/api/utils.schemas.peft.html - 2025-10-22T14:26:14.644Z + 2025-10-22T22:27:18.300Z https://docs.axolotl.ai/docs/api/prompt_strategies.pygmalion.html - 2025-10-22T14:26:14.038Z + 2025-10-22T22:27:17.765Z 
https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_instruct.html - 2025-10-22T14:26:13.957Z + 2025-10-22T22:27:17.692Z https://docs.axolotl.ai/docs/api/cli.cloud.base.html - 2025-10-22T14:26:13.663Z + 2025-10-22T22:27:17.433Z https://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_disk.html - 2025-10-22T14:26:14.413Z + 2025-10-22T22:27:18.094Z https://docs.axolotl.ai/docs/api/kernels.swiglu.html - 2025-10-22T14:26:14.265Z + 2025-10-22T22:27:17.965Z https://docs.axolotl.ai/docs/api/integrations.cut_cross_entropy.args.html - 2025-10-22T14:26:14.860Z + 2025-10-22T22:27:18.488Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.user_defined.html - 2025-10-22T14:26:14.105Z + 2025-10-22T22:27:17.824Z https://docs.axolotl.ai/docs/api/monkeypatch.utils.html - 2025-10-22T14:26:14.339Z + 2025-10-22T22:27:18.029Z https://docs.axolotl.ai/docs/api/core.builders.rl.html - 2025-10-22T14:26:13.401Z + 2025-10-22T22:27:17.202Z https://docs.axolotl.ai/docs/api/loaders.processor.html - 2025-10-22T14:26:13.827Z + 2025-10-22T22:27:17.577Z https://docs.axolotl.ai/docs/api/utils.callbacks.lisa.html - 2025-10-22T14:26:15.006Z + 2025-10-22T22:27:18.616Z https://docs.axolotl.ai/docs/api/core.training_args.html - 2025-10-22T14:26:13.416Z + 2025-10-22T22:27:17.216Z https://docs.axolotl.ai/docs/api/loaders.adapter.html - 2025-10-22T14:26:13.834Z + 2025-10-22T22:27:17.583Z https://docs.axolotl.ai/docs/api/cli.merge_sharded_fsdp_weights.html - 2025-10-22T14:26:13.634Z + 2025-10-22T22:27:17.407Z https://docs.axolotl.ai/docs/api/cli.train.html - 2025-10-22T14:26:13.518Z + 2025-10-22T22:27:17.305Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.rng_state_loader.html - 2025-10-22T14:26:13.859Z + 2025-10-22T22:27:17.606Z https://docs.axolotl.ai/docs/api/prompt_strategies.completion.html - 2025-10-22T14:26:14.005Z + 2025-10-22T22:27:17.735Z https://docs.axolotl.ai/docs/api/prompt_strategies.stepwise_supervised.html - 2025-10-22T14:26:14.017Z + 
2025-10-22T22:27:17.746Z https://docs.axolotl.ai/docs/api/monkeypatch.lora_kernels.html - 2025-10-22T14:26:14.329Z + 2025-10-22T22:27:18.021Z https://docs.axolotl.ai/docs/api/prompt_strategies.messages.chat.html - 2025-10-22T14:26:14.044Z + 2025-10-22T22:27:17.770Z https://docs.axolotl.ai/docs/api/prompt_strategies.user_defined.html - 2025-10-22T14:26:13.981Z + 2025-10-22T22:27:17.714Z https://docs.axolotl.ai/docs/api/core.chat.messages.html - 2025-10-22T14:26:13.445Z + 2025-10-22T22:27:17.241Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.scheduler.html - 2025-10-22T14:26:13.867Z + 2025-10-22T22:27:17.613Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.user_defined.html - 2025-10-22T14:26:14.081Z + 2025-10-22T22:27:17.803Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.llama3.html - 2025-10-22T14:26:14.093Z + 2025-10-22T22:27:17.814Z https://docs.axolotl.ai/docs/api/utils.schemas.integrations.html - 2025-10-22T14:26:14.670Z + 2025-10-22T22:27:18.322Z https://docs.axolotl.ai/docs/api/convert.html - 2025-10-22T14:26:13.318Z + 2025-10-22T22:27:17.129Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.passthrough.html - 2025-10-22T14:26:14.083Z + 2025-10-22T22:27:17.805Z https://docs.axolotl.ai/docs/api/utils.schemas.config.html - 2025-10-22T14:26:14.594Z + 2025-10-22T22:27:18.256Z https://docs.axolotl.ai/docs/api/utils.schemas.enums.html - 2025-10-22T14:26:14.681Z + 2025-10-22T22:27:18.332Z https://docs.axolotl.ai/docs/api/monkeypatch.btlm_attn_hijack_flash.html - 2025-10-22T14:26:14.341Z + 2025-10-22T22:27:18.031Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chat_template.html - 2025-10-22T14:26:14.052Z + 2025-10-22T22:27:17.777Z https://docs.axolotl.ai/docs/api/core.trainers.grpo.trainer.html - 2025-10-22T14:26:13.787Z + 2025-10-22T22:27:17.542Z https://docs.axolotl.ai/docs/api/integrations.lm_eval.args.html - 2025-10-22T14:26:14.879Z + 2025-10-22T22:27:18.504Z https://docs.axolotl.ai/docs/api/utils.collators.core.html - 
2025-10-22T14:26:14.909Z + 2025-10-22T22:27:18.530Z https://docs.axolotl.ai/docs/api/core.chat.format.shared.html - 2025-10-22T14:26:13.450Z + 2025-10-22T22:27:17.246Z https://docs.axolotl.ai/docs/api/prompt_strategies.orpo.chat_template.html - 2025-10-22T14:26:14.130Z + 2025-10-22T22:27:17.847Z https://docs.axolotl.ai/docs/api/utils.samplers.multipack.html - 2025-10-22T14:26:14.991Z + 2025-10-22T22:27:18.603Z https://docs.axolotl.ai/docs/api/utils.callbacks.qat.html - 2025-10-22T14:26:15.023Z + 2025-10-22T22:27:18.632Z https://docs.axolotl.ai/docs/api/prompt_strategies.chat_template.html - 2025-10-22T14:26:13.938Z + 2025-10-22T22:27:17.676Z https://docs.axolotl.ai/docs/api/utils.schemas.multimodal.html - 2025-10-22T14:26:14.655Z + 2025-10-22T22:27:18.309Z https://docs.axolotl.ai/docs/api/utils.callbacks.comet_.html - 2025-10-22T14:26:15.015Z + 2025-10-22T22:27:18.624Z https://docs.axolotl.ai/docs/api/prompt_strategies.base.html - 2025-10-22T14:26:13.898Z + 2025-10-22T22:27:17.640Z https://docs.axolotl.ai/docs/api/kernels.utils.html - 2025-10-22T14:26:14.276Z + 2025-10-22T22:27:17.974Z https://docs.axolotl.ai/docs/api/cli.merge_lora.html - 2025-10-22T14:26:13.620Z + 2025-10-22T22:27:17.395Z https://docs.axolotl.ai/docs/api/cli.utils.html - 2025-10-22T14:26:13.672Z + 2025-10-22T22:27:17.442Z https://docs.axolotl.ai/docs/api/utils.ctx_managers.sequence_parallel.html - 2025-10-22T14:26:13.896Z + 2025-10-22T22:27:17.639Z https://docs.axolotl.ai/docs/api/index.html - 2025-10-22T14:26:13.202Z + 2025-10-22T22:27:17.027Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.llama3.html - 2025-10-22T14:26:14.065Z + 2025-10-22T22:27:17.788Z https://docs.axolotl.ai/docs/api/monkeypatch.mixtral.html - 2025-10-22T14:26:14.377Z + 2025-10-22T22:27:18.062Z https://docs.axolotl.ai/docs/api/prompt_strategies.orcamini.html - 2025-10-22T14:26:14.030Z + 2025-10-22T22:27:17.758Z https://docs.axolotl.ai/docs/api/core.trainers.grpo.sampler.html - 2025-10-22T14:26:13.802Z + 
2025-10-22T22:27:17.555Z https://docs.axolotl.ai/docs/api/utils.lora.html - 2025-10-22T14:26:14.429Z + 2025-10-22T22:27:18.109Z https://docs.axolotl.ai/docs/api/core.trainers.mixins.optimizer.html - 2025-10-22T14:26:13.855Z + 2025-10-22T22:27:17.602Z https://docs.axolotl.ai/docs/api/cli.config.html - 2025-10-22T14:26:13.586Z + 2025-10-22T22:27:17.365Z https://docs.axolotl.ai/docs/api/monkeypatch.multipack.html - 2025-10-22T14:26:14.288Z + 2025-10-22T22:27:17.985Z https://docs.axolotl.ai/docs/api/utils.collators.batching.html - 2025-10-22T14:26:14.932Z + 2025-10-22T22:27:18.551Z https://docs.axolotl.ai/docs/api/utils.quantization.html - 2025-10-22T14:26:14.579Z + 2025-10-22T22:27:18.241Z https://docs.axolotl.ai/docs/api/utils.dict.html - 2025-10-22T14:26:14.536Z + 2025-10-22T22:27:18.203Z https://docs.axolotl.ai/docs/api/kernels.quantize.html - 2025-10-22T14:26:14.274Z + 2025-10-22T22:27:17.973Z https://docs.axolotl.ai/docs/api/utils.schemas.training.html - 2025-10-22T14:26:14.611Z + 2025-10-22T22:27:18.271Z https://docs.axolotl.ai/docs/api/train.html - 2025-10-22T14:26:13.281Z + 2025-10-22T22:27:17.096Z https://docs.axolotl.ai/docs/api/core.datasets.transforms.chat_builder.html - 2025-10-22T14:26:13.466Z + 2025-10-22T22:27:17.260Z https://docs.axolotl.ai/docs/inference.html - 2025-10-22T14:22:30.555Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/FAQS.html - 2025-10-22T14:22:30.550Z + 2025-10-22T22:23:34.137Z https://docs.axolotl.ai/examples/colab-notebooks/colab-axolotl-example.html - 2025-10-22T14:22:30.560Z + 2025-10-22T22:23:34.145Z https://docs.axolotl.ai/index.html - 2025-10-22T14:22:30.573Z + 2025-10-22T22:23:34.155Z https://docs.axolotl.ai/docs/custom_integrations.html - 2025-10-22T14:22:30.552Z + 2025-10-22T22:23:34.138Z https://docs.axolotl.ai/docs/api/utils.schemas.utils.html - 2025-10-22T14:26:14.688Z + 2025-10-22T22:27:18.338Z https://docs.axolotl.ai/docs/api/kernels.geglu.html - 2025-10-22T14:26:14.253Z + 2025-10-22T22:27:17.954Z 
https://docs.axolotl.ai/docs/api/core.builders.causal.html - 2025-10-22T14:26:13.395Z + 2025-10-22T22:27:17.197Z https://docs.axolotl.ai/docs/api/core.trainers.mamba.html - 2025-10-22T14:26:13.763Z + 2025-10-22T22:27:17.523Z https://docs.axolotl.ai/docs/api/prompt_strategies.bradley_terry.llama3.html - 2025-10-22T14:26:14.135Z + 2025-10-22T22:27:17.851Z https://docs.axolotl.ai/docs/api/core.datasets.chat.html - 2025-10-22T14:26:13.456Z + 2025-10-22T22:27:17.251Z https://docs.axolotl.ai/docs/api/utils.collators.mm_chat.html - 2025-10-22T14:26:14.942Z + 2025-10-22T22:27:18.560Z https://docs.axolotl.ai/docs/api/prompt_strategies.llama2_chat.html - 2025-10-22T14:26:13.997Z + 2025-10-22T22:27:17.728Z https://docs.axolotl.ai/docs/api/common.const.html - 2025-10-22T14:26:14.887Z + 2025-10-22T22:27:18.511Z https://docs.axolotl.ai/docs/api/cli.quantize.html - 2025-10-22T14:26:13.650Z + 2025-10-22T22:27:17.422Z https://docs.axolotl.ai/docs/api/utils.trainer.html - 2025-10-22T14:26:14.470Z + 2025-10-22T22:27:18.145Z https://docs.axolotl.ai/docs/api/cli.delinearize_llama4.html - 2025-10-22T14:26:13.592Z + 2025-10-22T22:27:17.370Z https://docs.axolotl.ai/docs/api/evaluate.html - 2025-10-22T14:26:13.294Z + 2025-10-22T22:27:17.108Z https://docs.axolotl.ai/docs/api/monkeypatch.mistral_attn_hijack_flash.html - 2025-10-22T14:26:14.287Z + 2025-10-22T22:27:17.983Z https://docs.axolotl.ai/docs/api/loaders.model.html - 2025-10-22T14:26:13.815Z + 2025-10-22T22:27:17.567Z https://docs.axolotl.ai/docs/api/utils.distributed.html - 2025-10-22T14:26:14.529Z + 2025-10-22T22:27:18.197Z https://docs.axolotl.ai/docs/api/utils.model_shard_quant.html - 2025-10-22T14:26:14.436Z + 2025-10-22T22:27:18.115Z https://docs.axolotl.ai/docs/api/kernels.lora.html - 2025-10-22T14:26:14.240Z + 2025-10-22T22:27:17.942Z https://docs.axolotl.ai/docs/api/cli.main.html - 2025-10-22T14:26:13.507Z + 2025-10-22T22:27:17.296Z https://docs.axolotl.ai/docs/api/integrations.spectrum.args.html - 2025-10-22T14:26:14.883Z + 
2025-10-22T22:27:18.508Z https://docs.axolotl.ai/docs/api/utils.optimizers.adopt.html - 2025-10-22T14:26:14.545Z + 2025-10-22T22:27:18.211Z https://docs.axolotl.ai/docs/api/cli.cloud.modal_.html - 2025-10-22T14:26:13.670Z + 2025-10-22T22:27:17.440Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_flash.html - 2025-10-22T14:26:14.283Z + 2025-10-22T22:27:17.980Z https://docs.axolotl.ai/docs/api/core.builders.base.html - 2025-10-22T14:26:13.389Z + 2025-10-22T22:27:17.192Z https://docs.axolotl.ai/docs/api/utils.schemas.trl.html - 2025-10-22T14:26:14.649Z + 2025-10-22T22:27:18.303Z https://docs.axolotl.ai/docs/api/cli.utils.args.html - 2025-10-22T14:26:13.686Z + 2025-10-22T22:27:17.454Z https://docs.axolotl.ai/docs/api/core.trainers.base.html - 2025-10-22T14:26:13.739Z + 2025-10-22T22:27:17.500Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_patch_multipack.html - 2025-10-22T14:26:14.343Z + 2025-10-22T22:27:18.033Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_xformers.html - 2025-10-22T14:26:14.285Z + 2025-10-22T22:27:17.982Z https://docs.axolotl.ai/docs/api/utils.schemas.model.html - 2025-10-22T14:26:14.603Z + 2025-10-22T22:27:18.263Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.chatml.html - 2025-10-22T14:26:14.103Z + 2025-10-22T22:27:17.823Z https://docs.axolotl.ai/docs/api/utils.callbacks.mlflow_.html - 2025-10-22T14:26:15.010Z + 2025-10-22T22:27:18.620Z https://docs.axolotl.ai/docs/api/common.datasets.html - 2025-10-22T14:26:14.906Z + 2025-10-22T22:27:18.527Z https://docs.axolotl.ai/docs/api/utils.schemas.datasets.html - 2025-10-22T14:26:14.633Z + 2025-10-22T22:27:18.290Z https://docs.axolotl.ai/docs/api/cli.utils.fetch.html - 2025-10-22T14:26:13.693Z + 2025-10-22T22:27:17.460Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chatml.html - 2025-10-22T14:26:14.077Z + 2025-10-22T22:27:17.800Z https://docs.axolotl.ai/docs/api/monkeypatch.relora.html - 2025-10-22T14:26:14.293Z + 2025-10-22T22:27:17.989Z 
https://docs.axolotl.ai/docs/api/cli.evaluate.html - 2025-10-22T14:26:13.528Z + 2025-10-22T22:27:17.314Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.zephyr.html - 2025-10-22T14:26:14.079Z + 2025-10-22T22:27:17.801Z https://docs.axolotl.ai/docs/api/core.trainers.utils.html - 2025-10-22T14:26:13.804Z + 2025-10-22T22:27:17.556Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_w_system.html - 2025-10-22T14:26:13.971Z + 2025-10-22T22:27:17.705Z https://docs.axolotl.ai/docs/api/utils.chat_templates.html - 2025-10-22T14:26:14.423Z + 2025-10-22T22:27:18.103Z https://docs.axolotl.ai/docs/api/utils.data.streaming.html - 2025-10-22T14:26:14.547Z + 2025-10-22T22:27:18.213Z https://docs.axolotl.ai/docs/api/utils.bench.html - 2025-10-22T14:26:14.440Z + 2025-10-22T22:27:18.119Z https://docs.axolotl.ai/docs/api/common.architectures.html - 2025-10-22T14:26:14.885Z + 2025-10-22T22:27:18.510Z https://docs.axolotl.ai/docs/api/cli.checks.html - 2025-10-22T14:26:13.564Z + 2025-10-22T22:27:17.346Z https://docs.axolotl.ai/docs/api/core.trainers.dpo.trainer.html - 2025-10-22T14:26:13.774Z + 2025-10-22T22:27:17.530Z https://docs.axolotl.ai/docs/api/integrations.base.html - 2025-10-22T14:26:14.856Z + 2025-10-22T22:27:18.484Z https://docs.axolotl.ai/docs/api/cli.utils.train.html - 2025-10-22T14:26:13.721Z + 2025-10-22T22:27:17.485Z https://docs.axolotl.ai/docs/api/utils.collators.mamba.html - 2025-10-22T14:26:14.936Z + 2025-10-22T22:27:18.555Z https://docs.axolotl.ai/docs/api/cli.art.html - 2025-10-22T14:26:13.556Z + 2025-10-22T22:27:17.339Z https://docs.axolotl.ai/docs/api/monkeypatch.trainer_fsdp_optim.html - 2025-10-22T14:26:14.354Z + 2025-10-22T22:27:18.042Z https://docs.axolotl.ai/docs/api/logging_config.html - 2025-10-22T14:26:13.382Z + 2025-10-22T22:27:17.185Z https://docs.axolotl.ai/docs/api/utils.freeze.html - 2025-10-22T14:26:14.450Z + 2025-10-22T22:27:18.127Z https://docs.axolotl.ai/docs/api/prompt_strategies.metharme.html - 2025-10-22T14:26:14.026Z + 
2025-10-22T22:27:17.754Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_chat.html - 2025-10-22T14:26:13.955Z + 2025-10-22T22:27:17.691Z https://docs.axolotl.ai/docs/api/monkeypatch.stablelm_attn_hijack_flash.html - 2025-10-22T14:26:14.350Z + 2025-10-22T22:27:18.039Z https://docs.axolotl.ai/docs/api/models.mamba.modeling_mamba.html - 2025-10-22T14:26:14.907Z + 2025-10-22T22:27:18.529Z https://docs.axolotl.ai/docs/api/core.trainers.trl.html - 2025-10-22T14:26:13.757Z + 2025-10-22T22:27:17.517Z https://docs.axolotl.ai/docs/api/prompt_strategies.input_output.html - 2025-10-22T14:26:14.012Z + 2025-10-22T22:27:17.741Z https://docs.axolotl.ai/docs/api/loaders.constants.html - 2025-10-22T14:26:13.848Z + 2025-10-22T22:27:17.596Z https://docs.axolotl.ai/docs/api/monkeypatch.data.batch_dataset_fetcher.html - 2025-10-22T14:26:14.375Z + 2025-10-22T22:27:18.061Z https://docs.axolotl.ai/docs/api/cli.vllm_serve.html - 2025-10-22T14:26:13.659Z + 2025-10-22T22:27:17.429Z https://docs.axolotl.ai/docs/api/prompt_tokenizers.html - 2025-10-22T14:26:13.370Z + 2025-10-22T22:27:17.175Z https://docs.axolotl.ai/docs/api/cli.args.html - 2025-10-22T14:26:13.552Z + 2025-10-22T22:27:17.335Z https://docs.axolotl.ai/docs/api/cli.inference.html - 2025-10-22T14:26:13.610Z + 2025-10-22T22:27:17.385Z https://docs.axolotl.ai/docs/api/cli.utils.load.html - 2025-10-22T14:26:13.700Z + 2025-10-22T22:27:17.466Z https://docs.axolotl.ai/docs/api/cli.preprocess.html - 2025-10-22T14:26:13.644Z + 2025-10-22T22:27:17.417Z https://docs.axolotl.ai/docs/api/utils.callbacks.profiler.html - 2025-10-22T14:26:15.004Z + 2025-10-22T22:27:18.614Z https://docs.axolotl.ai/docs/api/utils.callbacks.perplexity.html - 2025-10-22T14:26:14.999Z + 2025-10-22T22:27:18.610Z https://docs.axolotl.ai/docs/api/core.chat.format.chatml.html - 2025-10-22T14:26:13.447Z + 2025-10-22T22:27:17.243Z https://docs.axolotl.ai/docs/api/integrations.grokfast.optimizer.html - 2025-10-22T14:26:14.862Z + 2025-10-22T22:27:18.489Z 
https://docs.axolotl.ai/docs/api/integrations.kd.trainer.html - 2025-10-22T14:26:14.871Z + 2025-10-22T22:27:18.497Z https://docs.axolotl.ai/docs/api/monkeypatch.unsloth_.html - 2025-10-22T14:26:14.364Z + 2025-10-22T22:27:18.051Z https://docs.axolotl.ai/docs/api/core.chat.format.llama3x.html - 2025-10-22T14:26:13.448Z + 2025-10-22T22:27:17.244Z https://docs.axolotl.ai/docs/reward_modelling.html - 2025-10-22T14:22:30.556Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/quantize.html - 2025-10-22T14:22:30.556Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/fsdp_qlora.html - 2025-10-22T14:22:30.552Z + 2025-10-22T22:23:34.139Z https://docs.axolotl.ai/docs/nd_parallelism.html - 2025-10-22T14:22:30.556Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/batch_vs_grad.html - 2025-10-22T14:22:30.552Z + 2025-10-22T22:23:34.138Z https://docs.axolotl.ai/docs/multi-node.html - 2025-10-22T14:22:30.556Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/rlhf.html - 2025-10-22T14:22:30.556Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/dataset-formats/stepwise_supervised.html - 2025-10-22T14:22:30.552Z + 2025-10-22T22:23:34.139Z https://docs.axolotl.ai/docs/dataset-formats/pretraining.html - 2025-10-22T14:22:30.552Z + 2025-10-22T22:23:34.139Z https://docs.axolotl.ai/docs/dataset-formats/tokenized.html - 2025-10-22T14:22:30.552Z + 2025-10-22T22:23:34.139Z https://docs.axolotl.ai/docs/dataset-formats/template_free.html - 2025-10-22T14:22:30.552Z + 2025-10-22T22:23:34.139Z https://docs.axolotl.ai/docs/multi-gpu.html - 2025-10-22T14:22:30.555Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/input_output.html - 2025-10-22T14:22:30.555Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/docker.html - 2025-10-22T14:22:30.552Z + 2025-10-22T22:23:34.139Z https://docs.axolotl.ai/docs/gradient_checkpointing.html - 2025-10-22T14:22:30.553Z + 2025-10-22T22:23:34.139Z https://docs.axolotl.ai/docs/optimizations.html - 2025-10-22T14:22:30.556Z + 
2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/sequence_parallelism.html - 2025-10-22T14:22:30.556Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/dataset_loading.html - 2025-10-22T14:22:30.552Z + 2025-10-22T22:23:34.139Z https://docs.axolotl.ai/docs/installation.html - 2025-10-22T14:22:30.555Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/mixed_precision.html - 2025-10-22T14:22:30.555Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/docs/unsloth.html - 2025-10-22T14:22:30.556Z + 2025-10-22T22:23:34.142Z https://docs.axolotl.ai/src/axolotl/integrations/LICENSE.html - 2025-10-22T14:22:30.577Z + 2025-10-22T22:23:34.158Z