diff --git a/.nojekyll b/.nojekyll
index c2a64ff12..f6c1f0ea6 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-17703de0
\ No newline at end of file
+d0072613
\ No newline at end of file
diff --git a/docs/custom_integrations.html b/docs/custom_integrations.html
index c7f7b587c..9fdb9fa8a 100644
--- a/docs/custom_integrations.html
+++ b/docs/custom_integrations.html
@@ -823,6 +823,11 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
  • How It Works
  • ScatterMoE
  • SonicMoE
  • Model Support Matrix
  • Routing strategies
  • Per-model support
  • Feature comparison
  • Shared Expert Handling
  • Limitations
  • Note on MegaBlocks
@@ -850,7 +855,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
  • Liger Kernels
  • NeMo Gym Integration for Axolotl

@@ -1324,27 +1329,349 @@ The quick brown fox jumps over the loud dog

    ScatterMoE

    1. Registers the ScatterMoE kernel from the local libs/scattermoe_lora package (includes fused LoRA support via Triton kernels).
    2. Patches the model’s SparseMoeBlock forward method with the optimized ScatterMoE implementation via the HF kernels library.
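Conceptually, the second step is class-level forward patching. A minimal sketch, assuming a toy block class (`patch_moe_forward`, `ToySparseMoeBlock`, and `fast_forward` are illustrative names, not Axolotl's actual API):

```python
def patch_moe_forward(block_cls, optimized_forward):
    """Swap a module class's forward for an optimized one (illustrative sketch).

    Keeps a handle to the original forward so the patch can be reverted.
    """
    block_cls._original_forward = block_cls.forward
    block_cls.forward = optimized_forward


class ToySparseMoeBlock:  # stand-in for a transformers SparseMoeBlock
    def forward(self, x):
        return ("reference", x)


def fast_forward(self, x):  # stand-in for the optimized kernel forward
    return ("optimized", x)


# every existing and future instance now routes through the optimized forward
patch_moe_forward(ToySparseMoeBlock, fast_forward)
```

Patching the class (rather than each instance) is what lets the plugin run before model loading: blocks constructed later pick up the optimized forward automatically.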

    SonicMoE

    1. Resolves the model’s MoE block class(es) from constants.py.
    2. Patches the forward method with SonicMoE’s optimized CUTLASS kernels and registers a weight converter for the interleaved gate/up projection format.
    3. Supports pluggable routing strategies (see routing table below).

    Both paths use the shared resolve_moe_block_classes utility in constants.py for model-type-to-class resolution.
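At its core, that resolution utility is a lookup from model type to MoE block class name(s). A hypothetical sketch under that assumption (the mapping entries and error handling below are illustrative; see constants.py for the real table):

```python
# Illustrative model-type -> MoE block class-name table (not the real constants.py)
MOE_BLOCK_CLASS_NAMES = {
    "qwen2_moe": ["Qwen2MoeSparseMoeBlock"],
    "mixtral": ["MixtralSparseMoeBlock"],
    "deepseek_v3": ["DeepseekV3MoE"],
}


def resolve_moe_block_classes(model_type):
    """Return the MoE block class name(s) for a model type (sketch)."""
    try:
        return MOE_BLOCK_CLASS_NAMES[model_type]
    except KeyError:
        raise ValueError(f"no MoE block mapping for model_type={model_type!r}") from None
```

Returning a list accommodates models whose MoE layers are split across more than one block class.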


    Supported Models


    See constants.py for the full list of supported model types (Qwen2-MoE, Qwen3-MoE, OLMoE, Mixtral, DeepSeek-V3, GLM-MoE, MiniMax, etc.).


    Model Support Matrix


    All models use the SwiGLU activation (act_fn(gate) * up). Neither kernel currently supports non-SwiGLU MoE architectures.
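The SwiGLU computation act_fn(gate) * up can be written out as a minimal pure-Python reference (scalar lists here; the real kernels operate on batched tensors, and act_fn is SiLU):

```python
import math


def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))


def swiglu(gate, up):
    # SwiGLU expert activation: act_fn(gate) * up, elementwise
    return [silu(g) * u for g, u in zip(gate, up)]
```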


    Routing strategies

    | Routing Strategy | Description | ScatterMoE | SonicMoE |
    |---|---|---|---|
    | softmax → topk | Softmax over experts, select top-K, optional renormalization | Yes | Yes |
    | softmax → group selection → topk | Softmax, select top groups (sum of top-2 per group), topk from selected groups, renorm + scaling | No | Yes |
    | sigmoid → topk (with groups) | Sigmoid + bias correction, group-based masking, topk from masked scores, weights from original sigmoid | Yes | Yes |
    | sigmoid → topk (no groups) | Sigmoid + bias correction, straight topk (n_group=1) | Yes | Yes |
    | softmax → bias correction → topk | Softmax, bias via gate.moe_statics, topk, gather from original probs, clamp-based renorm | No | Yes |
    | softmax → group_limited_greedy | Softmax, group selection (max per group), topk, scale only (no renorm) | No | Yes |
    | softmax → topk via gate.wg | Softmax, gate weight at gate.wg.weight (not gate.weight), always renormalize | No | Yes |
    | fused topk → softmax | Routing + expert computation fused in a single kernel | No | Planned |
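For orientation, the softmax → topk strategy that both kernels support can be written as a framework-agnostic reference (pure Python over a single token's router logits; the real kernels do this batched on the GPU):

```python
import math


def softmax(logits):
    # numerically stable softmax over router logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_topk(logits, k, renormalize=True):
    """softmax -> topk routing: pick top-K experts, optionally renormalize weights."""
    probs = softmax(logits)
    topk = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    weights = [probs[i] for i in topk]
    if renormalize:
        s = sum(weights)
        weights = [w / s for w in weights]
    return topk, weights
```

With renormalize=False the selected weights keep their original softmax mass (summing to less than 1), which is why renormalization is listed as optional.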

    Per-model support

    | Model Type | Architecture | Routing | ScatterMoE | SonicMoE |
    |---|---|---|---|---|
    | qwen2_moe | Qwen2-MoE | softmax → topk | Yes | Yes |
    | qwen3_moe | Qwen3-MoE | softmax → topk | Yes | Yes |
    | qwen3_5_moe | Qwen3.5-MoE | softmax → topk | Yes | Yes |
    | qwen3_5_moe_text | Qwen3.5-MoE (VLM text) | softmax → topk | Yes | Yes |
    | qwen3_next | Qwen3-Next | softmax → topk | Yes | Yes |
    | qwen3_vl_moe | Qwen3-VL-MoE | softmax → topk | Yes | Yes |
    | qwen3_omni_moe | Qwen3-Omni (Thinker + Talker) | softmax → topk | Yes | Yes |
    | olmoe | OLMoE | softmax → topk | Yes | Yes |
    | mixtral | Mixtral | softmax → topk | Yes | Yes |
    | minimax | MiniMax | softmax → topk | Yes | Yes |
    | mistral4 | Mistral 4 | softmax → group → topk | No | Yes |
    | glm_moe_dsa | GLM-MoE DSA (GLM 5) | sigmoid → topk (groups) | Yes | Yes |
    | deepseek_v3 | DeepSeek-V3 | sigmoid → topk (groups) | Yes | Yes |
    | glm4_moe | GLM4-MoE | sigmoid → topk (groups) | Yes | Yes |
    | glm4_moe_lite | GLM4-MoE Lite (GLM 4.7 Flash) | sigmoid → topk (groups) | Yes* | Yes |
    | glm4v_moe | GLM4v-MoE | sigmoid → topk (groups) | Yes | Yes |
    | minimax_m2 | MiniMax M2 | sigmoid → topk (no groups) | Yes | Yes |
    | ernie4_5_moe | ERNIE 4.5 MoE | softmax → bias → topk | No | Yes |
    | deepseek_v2 | DeepSeek-V2 | softmax → group_limited_greedy | No | Yes |
    | hunyuan_v1_moe | HunYuan V1 MoE | softmax → topk (gate.wg) | No | Yes |
    | gpt_oss | GPT-OSS | fused topk → softmax | No | Planned |

    * glm4_moe_lite with ScatterMoE may have issues — see Limitations.


    Feature comparison

    | Feature | ScatterMoE | SonicMoE |
    |---|---|---|
    | Kernel backend | Triton | CUTLASS |
    | GPU requirement | Any CUDA | Hopper (H100/H200) or Blackwell (B200+) |
    | LoRA approach | Fused in Triton kernel | Runtime materialization + custom autograd |
    | LoRA overhead | Lower (fused computation) | Higher (per-forward materialization) |
    | Gate/router LoRA | Yes | Yes |
    | Expert LoRA | Yes (fused) | Yes (materialized) |
    | Shared expert LoRA | Yes (standard PEFT) | Yes (standard PEFT) |
    | Selective expert dequantization | Yes (~97% memory savings) | No |
    | Weight format | Transposed [E, hidden, 2*inter] | Interleaved gate/up [2*I, H, E] |
    | torch.compile routing | No | Yes (optional) |
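The weight-format difference implies a converter between layouts. A hypothetical numpy sketch of packing separate per-expert gate/up weights of shape [E, I, H] into an interleaved [2*I, H, E] buffer — the row-interleaving order shown is an assumption for illustration, not SonicMoE's documented layout:

```python
import numpy as np


def interleave_gate_up(w_gate, w_up):
    """Pack per-expert gate/up weights [E, I, H] as interleaved [2*I, H, E] (sketch)."""
    assert w_gate.shape == w_up.shape
    num_experts, inter, hidden = w_gate.shape
    out = np.empty((2 * inter, hidden, num_experts), dtype=w_gate.dtype)
    # assumed layout: even rows hold gate, odd rows hold up, experts in the last dim
    out[0::2] = w_gate.transpose(1, 2, 0)
    out[1::2] = w_up.transpose(1, 2, 0)
    return out
```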

    Shared Expert Handling


    Both kernels handle shared experts identically. Shared expert attribute names are detected in order of priority:

    1. shared_expert (Qwen2-MoE)
    2. shared_experts (GLM-MoE, DeepSeek-V3)
    3. shared_mlp (HunYuan V1 MoE)

    If shared_expert_gate exists, sigmoid gating is applied to the shared expert contribution before adding it to the routed output. PEFT wraps shared expert linear layers with standard LoRA — no special handling is needed.
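A sketch of that detection order and the optional sigmoid gate (the attribute names come from the priority list above; the scalar arithmetic and toy block object are illustrative stand-ins for module tensors):

```python
import math

# attribute names checked in priority order, per the list above
SHARED_EXPERT_ATTRS = ("shared_expert", "shared_experts", "shared_mlp")


def find_shared_expert(moe_block):
    """Return the first matching shared-expert attribute, in priority order."""
    for name in SHARED_EXPERT_ATTRS:
        module = getattr(moe_block, name, None)
        if module is not None:
            return name, module
    return None, None


def add_shared_output(routed, shared, gate_logit=None):
    """Add the shared-expert contribution, sigmoid-gated when a gate exists."""
    if gate_logit is not None:
        shared = shared * (1.0 / (1.0 + math.exp(-gate_logit)))
    return routed + shared
```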

    Limitations

    • ScatterMoE + GLM4-MoE Lite: ScatterMoE does not work reliably for GLM 4.7 Flash (glm4_moe_lite).
    • Non-SwiGLU activations: Neither kernel supports MoE architectures with non-SwiGLU expert activations (e.g., GPT-OSS uses a custom GLU variant).
    • GPT-OSS: Deferred; it requires a transposed weight layout [E, H, 2*I], expert biases, and a custom GLU activation, so a dedicated forward path is needed.
    • FSDP + fused gate LoRA (SonicMoE): The fused topk → softmax path materializes a local tensor when a LoRA delta is present, to avoid mixing DTensor and Tensor under FSDP.

    Note on MegaBlocks

    @@ -1552,8 +1879,8 @@ sparse model before inference for even greater performance benefits.:

    liger_use_token_scaling: true

    Supported Models


    • deepseek_v2
    • gemma
    • diff --git a/search.json b/search.json index 3d5d5fa91..fe2654612 100644 --- a/search.json +++ b/search.json @@ -3762,7 +3762,7 @@ "href": "docs/custom_integrations.html#kernels-integration", "title": "Custom Integrations", "section": "Kernels Integration", - "text": "Kernels Integration\nMoE (Mixture of Experts) kernels speed up training for MoE layers and reduce VRAM costs. In transformers v5, batched_mm and grouped_mm were integrated as built-in options via the experts_implementation config kwarg:\nclass ExpertsInterface(GeneralInterface):\n _global_mapping = {\n \"batched_mm\": batched_mm_experts_forward,\n \"grouped_mm\": grouped_mm_experts_forward,\n }\nIn our custom integration, we add support for ScatterMoE and SonicMoE, which are more efficient and faster than grouped_mm.\n\nUsage\nAdd the following to your axolotl YAML config:\nplugins:\n - axolotl.integrations.kernels.KernelsPlugin\n\nuse_kernels: true\n\nuse_scattermoe: true\nuse_sonicmoe: true\nImportant: Setting experts_implementation is incompatible with custom kernel options.\n\n\nSonicMoE installation\nPrerequisites:\n- NVIDIA Hopper (H100, H200) or Blackwell (B200, GB200) GPU\n- CUDA 12.9+ (13.0+ for B300)\n- PyTorch 2.7+ (2.9.1 recommended)\n- For B300: Triton 3.6.0\npip install --ignore-requires-python --no-deps \"sonic-moe @ git+https://github.com/Dao-AILab/sonic-moe.git@116e2df0a41874f77fa0ad269ce7df3f0cfcb956\" && pip install nvidia-cutlass-dsl==4.4.0 quack-kernels==0.2.5\nSee the SonicMoE installation guide for the latest prerequisite details.\nNote: Blackwell support is in upstream beta. 
On Blackwell GPUs, Axolotl automatically sets USE_QUACK_GEMM=1 to enable the Blackwell kernels.\n\n\nHow It Works\nThe KernelsPlugin runs before model loading and:\n\n\nScatterMoE\n\nRegisters the ScatterMoE kernel from the local libs/scattermoe_lora package (includes fused LoRA support via Triton kernels).\nPatches the model’s SparseMoeBlock forward method with the optimized ScatterMoE implementation.\n\n\n\nSonicMoE\n\nResolves the model’s MoE block class(es) from constants.py.\nPatches the forward method with SonicMoE’s optimized kernels and registers a weight converter for the interleaved gate/up projection format.\nSupports both softmax->topk and sigmoid->topk routing strategies.\n\nBoth paths use the shared resolve_moe_block_classes utility in constants.py for model-type-to-class resolution.\n\nSupported Models\nSee constants.py for the full list of supported model types (Qwen2-MoE, Qwen3-MoE, OLMoE, Mixtral, DeepSeek-V3, GLM-MoE, MiniMax, etc.).\n\n\n\nLimitations\nScatterMoE uses a softmax -> topk routing, so results may be different for some model architectures as baseline (GPT-OSS, etc). Incompatible with GLM_MOE_DSA (GLM 5) and GLM4_MOE_LITE (GLM 4.7 Flash) at the moment.\nSonicMoE supports both softmax->topk and sigmoid->topk routing, covering a wider range of architectures.\nScatterMoE does not work for GLM4.7 Flash (glm4_moe_lite) atm.\n\n\nNote on MegaBlocks\nWe tested MegaBlocks but were unable to ensure numerical accuracy, so we did not integrate it. It was also incompatible with many newer model architectures in transformers.\nPlease see reference here", + "text": "Kernels Integration\nMoE (Mixture of Experts) kernels speed up training for MoE layers and reduce VRAM costs. 
In transformers v5, batched_mm and grouped_mm were integrated as built-in options via the experts_implementation config kwarg:\nclass ExpertsInterface(GeneralInterface):\n _global_mapping = {\n \"batched_mm\": batched_mm_experts_forward,\n \"grouped_mm\": grouped_mm_experts_forward,\n }\nIn our custom integration, we add support for ScatterMoE and SonicMoE, which are more efficient and faster than grouped_mm.\n\nUsage\nAdd the following to your axolotl YAML config:\nplugins:\n - axolotl.integrations.kernels.KernelsPlugin\n\nuse_kernels: true\n\nuse_scattermoe: true\nuse_sonicmoe: true\nImportant: Setting experts_implementation is incompatible with custom kernel options.\n\n\nSonicMoE installation\nPrerequisites:\n- NVIDIA Hopper (H100, H200) or Blackwell (B200, GB200) GPU\n- CUDA 12.9+ (13.0+ for B300)\n- PyTorch 2.7+ (2.9.1 recommended)\n- For B300: Triton 3.6.0\npip install --ignore-requires-python --no-deps \"sonic-moe @ git+https://github.com/Dao-AILab/sonic-moe.git@116e2df0a41874f77fa0ad269ce7df3f0cfcb956\" && pip install nvidia-cutlass-dsl==4.4.0 quack-kernels==0.2.5\nSee the SonicMoE installation guide for the latest prerequisite details.\nNote: Blackwell support is in upstream beta. 
On Blackwell GPUs, Axolotl automatically sets USE_QUACK_GEMM=1 to enable the Blackwell kernels.\n\n\nHow It Works\nThe KernelsPlugin runs before model loading and:\n\n\nScatterMoE\n\nRegisters the ScatterMoE kernel from the local libs/scattermoe_lora package (includes fused LoRA support via Triton kernels).\nPatches the model’s SparseMoeBlock forward method with the optimized ScatterMoE implementation via the HF kernels library.\n\n\n\nSonicMoE\n\nResolves the model’s MoE block class(es) from constants.py.\nPatches the forward method with SonicMoE’s optimized CUTLASS kernels and registers a weight converter for the interleaved gate/up projection format.\nSupports pluggable routing strategies (see routing table below).\n\nBoth paths use the shared resolve_moe_block_classes utility in constants.py for model-type-to-class resolution.\n\n\nModel Support Matrix\nAll models use the SwiGLU activation (act_fn(gate) * up). Neither kernel currently supports non-SwiGLU MoE architectures.\n\n\nRouting strategies\n\n\n\n\n\n\n\n\n\nRouting Strategy\nDescription\nScatterMoE\nSonicMoE\n\n\n\n\nsoftmax → topk\nSoftmax over experts, select top-K, optional renormalization\nYes\nYes\n\n\nsoftmax → group selection → topk\nSoftmax, select top groups (sum of top-2 per group), topk from selected groups, renorm + scaling\nNo\nYes\n\n\nsigmoid → topk (with groups)\nSigmoid + bias correction, group-based masking, topk from masked scores, weights from original sigmoid\nYes\nYes\n\n\nsigmoid → topk (no groups)\nSigmoid + bias correction, straight topk (n_group=1)\nYes\nYes\n\n\nsoftmax → bias correction → topk\nSoftmax, bias via gate.moe_statics, topk, gather from original probs, clamp-based renorm\nNo\nYes\n\n\nsoftmax → group_limited_greedy\nSoftmax, group selection (max per group), topk, scale only (no renorm)\nNo\nYes\n\n\nsoftmax → topk via gate.wg\nSoftmax, gate weight at gate.wg.weight (not gate.weight), always renormalize\nNo\nYes\n\n\nfused topk → softmax\nRouting + expert 
computation fused in a single kernel\nNo\nPlanned\n\n\n\n\n\nPer-model support\n\n\n\n\n\n\n\n\n\n\nModel Type\nArchitecture\nRouting\nScatterMoE\nSonicMoE\n\n\n\n\nqwen2_moe\nQwen2-MoE\nsoftmax → topk\nYes\nYes\n\n\nqwen3_moe\nQwen3-MoE\nsoftmax → topk\nYes\nYes\n\n\nqwen3_5_moe\nQwen3.5-MoE\nsoftmax → topk\nYes\nYes\n\n\nqwen3_5_moe_text\nQwen3.5-MoE (VLM text)\nsoftmax → topk\nYes\nYes\n\n\nqwen3_next\nQwen3-Next\nsoftmax → topk\nYes\nYes\n\n\nqwen3_vl_moe\nQwen3-VL-MoE\nsoftmax → topk\nYes\nYes\n\n\nqwen3_omni_moe\nQwen3-Omni (Thinker + Talker)\nsoftmax → topk\nYes\nYes\n\n\nolmoe\nOLMoE\nsoftmax → topk\nYes\nYes\n\n\nmixtral\nMixtral\nsoftmax → topk\nYes\nYes\n\n\nminimax\nMiniMax\nsoftmax → topk\nYes\nYes\n\n\nmistral4\nMistral 4\nsoftmax → group → topk\nNo\nYes\n\n\nglm_moe_dsa\nGLM-MoE DSA (GLM 5)\nsigmoid → topk (groups)\nYes\nYes\n\n\ndeepseek_v3\nDeepSeek-V3\nsigmoid → topk (groups)\nYes\nYes\n\n\nglm4_moe\nGLM4-MoE\nsigmoid → topk (groups)\nYes\nYes\n\n\nglm4_moe_lite\nGLM4-MoE Lite (GLM 4.7 Flash)\nsigmoid → topk (groups)\nYes*\nYes\n\n\nglm4v_moe\nGLM4v-MoE\nsigmoid → topk (groups)\nYes\nYes\n\n\nminimax_m2\nMiniMax M2\nsigmoid → topk (no groups)\nYes\nYes\n\n\nernie4_5_moe\nERNIE 4.5 MoE\nsoftmax → bias → topk\nNo\nYes\n\n\ndeepseek_v2\nDeepSeek-V2\nsoftmax → group_limited_greedy\nNo\nYes\n\n\nhunyuan_v1_moe\nHunYuan V1 MoE\nsoftmax → topk (gate.wg)\nNo\nYes\n\n\ngpt_oss\nGPT-OSS\nfused topk → softmax\nNo\nPlanned\n\n\n\n* glm4_moe_lite with ScatterMoE may have issues — see Limitations.\n\n\nFeature comparison\n\n\n\n\n\n\n\n\nFeature\nScatterMoE\nSonicMoE\n\n\n\n\nKernel backend\nTriton\nCUTLASS\n\n\nGPU requirement\nAny CUDA\nHopper (H100/H200) or Blackwell (B200+)\n\n\nLoRA approach\nFused in Triton kernel\nRuntime materialization + custom autograd\n\n\nLoRA overhead\nLower (fused computation)\nHigher (per-forward materialization)\n\n\nGate/router LoRA\nYes\nYes\n\n\nExpert LoRA\nYes (fused)\nYes (materialized)\n\n\nShared expert LoRA\nYes 
(standard PEFT)\nYes (standard PEFT)\n\n\nSelective expert dequantization\nYes (~97% memory savings)\nNo\n\n\nWeight format\nTransposed [E, hidden, 2*inter]\nInterleaved gate/up [2*I, H, E]\n\n\ntorch.compile routing\nNo\nYes (optional)\n\n\n\n\n\nShared Expert Handling\nBoth kernels handle shared experts identically. Shared expert attribute names are detected in order of priority:\n\nshared_expert (Qwen2-MoE)\nshared_experts (GLM-MoE, DeepSeek-V3)\nshared_mlp (HunYuan V1 MoE)\n\nIf shared_expert_gate exists, sigmoid gating is applied to the shared expert contribution before adding it to the routed output. PEFT wraps shared expert linear layers with standard LoRA — no special handling is needed.\n\n\nLimitations\n\nScatterMoE + GLM4-MoE Lite: ScatterMoE does not work reliably for GLM 4.7 Flash (glm4_moe_lite).\nNon-SwiGLU activations: Neither kernel supports MoE architectures with non-SwiGLU expert activations (e.g., GPT-OSS uses a custom GLU variant).\nGPT-OSS: Deferred — requires transposed weight layout [E, H, 2*I], expert biases, and custom GLU activation. A dedicated forward path is needed.\nFSDP + fused gate LoRA (SonicMoE): The fused topk→softmax path materializes a local tensor when LoRA delta is present to avoid DTensor + Tensor mixing under FSDP.\n\n\n\nNote on MegaBlocks\nWe tested MegaBlocks but were unable to ensure numerical accuracy, so we did not integrate it. 
It was also incompatible with many newer model architectures in transformers.\nPlease see reference here", "crumbs": [ "Advanced Features", "Custom Integrations" diff --git a/sitemap.xml b/sitemap.xml index 9bf11c8e2..1cfb9e5c6 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,982 +2,982 @@ https://docs.axolotl.ai/FAQS.html - 2026-04-02T12:02:09.344Z + 2026-04-02T14:18:48.298Z https://docs.axolotl.ai/docs/dataset-formats/template_free.html - 2026-04-02T12:02:09.348Z + 2026-04-02T14:18:48.301Z https://docs.axolotl.ai/docs/dataset-formats/conversation.html - 2026-04-02T12:02:09.347Z + 2026-04-02T14:18:48.300Z https://docs.axolotl.ai/docs/dataset-formats/pretraining.html - 2026-04-02T12:02:09.348Z + 2026-04-02T14:18:48.301Z https://docs.axolotl.ai/docs/dataset-formats/index.html - 2026-04-02T12:02:09.348Z + 2026-04-02T14:18:48.301Z https://docs.axolotl.ai/docs/api/cli.args.html - 2026-04-02T12:05:41.667Z + 2026-04-02T14:22:11.891Z https://docs.axolotl.ai/docs/api/prompt_strategies.orcamini.html - 2026-04-02T12:05:42.175Z + 2026-04-02T14:22:12.392Z https://docs.axolotl.ai/docs/api/cli.preprocess.html - 2026-04-02T12:05:41.762Z + 2026-04-02T14:22:11.984Z https://docs.axolotl.ai/docs/api/utils.collators.core.html - 2026-04-02T12:05:43.042Z + 2026-04-02T14:22:13.255Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.llama3.html - 2026-04-02T12:05:42.210Z + 2026-04-02T14:22:12.427Z https://docs.axolotl.ai/docs/api/utils.schemas.enums.html - 2026-04-02T12:05:42.795Z + 2026-04-02T14:22:13.009Z https://docs.axolotl.ai/docs/api/utils.lora.html - 2026-04-02T12:05:42.522Z + 2026-04-02T14:22:12.738Z https://docs.axolotl.ai/docs/api/common.datasets.html - 2026-04-02T12:05:43.038Z + 2026-04-02T14:22:13.252Z https://docs.axolotl.ai/docs/api/monkeypatch.relora.html - 2026-04-02T12:05:42.389Z + 2026-04-02T14:22:12.605Z https://docs.axolotl.ai/docs/api/core.builders.base.html - 2026-04-02T12:05:41.500Z + 2026-04-02T14:22:11.726Z 
https://docs.axolotl.ai/docs/api/prompt_strategies.input_output.html - 2026-04-02T12:05:42.156Z + 2026-04-02T14:22:12.372Z https://docs.axolotl.ai/docs/api/integrations.lm_eval.args.html - 2026-04-02T12:05:43.012Z + 2026-04-02T14:22:13.225Z https://docs.axolotl.ai/docs/api/cli.inference.html - 2026-04-02T12:05:41.726Z + 2026-04-02T14:22:11.949Z https://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_disk.html - 2026-04-02T12:05:42.505Z + 2026-04-02T14:22:12.721Z https://docs.axolotl.ai/docs/api/core.datasets.chat.html - 2026-04-02T12:05:41.570Z + 2026-04-02T14:22:11.795Z https://docs.axolotl.ai/docs/api/core.chat.format.shared.html - 2026-04-02T12:05:41.563Z + 2026-04-02T14:22:11.789Z https://docs.axolotl.ai/docs/api/logging_config.html - 2026-04-02T12:05:41.492Z + 2026-04-02T14:22:11.718Z https://docs.axolotl.ai/docs/api/prompt_strategies.chat_template.html - 2026-04-02T12:05:42.080Z + 2026-04-02T14:22:12.297Z https://docs.axolotl.ai/docs/api/utils.collators.mamba.html - 2026-04-02T12:05:43.069Z + 2026-04-02T14:22:13.282Z https://docs.axolotl.ai/docs/api/cli.config.html - 2026-04-02T12:05:41.702Z + 2026-04-02T14:22:11.925Z https://docs.axolotl.ai/docs/api/loaders.model.html - 2026-04-02T12:05:41.944Z + 2026-04-02T14:22:12.161Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.chatml.html - 2026-04-02T12:05:42.249Z + 2026-04-02T14:22:12.466Z https://docs.axolotl.ai/docs/api/cli.quantize.html - 2026-04-02T12:05:41.768Z + 2026-04-02T14:22:11.990Z https://docs.axolotl.ai/docs/api/prompt_strategies.bradley_terry.llama3.html - 2026-04-02T12:05:42.282Z + 2026-04-02T14:22:12.499Z https://docs.axolotl.ai/docs/api/integrations.spectrum.args.html - 2026-04-02T12:05:43.016Z + 2026-04-02T14:22:13.230Z https://docs.axolotl.ai/docs/api/prompt_strategies.messages.chat.html - 2026-04-02T12:05:42.188Z + 2026-04-02T14:22:12.405Z https://docs.axolotl.ai/docs/api/utils.callbacks.perplexity.html - 2026-04-02T12:05:43.134Z + 2026-04-02T14:22:13.346Z 
https://docs.axolotl.ai/docs/api/monkeypatch.lora_kernels.html - 2026-04-02T12:05:42.424Z + 2026-04-02T14:22:12.640Z https://docs.axolotl.ai/docs/api/monkeypatch.data.batch_dataset_fetcher.html - 2026-04-02T12:05:42.466Z + 2026-04-02T14:22:12.682Z https://docs.axolotl.ai/docs/api/loaders.patch_manager.html - 2026-04-02T12:05:41.986Z + 2026-04-02T14:22:12.203Z https://docs.axolotl.ai/docs/api/utils.model_shard_quant.html - 2026-04-02T12:05:42.529Z + 2026-04-02T14:22:12.745Z https://docs.axolotl.ai/docs/api/utils.schemas.multimodal.html - 2026-04-02T12:05:42.764Z + 2026-04-02T14:22:12.978Z https://docs.axolotl.ai/docs/api/utils.callbacks.profiler.html - 2026-04-02T12:05:43.139Z + 2026-04-02T14:22:13.351Z https://docs.axolotl.ai/docs/api/convert.html - 2026-04-02T12:05:41.427Z + 2026-04-02T14:22:11.653Z https://docs.axolotl.ai/docs/api/cli.utils.html - 2026-04-02T12:05:41.791Z + 2026-04-02T14:22:12.012Z https://docs.axolotl.ai/docs/api/kernels.lora.html - 2026-04-02T12:05:42.329Z + 2026-04-02T14:22:12.546Z https://docs.axolotl.ai/docs/api/monkeypatch.utils.html - 2026-04-02T12:05:42.432Z + 2026-04-02T14:22:12.647Z https://docs.axolotl.ai/docs/api/common.const.html - 2026-04-02T12:05:43.020Z + 2026-04-02T14:22:13.233Z https://docs.axolotl.ai/docs/api/utils.freeze.html - 2026-04-02T12:05:42.543Z + 2026-04-02T14:22:12.759Z https://docs.axolotl.ai/docs/api/utils.schemas.utils.html - 2026-04-02T12:05:42.802Z + 2026-04-02T14:22:13.016Z https://docs.axolotl.ai/docs/api/utils.callbacks.qat.html - 2026-04-02T12:05:43.158Z + 2026-04-02T14:22:13.370Z https://docs.axolotl.ai/docs/api/utils.data.sft.html - 2026-04-02T12:05:42.655Z + 2026-04-02T14:22:12.871Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_xformers.html - 2026-04-02T12:05:42.381Z + 2026-04-02T14:22:12.597Z https://docs.axolotl.ai/docs/api/core.trainers.grpo.sampler.html - 2026-04-02T12:05:41.930Z + 2026-04-02T14:22:12.148Z https://docs.axolotl.ai/docs/api/core.chat.messages.html - 
2026-04-02T12:05:41.558Z + 2026-04-02T14:22:11.783Z https://docs.axolotl.ai/docs/api/core.trainers.mamba.html - 2026-04-02T12:05:41.886Z + 2026-04-02T14:22:12.107Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.passthrough.html - 2026-04-02T12:05:42.229Z + 2026-04-02T14:22:12.445Z https://docs.axolotl.ai/docs/api/kernels.swiglu.html - 2026-04-02T12:05:42.355Z + 2026-04-02T14:22:12.571Z https://docs.axolotl.ai/docs/api/prompt_strategies.pygmalion.html - 2026-04-02T12:05:42.183Z + 2026-04-02T14:22:12.400Z https://docs.axolotl.ai/docs/api/utils.schemas.peft.html - 2026-04-02T12:05:42.753Z + 2026-04-02T14:22:12.968Z https://docs.axolotl.ai/docs/api/utils.schemas.trl.html - 2026-04-02T12:05:42.757Z + 2026-04-02T14:22:12.972Z https://docs.axolotl.ai/docs/api/prompt_strategies.completion.html - 2026-04-02T12:05:42.148Z + 2026-04-02T14:22:12.365Z https://docs.axolotl.ai/docs/api/cli.vllm_serve.html - 2026-04-02T12:05:41.777Z + 2026-04-02T14:22:11.998Z https://docs.axolotl.ai/docs/api/utils.trainer.html - 2026-04-02T12:05:42.565Z + 2026-04-02T14:22:12.781Z https://docs.axolotl.ai/docs/api/utils.ctx_managers.sequence_parallel.html - 2026-04-02T12:05:42.037Z + 2026-04-02T14:22:12.254Z https://docs.axolotl.ai/docs/api/core.training_args.html - 2026-04-02T12:05:41.528Z + 2026-04-02T14:22:11.754Z https://docs.axolotl.ai/docs/api/evaluate.html - 2026-04-02T12:05:41.402Z + 2026-04-02T14:22:11.629Z https://docs.axolotl.ai/docs/api/utils.callbacks.comet_.html - 2026-04-02T12:05:43.150Z + 2026-04-02T14:22:13.362Z https://docs.axolotl.ai/docs/api/loaders.tokenizer.html - 2026-04-02T12:05:41.955Z + 2026-04-02T14:22:12.172Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_flash.html - 2026-04-02T12:05:42.379Z + 2026-04-02T14:22:12.595Z https://docs.axolotl.ai/docs/api/cli.cloud.modal_.html - 2026-04-02T12:05:41.789Z + 2026-04-02T14:22:12.011Z https://docs.axolotl.ai/docs/api/prompt_strategies.stepwise_supervised.html - 2026-04-02T12:05:42.161Z + 
2026-04-02T14:22:12.378Z https://docs.axolotl.ai/docs/api/monkeypatch.btlm_attn_hijack_flash.html - 2026-04-02T12:05:42.433Z + 2026-04-02T14:22:12.649Z https://docs.axolotl.ai/docs/api/core.chat.format.llama3x.html - 2026-04-02T12:05:41.561Z + 2026-04-02T14:22:11.787Z https://docs.axolotl.ai/docs/api/utils.quantization.html - 2026-04-02T12:05:42.680Z + 2026-04-02T14:22:12.895Z https://docs.axolotl.ai/docs/api/monkeypatch.unsloth_.html - 2026-04-02T12:05:42.455Z + 2026-04-02T14:22:12.670Z https://docs.axolotl.ai/docs/api/prompt_strategies.orpo.chat_template.html - 2026-04-02T12:05:42.277Z + 2026-04-02T14:22:12.494Z https://docs.axolotl.ai/docs/api/cli.art.html - 2026-04-02T12:05:41.671Z + 2026-04-02T14:22:11.895Z https://docs.axolotl.ai/docs/api/loaders.processor.html - 2026-04-02T12:05:41.957Z + 2026-04-02T14:22:12.174Z https://docs.axolotl.ai/docs/api/cli.merge_sharded_fsdp_weights.html - 2026-04-02T12:05:41.751Z + 2026-04-02T14:22:11.973Z https://docs.axolotl.ai/docs/api/kernels.quantize.html - 2026-04-02T12:05:42.370Z + 2026-04-02T14:22:12.586Z https://docs.axolotl.ai/docs/api/core.trainers.utils.html - 2026-04-02T12:05:41.932Z + 2026-04-02T14:22:12.150Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chat_template.html - 2026-04-02T12:05:42.197Z + 2026-04-02T14:22:12.413Z https://docs.axolotl.ai/docs/api/cli.delinearize_llama4.html - 2026-04-02T12:05:41.708Z + 2026-04-02T14:22:11.931Z https://docs.axolotl.ai/docs/faq.html - 2026-04-02T12:02:09.348Z + 2026-04-02T14:18:48.301Z https://docs.axolotl.ai/docs/expert_quantization.html - 2026-04-02T12:02:09.348Z + 2026-04-02T14:18:48.301Z https://docs.axolotl.ai/docs/checkpoint_saving.html - 2026-04-02T12:02:09.347Z + 2026-04-02T14:18:48.300Z https://docs.axolotl.ai/docs/agents/pretraining.html - 2026-04-02T12:02:09.347Z + 2026-04-02T14:18:48.300Z https://docs.axolotl.ai/docs/agents/grpo.html - 2026-04-02T12:02:09.347Z + 2026-04-02T14:18:48.300Z https://docs.axolotl.ai/docs/agents/sft.html - 
2026-04-02T12:02:09.347Z + 2026-04-02T14:18:48.300Z https://docs.axolotl.ai/docs/multi-gpu.html - 2026-04-02T12:02:09.350Z + 2026-04-02T14:18:48.304Z https://docs.axolotl.ai/docs/nd_parallelism.html - 2026-04-02T12:02:09.351Z + 2026-04-02T14:18:48.304Z https://docs.axolotl.ai/docs/mac.html - 2026-04-02T12:02:09.350Z + 2026-04-02T14:18:48.304Z https://docs.axolotl.ai/docs/reward_modelling.html - 2026-04-02T12:02:09.351Z + 2026-04-02T14:18:48.304Z https://docs.axolotl.ai/docs/models/ministral3.html - 2026-04-02T12:06:06.437Z + 2026-04-02T14:22:37.196Z https://docs.axolotl.ai/docs/models/hunyuan.html - 2026-04-02T12:06:06.445Z + 2026-04-02T14:22:37.207Z https://docs.axolotl.ai/docs/models/smolvlm2.html - 2026-04-02T12:06:06.444Z + 2026-04-02T14:22:37.205Z https://docs.axolotl.ai/docs/models/ministral3/vision.html - 2026-04-02T12:06:06.437Z + 2026-04-02T14:22:37.197Z https://docs.axolotl.ai/docs/models/voxtral.html - 2026-04-02T12:06:06.440Z + 2026-04-02T14:22:37.200Z https://docs.axolotl.ai/docs/models/ministral.html - 2026-04-02T12:06:06.439Z + 2026-04-02T14:22:37.199Z https://docs.axolotl.ai/docs/models/granite4.html - 2026-04-02T12:06:06.444Z + 2026-04-02T14:22:37.206Z https://docs.axolotl.ai/docs/models/phi.html - 2026-04-02T12:06:06.443Z + 2026-04-02T14:22:37.205Z https://docs.axolotl.ai/docs/models/internvl3_5.html - 2026-04-02T12:06:06.435Z + 2026-04-02T14:22:37.194Z https://docs.axolotl.ai/docs/models/magistral/think.html - 2026-04-02T12:06:06.438Z + 2026-04-02T14:22:37.198Z https://docs.axolotl.ai/docs/models/mistral-small.html - 2026-04-02T12:06:06.439Z + 2026-04-02T14:22:37.199Z https://docs.axolotl.ai/docs/models/gemma3n.html - 2026-04-02T12:06:06.442Z + 2026-04-02T14:22:37.203Z https://docs.axolotl.ai/docs/models/arcee.html - 2026-04-02T12:06:06.436Z + 2026-04-02T14:22:37.195Z https://docs.axolotl.ai/docs/models/llama-2.html - 2026-04-02T12:06:06.441Z + 2026-04-02T14:22:37.202Z https://docs.axolotl.ai/docs/models/llama-4.html - 2026-04-02T12:06:06.441Z + 
docs/sitemap.xml: `<lastmod>` timestamps refreshed across all documentation pages (2026-04-02T12:02–12:06Z → 2026-04-02T14:18–14:22Z); no URLs added or removed.