| Routing Strategy | Description | ScatterMoE | SonicMoE |
|---|---|---|---|
| softmax → bias correction → topk | Softmax, bias via gate.moe_statics, topk, gather from original probs, clamp-based renorm | No | Yes |
| softmax → group_limited_greedy | Softmax, group selection (max per group), topk, scale only (no renorm) | No | Yes |
| softmax → topk via gate.wg | Softmax, gate weight at gate.wg.weight (not gate.weight), always renormalize | No | Yes |
| fused topk → softmax | Routing + expert computation fused in a single kernel | No | Planned |
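As a concrete illustration, the softmax → group_limited_greedy row above (the DeepSeek-V2 strategy) can be sketched in plain Python. This is a simplified sketch of the routing math only, not the kernel code: the function and parameter names (including routed_scaling_factor) are ours, and Python lists stand in for tensors.

```python
import math

def group_limited_greedy(logits, n_group, topk_group, top_k, routed_scaling_factor=1.0):
    """Sketch: softmax -> group selection (max per group) -> topk -> scale only."""
    # 1. Softmax over all expert logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # 2. Score each group by its best (max) expert probability; keep the top groups.
    group_size = len(logits) // n_group
    group_scores = [max(probs[g * group_size:(g + 1) * group_size]) for g in range(n_group)]
    keep = set(sorted(range(n_group), key=lambda g: -group_scores[g])[:topk_group])
    # 3. Top-k over experts in the surviving groups only.
    masked = [p if i // group_size in keep else float("-inf") for i, p in enumerate(probs)]
    idx = sorted(range(len(probs)), key=lambda i: -masked[i])[:top_k]
    # 4. Scale only -- the selected weights are NOT renormalized to sum to 1.
    return idx, [probs[i] * routed_scaling_factor for i in idx]
```

Note the contrast with plain softmax → topk: here the selected weights keep their original softmax mass (optionally scaled) rather than being renormalized.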
Per-model support

| Model Type | Architecture | Routing | ScatterMoE | SonicMoE |
|---|---|---|---|---|
| qwen2_moe | Qwen2-MoE | softmax → topk | Yes | Yes |
| qwen3_moe | Qwen3-MoE | softmax → topk | Yes | Yes |
| qwen3_5_moe | Qwen3.5-MoE | softmax → topk | Yes | Yes |
| qwen3_5_moe_text | Qwen3.5-MoE (VLM text) | softmax → topk | Yes | Yes |
| qwen3_next | Qwen3-Next | softmax → topk | Yes | Yes |
| qwen3_vl_moe | Qwen3-VL-MoE | softmax → topk | Yes | Yes |
| qwen3_omni_moe | Qwen3-Omni (Thinker + Talker) | softmax → topk | Yes | Yes |
| olmoe | OLMoE | softmax → topk | Yes | Yes |
| mixtral | Mixtral | softmax → topk | Yes | Yes |
| minimax | MiniMax | softmax → topk | Yes | Yes |
| mistral4 | Mistral 4 | softmax → group → topk | No | Yes |
| glm_moe_dsa | GLM-MoE DSA (GLM 5) | sigmoid → topk (groups) | Yes | Yes |
| deepseek_v3 | DeepSeek-V3 | sigmoid → topk (groups) | Yes | Yes |
| glm4_moe | GLM4-MoE | sigmoid → topk (groups) | Yes | Yes |
| glm4_moe_lite | GLM4-MoE Lite (GLM 4.7 Flash) | sigmoid → topk (groups) | Yes* | Yes |
| glm4v_moe | GLM4v-MoE | sigmoid → topk (groups) | Yes | Yes |
| minimax_m2 | MiniMax M2 | sigmoid → topk (no groups) | Yes | Yes |
| ernie4_5_moe | ERNIE 4.5 MoE | softmax → bias → topk | No | Yes |
| deepseek_v2 | DeepSeek-V2 | softmax → group_limited_greedy | No | Yes |
| hunyuan_v1_moe | HunYuan V1 MoE | softmax → topk (gate.wg) | No | Yes |
| gpt_oss | GPT-OSS | fused topk → softmax | No | Planned |

* glm4_moe_lite with ScatterMoE may have issues — see Limitations.
Feature comparison

| Feature | ScatterMoE | SonicMoE |
|---|---|---|
| Kernel backend | Triton | CUTLASS |
| GPU requirement | Any CUDA | Hopper (H100/H200) or Blackwell (B200+) |
| LoRA approach | Fused in Triton kernel | Runtime materialization + custom autograd |
| LoRA overhead | Lower (fused computation) | Higher (per-forward materialization) |
| Gate/router LoRA | Yes | Yes |
| Expert LoRA | Yes (fused) | Yes (materialized) |
| Shared expert LoRA | Yes (standard PEFT) | Yes (standard PEFT) |
| Selective expert dequantization | Yes (~97% memory savings) | No |
| Weight format | Transposed [E, hidden, 2*inter] | Interleaved gate/up [2*I, H, E] |
| torch.compile routing | No | Yes (optional) |
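The "runtime materialization" rows above can be read as: on each forward pass, SonicMoE rebuilds an effective weight W_eff = W + scaling * (B @ A) from the frozen base weight and the LoRA factors before running the fused kernel, which is why its LoRA overhead is listed as higher than ScatterMoE's fused approach. A minimal pure-Python sketch of just that materialization step (helper names are ours, nested lists stand in for tensors, and the real implementation also wires up a custom autograd function):

```python
def matmul(a, b):
    # naive [m, k] @ [k, n] matrix product on nested lists
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def materialize_lora_weight(w, lora_a, lora_b, scaling):
    # effective weight W_eff = W + scaling * (B @ A), rebuilt every forward
    delta = matmul(lora_b, lora_a)  # [out, r] @ [r, in] -> [out, in]
    return [[w[i][j] + scaling * delta[i][j]
             for j in range(len(w[0]))] for i in range(len(w))]
```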
Shared Expert Handling

Both kernels handle shared experts identically. Shared expert attribute names are detected in order of priority:

- shared_expert (Qwen2-MoE)
- shared_experts (GLM-MoE, DeepSeek-V3)
- shared_mlp (HunYuan V1 MoE)

If shared_expert_gate exists, sigmoid gating is applied to the shared expert contribution before adding it to the routed output. PEFT wraps shared expert linear layers with standard LoRA — no special handling is needed.
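The detection-plus-gating logic above amounts to roughly the following sketch (function names are ours; scalar list math stands in for tensor ops):

```python
import math

SHARED_EXPERT_ATTRS = ("shared_expert", "shared_experts", "shared_mlp")

def find_shared_expert(moe_block):
    # First matching attribute wins, per the priority order above.
    for name in SHARED_EXPERT_ATTRS:
        if hasattr(moe_block, name):
            return getattr(moe_block, name)
    return None

def combine(routed_out, shared_out, shared_gate_logit=None):
    # Sigmoid-gate the shared expert contribution if a gate exists,
    # then add it elementwise to the routed experts' output.
    if shared_gate_logit is not None:
        g = 1.0 / (1.0 + math.exp(-shared_gate_logit))
        shared_out = [g * x for x in shared_out]
    return [r + s for r, s in zip(routed_out, shared_out)]
```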
Limitations

- ScatterMoE + GLM4-MoE Lite: ScatterMoE does not work reliably for GLM 4.7 Flash (glm4_moe_lite).
- Non-SwiGLU activations: Neither kernel supports MoE architectures with non-SwiGLU expert activations (e.g., GPT-OSS uses a custom GLU variant).
- GPT-OSS: Deferred — requires transposed weight layout [E, H, 2*I], expert biases, and custom GLU activation. A dedicated forward path is needed.
- FSDP + fused gate LoRA (SonicMoE): The fused topk→softmax path materializes a local tensor when LoRA delta is present to avoid DTensor + Tensor mixing under FSDP.
Note on MegaBlocks
We tested MegaBlocks but were unable to ensure numerical accuracy, so we did not integrate it. It was also incompatible with many newer model architectures in transformers.
- 2026-04-02T12:05:41.988Z
+ 2026-04-02T14:22:12.205Zhttps://docs.axolotl.ai/docs/api/utils.schemas.model.html
- 2026-04-02T12:05:42.708Z
+ 2026-04-02T14:22:12.923Zhttps://docs.axolotl.ai/docs/api/integrations.grokfast.optimizer.html
- 2026-04-02T12:05:42.994Z
+ 2026-04-02T14:22:13.207Zhttps://docs.axolotl.ai/docs/api/cli.utils.load.html
- 2026-04-02T12:05:41.819Z
+ 2026-04-02T14:22:12.041Zhttps://docs.axolotl.ai/docs/api/loaders.adapter.html
- 2026-04-02T12:05:41.964Z
+ 2026-04-02T14:22:12.181Zhttps://docs.axolotl.ai/docs/api/cli.train.html
- 2026-04-02T12:05:41.632Z
+ 2026-04-02T14:22:11.856Zhttps://docs.axolotl.ai/docs/api/monkeypatch.stablelm_attn_hijack_flash.html
- 2026-04-02T12:05:42.441Z
+ 2026-04-02T14:22:12.656Zhttps://docs.axolotl.ai/docs/api/cli.checks.html
- 2026-04-02T12:05:41.680Z
+ 2026-04-02T14:22:11.903Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.dpo.user_defined.html
- 2026-04-02T12:05:42.227Z
+ 2026-04-02T14:22:12.443Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.llama2_chat.html
- 2026-04-02T12:05:42.141Z
+ 2026-04-02T14:22:12.357Zhttps://docs.axolotl.ai/docs/api/core.trainers.trl.html
- 2026-04-02T12:05:41.879Z
+ 2026-04-02T14:22:12.100Zhttps://docs.axolotl.ai/docs/api/monkeypatch.mistral_attn_hijack_flash.html
- 2026-04-02T12:05:42.383Z
+ 2026-04-02T14:22:12.598Zhttps://docs.axolotl.ai/docs/api/core.trainers.mixins.scheduler.html
- 2026-04-02T12:05:42.008Z
+ 2026-04-02T14:22:12.225Zhttps://docs.axolotl.ai/docs/api/core.trainers.grpo.trainer.html
- 2026-04-02T12:05:41.915Z
+ 2026-04-02T14:22:12.133Zhttps://docs.axolotl.ai/docs/api/cli.merge_lora.html
- 2026-04-02T12:05:41.737Z
+ 2026-04-02T14:22:11.959Zhttps://docs.axolotl.ai/docs/api/datasets.html
- 2026-04-02T12:05:41.409Z
+ 2026-04-02T14:22:11.637Zhttps://docs.axolotl.ai/docs/api/utils.schemas.training.html
- 2026-04-02T12:05:42.717Z
+ 2026-04-02T14:22:12.932Zhttps://docs.axolotl.ai/docs/api/utils.distributed.html
- 2026-04-02T12:05:42.629Z
+ 2026-04-02T14:22:12.845Zhttps://docs.axolotl.ai/docs/api/cli.cloud.base.html
- 2026-04-02T12:05:41.781Z
+ 2026-04-02T14:22:12.003Zhttps://docs.axolotl.ai/docs/api/kernels.geglu.html
- 2026-04-02T12:05:42.342Z
+ 2026-04-02T14:22:12.559Zhttps://docs.axolotl.ai/docs/api/core.trainers.mixins.optimizer.html
- 2026-04-02T12:05:41.995Z
+ 2026-04-02T14:22:12.212Zhttps://docs.axolotl.ai/docs/api/index.html
- 2026-04-02T12:05:41.309Z
+ 2026-04-02T14:22:11.538Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.base.html
- 2026-04-02T12:05:42.039Z
+ 2026-04-02T14:22:12.256Zhttps://docs.axolotl.ai/docs/api/cli.evaluate.html
- 2026-04-02T12:05:41.642Z
+ 2026-04-02T14:22:11.866Zhttps://docs.axolotl.ai/docs/api/train.html
- 2026-04-02T12:05:41.388Z
+ 2026-04-02T14:22:11.616Zhttps://docs.axolotl.ai/docs/api/common.architectures.html
- 2026-04-02T12:05:43.018Z
+ 2026-04-02T14:22:13.231Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.kto.llama3.html
- 2026-04-02T12:05:42.239Z
+ 2026-04-02T14:22:12.456Zhttps://docs.axolotl.ai/docs/api/utils.callbacks.lisa.html
- 2026-04-02T12:05:43.141Z
+ 2026-04-02T14:22:13.353Zhttps://docs.axolotl.ai/docs/api/cli.utils.train.html
- 2026-04-02T12:05:41.842Z
+ 2026-04-02T14:22:12.063Zhttps://docs.axolotl.ai/docs/api/integrations.liger.args.html
- 2026-04-02T12:05:43.007Z
+ 2026-04-02T14:22:13.221Zhttps://docs.axolotl.ai/docs/api/prompt_tokenizers.html
- 2026-04-02T12:05:41.480Z
+ 2026-04-02T14:22:11.706Zhttps://docs.axolotl.ai/docs/api/cli.utils.sweeps.html
- 2026-04-02T12:05:41.827Z
+ 2026-04-02T14:22:12.048Zhttps://docs.axolotl.ai/docs/api/cli.utils.args.html
- 2026-04-02T12:05:41.806Z
+ 2026-04-02T14:22:12.027Zhttps://docs.axolotl.ai/docs/api/utils.chat_templates.html
- 2026-04-02T12:05:42.516Z
+ 2026-04-02T14:22:12.732Zhttps://docs.axolotl.ai/docs/api/utils.schemas.config.html
- 2026-04-02T12:05:42.699Z
+ 2026-04-02T14:22:12.914Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.user_defined.html
- 2026-04-02T12:05:42.125Z
+ 2026-04-02T14:22:12.341Zhttps://docs.axolotl.ai/docs/api/utils.schemas.datasets.html
- 2026-04-02T12:05:42.742Z
+ 2026-04-02T14:22:12.957Zhttps://docs.axolotl.ai/docs/api/integrations.base.html
- 2026-04-02T12:05:42.988Z
+ 2026-04-02T14:22:13.202Zhttps://docs.axolotl.ai/docs/api/utils.tokenization.html
- 2026-04-02T12:05:42.514Z
+ 2026-04-02T14:22:12.730Zhttps://docs.axolotl.ai/docs/api/monkeypatch.multipack.html
- 2026-04-02T12:05:42.384Z
+ 2026-04-02T14:22:12.600Zhttps://docs.axolotl.ai/docs/api/integrations.kd.trainer.html
- 2026-04-02T12:05:43.003Z
+ 2026-04-02T14:22:13.217Zhttps://docs.axolotl.ai/docs/api/monkeypatch.mixtral.html
- 2026-04-02T12:05:42.468Z
+ 2026-04-02T14:22:12.684Zhttps://docs.axolotl.ai/docs/api/core.trainers.base.html
- 2026-04-02T12:05:41.860Z
+ 2026-04-02T14:22:12.081Zhttps://docs.axolotl.ai/docs/api/utils.schemas.integrations.html
- 2026-04-02T12:05:42.784Z
+ 2026-04-02T14:22:12.998Zhttps://docs.axolotl.ai/docs/api/core.trainers.mixins.rng_state_loader.html
- 2026-04-02T12:05:41.999Z
+ 2026-04-02T14:22:12.216Zhttps://docs.axolotl.ai/docs/api/cli.main.html
- 2026-04-02T12:05:41.621Z
+ 2026-04-02T14:22:11.846Zhttps://docs.axolotl.ai/docs/api/monkeypatch.trainer_fsdp_optim.html
- 2026-04-02T12:05:42.445Z
+ 2026-04-02T14:22:12.661Zhttps://docs.axolotl.ai/docs/api/core.datasets.transforms.chat_builder.html
- 2026-04-02T12:05:41.579Z
+ 2026-04-02T14:22:11.804Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_w_system.html
- 2026-04-02T12:05:42.114Z
+ 2026-04-02T14:22:12.331Zhttps://docs.axolotl.ai/docs/api/integrations.cut_cross_entropy.args.html
- 2026-04-02T12:05:42.992Z
+ 2026-04-02T14:22:13.206Zhttps://docs.axolotl.ai/docs/api/monkeypatch.transformers_fa_utils.html
- 2026-04-02T12:05:42.453Z
+ 2026-04-02T14:22:12.668Zhttps://docs.axolotl.ai/docs/api/utils.data.streaming.html
- 2026-04-02T12:05:42.648Z
+ 2026-04-02T14:22:12.863Zhttps://docs.axolotl.ai/docs/api/utils.collators.batching.html
- 2026-04-02T12:05:43.065Z
+ 2026-04-02T14:22:13.278Zhttps://docs.axolotl.ai/docs/api/utils.samplers.multipack.html
- 2026-04-02T12:05:43.126Z
+ 2026-04-02T14:22:13.338Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chatml.html
- 2026-04-02T12:05:42.223Z
+ 2026-04-02T14:22:12.440Zhttps://docs.axolotl.ai/docs/api/utils.dict.html
- 2026-04-02T12:05:42.636Z
+ 2026-04-02T14:22:12.851Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.dpo.zephyr.html
- 2026-04-02T12:05:42.225Z
+ 2026-04-02T14:22:12.442Zhttps://docs.axolotl.ai/docs/api/utils.optimizers.adopt.html
- 2026-04-02T12:05:42.646Z
+ 2026-04-02T14:22:12.861Zhttps://docs.axolotl.ai/docs/api/prompt_strategies.metharme.html
- 2026-04-02T12:05:42.170Z
+ 2026-04-02T14:22:12.387Zhttps://docs.axolotl.ai/docs/api/monkeypatch.gradient_checkpointing.offload_cpu.html
- 2026-04-02T12:05:42.472Z
+ 2026-04-02T14:22:12.688Zhttps://docs.axolotl.ai/docs/rlhf.html
- 2026-04-02T12:02:09.351Z
+ 2026-04-02T14:18:48.304Zhttps://docs.axolotl.ai/docs/dataset-formats/inst_tune.html
- 2026-04-02T12:02:09.348Z
+ 2026-04-02T14:18:48.301Zhttps://docs.axolotl.ai/docs/dataset-formats/stepwise_supervised.html
- 2026-04-02T12:02:09.348Z
+ 2026-04-02T14:18:48.301Zhttps://docs.axolotl.ai/docs/dataset-formats/tokenized.html
- 2026-04-02T12:02:09.348Z
+ 2026-04-02T14:18:48.301Zhttps://docs.axolotl.ai/docs/multimodal.html
- 2026-04-02T12:02:09.351Z
+ 2026-04-02T14:18:48.304Z