fast path

2025-09-15 18:57:13 -04:00
parent 479b6144df
commit 125e7b5fe6
8 changed files with 79 additions and 16 deletions
--- a/docs/moe_backends.md
+++ b/docs/moe_backends.md
@@ -1,8 +1,8 @@
 MoE Backends in Axolotl

-Axolotl supports selecting a Mixture-of-Experts (MoE) compute backend via an environment variable:
+Axolotl supports selecting a Mixture-of-Experts (MoE) compute backend via the training config (YAML):

- AXOLOTL_MOE_BACKEND=auto|hf_triton|torch_grouped|naive
+- Set `moe_backend: auto|hf_triton|torch_grouped|naive`

 Behavior
 - auto (default): prefers PyTorch 2.8+ grouped GEMM, then Hugging Face kernels hub, otherwise naive.
@@ -12,7 +12,8 @@ Behavior

 Notes
 - Current implementation wires the backend selector and routes Mixtral MoE through it. The hf_triton path is initially a stub: it uses kernels hub for routing but still falls back to per-expert computation until grouped GEMM is fully integrated.
- No changes to training scripts are required; Axolotl wraps Transformers Trainer; selection happens inside the model forward.
+- No changes to training scripts are required; selection happens inside the model forward. The `AXOLOTL_MOE_BACKEND` environment variable is no longer used.

 Example
-AXOLOTL_MOE_BACKEND=hf_triton accelerate launch -m axolotl.cli.train path/to/config.yaml
+moe_backend: hf_triton
+accelerate launch -m axolotl.cli.train path/to/config.yaml
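The `auto` preference order documented above (PyTorch 2.8+ grouped GEMM, then the Hugging Face kernels hub, otherwise naive) can be sketched as a small resolver. This is a hypothetical illustration, not Axolotl's code. `resolve_moe_backend`, `backend_from_config`, and the `Env` capability flags are invented names. The diff does not show how Axolotl detects the PyTorch version or kernels hub availability. It also does not confirm that an explicit `moe_backend` value is used as-is with no fallback; the sketch assumes that.

```python
from dataclasses import dataclass

# Values accepted by the `moe_backend` config key, per the doc in this diff.
VALID_BACKENDS = ("auto", "hf_triton", "torch_grouped", "naive")


@dataclass(frozen=True)
class Env:
    """Hypothetical capability flags; real detection logic is not shown in the diff."""
    torch_version: tuple[int, int]
    kernels_hub_available: bool


def resolve_moe_backend(requested: str, env: Env) -> str:
    """Hypothetical resolver mirroring the documented `auto` preference order."""
    if requested not in VALID_BACKENDS:
        raise ValueError(f"unknown moe_backend: {requested!r}")
    if requested != "auto":
        # Assumption: an explicit choice is passed through unchanged.
        return requested
    if env.torch_version >= (2, 8):
        return "torch_grouped"  # PyTorch 2.8+ grouped GEMM is preferred first
    if env.kernels_hub_available:
        return "hf_triton"  # then the Hugging Face kernels hub
    return "naive"  # otherwise the naive path


def backend_from_config(cfg: dict, env: Env) -> str:
    # `auto` is the documented default when the key is absent from the YAML config.
    return resolve_moe_backend(cfg.get("moe_backend", "auto"), env)


if __name__ == "__main__":
    print(backend_from_config({"moe_backend": "auto"}, Env((2, 7), True)))  # hf_triton
    print(backend_from_config({}, Env((2, 8), False)))  # torch_grouped
```

Taking the parsed config as a plain `dict` keeps the sketch independent of any YAML library. In practice the mapping would come from loading `path/to/config.yaml`.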