From c329b43fddc54c178c9bb1e2200b9c9b88a5e006 Mon Sep 17 00:00:00 2001
From: NanoCode012
Date: Thu, 26 Feb 2026 13:52:00 +0700
Subject: [PATCH] fix(doc): clarify support

---
 src/axolotl/integrations/kernels/README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/axolotl/integrations/kernels/README.md b/src/axolotl/integrations/kernels/README.md
index 96ff7b328..237d653cf 100644
--- a/src/axolotl/integrations/kernels/README.md
+++ b/src/axolotl/integrations/kernels/README.md
@@ -39,6 +39,8 @@ This works for any MoE model in transformers that uses a `SparseMoeBlock` class
 
 ScatterMoE uses a softmax -> topk routing, so results may be different for some model arch as baseline (GPT-OSS, GLM_MOE_DSA).
 
+ScatterMoE does not currently work with GLM4.7 Flash (`glm4_moe_lite`).
+
 ## Note on MegaBlocks
 
 We tested [MegaBlocks](https://huggingface.co/kernels-community/megablocks) but were unable to ensure numerical accuracy, so we did not integrate it. It was also incompatible with many newer model architectures in transformers.
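The hunk's context mentions that ScatterMoE routes via softmax followed by top-k, which differs from architectures that apply top-k first and then renormalize — one reason baselines such as GPT-OSS or GLM_MOE_DSA can diverge. A minimal, framework-free sketch of the two orderings follows; the function names and toy logits are illustrative only and are not taken from Axolotl or ScatterMoE:

```python
import math

def softmax(logits):
    # Numerically stable softmax over all expert logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_softmax_then_topk(logits, top_k):
    # softmax -> topk ordering: softmax over ALL experts, then keep top-k.
    # The kept weights sum to < 1 (mass on dropped experts is discarded).
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:top_k]
    return [(i, probs[i]) for i in top]

def route_topk_then_renorm(logits, top_k):
    # Alternative ordering used by some architectures: top-k on the raw
    # logits first, then softmax over only the selected experts, so the
    # returned weights sum to 1.
    top = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:top_k]
    sub = softmax([logits[i] for i in top])
    return list(zip(top, sub))
```

For logits `[1.0, 2.0, 3.0, 0.5]` with `top_k=2`, both orderings select experts 2 and 1, but the per-expert weights differ, which is enough to shift MoE outputs relative to a baseline.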