fix(doc): clarify support

2026-02-26 13:52:00 +07:00
parent c58eaaae51
commit c329b43fdd
1 changed files with 2 additions and 0 deletions
--- a/src/axolotl/integrations/kernels/README.md
+++ b/src/axolotl/integrations/kernels/README.md
@@ -39,6 +39,8 @@ This works for any MoE model in transformers that uses a `SparseMoeBlock` class
 ScatterMoE uses softmax -> topk routing, so results may differ from the baseline for some model architectures (GPT-OSS, GLM_MOE_DSA).
 ScatterMoE does not currently work with GLM4.7 Flash (glm4_moe_lite).
 ## Note on MegaBlocks
 We tested [MegaBlocks](https://huggingface.co/kernels-community/megablocks) but were unable to ensure numerical accuracy, so we did not integrate it. It was also incompatible with many newer model architectures in transformers.
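
To illustrate why the routing order mentioned in the ScatterMoE note can change results, here is a minimal pure-Python sketch. It assumes, hypothetically, that a baseline model selects the top-k experts first and then normalizes only those logits (topk -> softmax); the README does not say which routing each listed architecture uses. The function names and the example logits are illustrative and are not part of axolotl or ScatterMoE.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_indices(xs, k):
    """Indices of the k largest values, in descending order of value."""
    return sorted(range(len(xs)), key=lambda i: xs[i], reverse=True)[:k]


def softmax_then_topk(logits, k):
    """Normalize over all experts, then keep the k largest probabilities."""
    probs = softmax(logits)
    return {i: probs[i] for i in topk_indices(probs, k)}


def topk_then_softmax(logits, k):
    """Assumed baseline variant: pick k experts, then normalize only those."""
    idx = topk_indices(logits, k)
    weights = softmax([logits[i] for i in idx])
    return dict(zip(idx, weights))


# Hypothetical router logits for one token over four experts.
logits = [2.0, 1.0, 0.5, -1.0]
a = softmax_then_topk(logits, k=2)
b = topk_then_softmax(logits, k=2)

print("softmax -> topk:", a)
print("topk -> softmax:", b)
```

Both orderings pick the same experts here, but the weights differ: with softmax -> topk the kept weights sum to less than 1 (unless they are renormalized afterwards), while with topk -> softmax they sum to exactly 1. A difference of this kind in expert weighting is one way outputs could diverge from a baseline.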