NanoCode012
6a8baf8fa7
feat: add sonicmoe (#3411)
* feat: add sonicmoe
* feat: add torch compile for routing
* feat: add routing smoke test
* feat: add qwen3_5_moe, qwen3_vl_moe, qwen3_omni_moe
* fix: disable mlp kernel for sonicmoe too
* feat: update to sonicmoe release
* chore: update import following new sonicmoe changes
* feat: update handling for blackwell
* feat: add sonicmoe e2e test
* fix: installation for updated sonicmoe
* fix: git commit
* fix: ignore py req and fix metadata
* fix: increase min hidden size to match sonicmoe kernel min
* fix: attempt to properly interleave and handle unpatch mid-test
* chore: refactor teardown better
* chore: refactor to re-use rearrange
* fix: add idempotency guard
* fix: address comments on CI memory and interleave
* fix: tests grad, param doublewrapped
2026-03-05 13:43:31 -05:00
Wing Lian
68f1b7004c
ScatterMoE LoRA support (#3410)
* scattermoe lora support
* fsdp, bf16, dim fixes
* expert weights aren't needed in save for bwd since they are frozen
* use sonicmoe optim options
* update save model from upstream
* fixes per code review feedback and add tests
* revert removal of CP fix
* misc fixes
2026-02-24 14:59:55 -05:00