Files
axolotl/examples
Wing Lian 1fc86d5295 Scattermoe LoRA optimizations (#3513)
* optimize moe + lora

* more scattermoe optims

* selective dequant

* add correctness unit tests and benchmarks for scattermoe + lora

* handle base+lora split kernel for older moe models

* chore: lint

* fix casting for H200 and B200

* register pressure estimation and pruning for h200/b200

* use soft limit for pruning

* qkv patch for qwen3.5moe

* support text_model for qwen3.5 moe

* nesting of qwen3

* use updated cce with zero3 support

* Fix decomposed backward for QKV and O projections

Eliminates the B @ A materialization in the LoRA attention backward pass, replacing full [out, in] matmuls with two small [T, R] matmuls.
2026-03-19 23:07:42 -04:00
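
The final line of the commit message describes a decomposed LoRA backward pass. The sketch below is one way to read that description and is not the code from the commit. It assumes the LoRA branch computes y = x @ A.T @ B.T with A of shape [R, in] and B of shape [out, R], and it leaves out the base weight and the LoRA scaling factor. All function and variable names are hypothetical. Plain Python lists with integer values are used so the two routes can be compared exactly.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def grads_materialized(x, d_y, lora_a, lora_b):
    """Reference route: every step goes through a full [out, in] matrix."""
    delta_w = matmul(lora_b, lora_a)              # B @ A, shape [out, in]
    d_x = matmul(d_y, delta_w)                    # [T, in]
    d_w = matmul(transpose(d_y), x)               # gradient of B @ A, [out, in]
    d_a = matmul(transpose(lora_b), d_w)          # [R, in]
    d_b = matmul(d_w, transpose(lora_a))          # [out, R]
    return d_x, d_a, d_b


def grads_decomposed(x, d_y, lora_a, lora_b):
    """Decomposed route: only two [T, R] intermediates are ever built."""
    x_a = matmul(x, transpose(lora_a))            # x @ A.T, shape [T, R]
    dy_b = matmul(d_y, lora_b)                    # dY @ B, shape [T, R]
    d_x = matmul(dy_b, lora_a)                    # [T, in]
    d_a = matmul(transpose(dy_b), x)              # [R, in]
    d_b = matmul(transpose(d_y), x_a)             # [out, R]
    return d_x, d_a, d_b


# Toy sizes (assumed for illustration): T=2 tokens, in=3, out=4, rank R=2.
x = [[1, 2, 3], [4, 5, 6]]
d_y = [[1, 0, -1, 2], [0, 1, 1, -1]]
lora_a = [[1, 0, 2], [0, 1, -1]]
lora_b = [[1, 2], [0, 1], [-1, 0], [2, 1]]

reference = grads_materialized(x, d_y, lora_a, lora_b)
decomposed = grads_decomposed(x, d_y, lora_a, lora_b)
```

Both routes give the same gradients because matrix multiplication is associative. The decomposed route simply never forms an [out, in] intermediate, which matters when T and R are much smaller than the projection dimensions.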