axolotl/scripts/cutcrossentropy_install.py at 1f1ebb8237a87dd5f3b8ca544e0ec5e970bc9afb

Wing Lian 1fc86d5295 Scattermoe LoRA optimizations (#3513)

* optimize moe + lora

* more scattermoe optims

* selective dequant

* add correctness unit tests and benchmarks for scattermoe + lora

* handle base+lora split kernel for older moe models

* chore: lint

* fix casting for H200 and B200

* register pressure estimation and pruning for h200/b200

* use soft limit for pruning

* qkv patch for qwen3.5moe

* support text_model for qwen3.5 moe

* nesting of qwen3

* use updated cce with zero3 support

* Fix decomposed backward for QKV and O projections

Eliminates B @ A materialization in the LoRA attention backward pass, replacing full [out, in] matmuls with two small [T, R] matmuls.
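A minimal sketch of the idea behind the decomposed backward, not the kernel code from this commit. It assumes the usual LoRA shapes (A is [R, in], B is [out, R], the upstream gradient is [T, out]) and covers only the input-gradient term. The function names, the `scale` parameter, and the sizes are hypothetical, and plain Python lists are used so the example runs without dependencies.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def rand_matrix(rows, cols, rng):
    """Build a rows x cols matrix of uniform random values in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def lora_grad_x_naive(grad_out, A, B, scale):
    """Materialize the full [out, in] product B @ A, then apply it."""
    BA = matmul(B, A)                       # [out, in], the large intermediate
    full = matmul(grad_out, BA)             # [T, in]
    return [[scale * v for v in row] for row in full]

def lora_grad_x_decomposed(grad_out, A, B, scale):
    """Route through the rank-R bottleneck: (grad_out @ B) @ A."""
    gB = matmul(grad_out, B)                # [T, R], the small intermediate
    full = matmul(gB, A)                    # [T, in]
    return [[scale * v for v in row] for row in full]

# Hypothetical sizes: T tokens, IN/OUT feature dims, R LoRA rank.
rng = random.Random(0)
T, IN, OUT, R = 5, 8, 6, 2
A = rand_matrix(R, IN, rng)
B = rand_matrix(OUT, R, rng)
grad_out = rand_matrix(T, OUT, rng)

naive = lora_grad_x_naive(grad_out, A, B, scale=0.5)
decomposed = lora_grad_x_decomposed(grad_out, A, B, scale=0.5)

# Both orderings compute the same gradient up to floating-point rounding.
max_diff = max(
    abs(a - b) for row_a, row_b in zip(naive, decomposed) for a, b in zip(row_a, row_b)
)
```

Both functions return the same [T, in] gradient because matrix multiplication is associative. The decomposed version never allocates an [out, in] matrix; its largest intermediate is [T, R], which is small whenever the rank R is much smaller than the feature dimensions.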

2026-03-19 23:07:42 -04:00

874 B
