axolotl/tests/monkeypatch
Wing Lian dc16859983 [gemma4] fix fused RMSNorm+RoPE on hybrid attention models
- Kernel: fused_rms_norm_rope crashed when cos.shape[-1] < x.shape[-1].
  The Triton forward/backward kernels now take an n_rot runtime arg that
  restricts rotate_half to columns [0, n_rot) and treats the trailing
  columns as RMSNorm-only pass-through (cos=1, sin=0 defaults). The
  wrapper also expands cos/sin tensors that broadcast over the batch
  dimension.

- Forward: _make_fused_forward relied on a stale shared_kv_states kwarg
  that the current decoder layer no longer passes. It now mirrors stock
  attention, reading and writing past_key_values.shared_layers.
2026-04-15 13:27:31 +00:00
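
The kernel change described in the commit message can be illustrated with a plain-Python reference for a single head vector. This is only a sketch, not the Triton kernel or the axolotl wrapper: the function name rms_norm_partial_rope, the plain multiplicative weight, and the eps default are assumptions made for illustration. It normalizes the full vector, applies rotate_half only to the first n_rot columns (n_rot is taken from the width of cos), and passes the remaining columns through unrotated, which matches padding cos with 1 and sin with 0.

```python
import math


def rotate_half(v):
    """Map [a, b] -> [-b, a], where a and b are the two halves of v."""
    half = len(v) // 2
    return [-t for t in v[half:]] + list(v[:half])


def rms_norm_partial_rope(x, weight, cos, sin, eps=1e-6):
    """Hypothetical reference: RMSNorm over all of x, RoPE on x[:n_rot] only.

    x, weight: lists of length head_dim.
    cos, sin:  lists of length n_rot <= head_dim (n_rot must be even).
    Assumes a plain multiplicative RMSNorm weight; real models may differ.
    """
    n_rot = len(cos)
    if len(sin) != n_rot:
        raise ValueError("cos and sin must have the same width")
    if n_rot > len(x) or n_rot % 2 != 0:
        raise ValueError("n_rot must be even and no wider than x")
    if len(weight) != len(x):
        raise ValueError("weight must match the width of x")

    # RMSNorm uses statistics over the full head dimension.
    mean_sq = sum(t * t for t in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    normed = [t * inv_rms * w for t, w in zip(x, weight)]

    # Rotation is restricted to columns [0, n_rot).
    rot = normed[:n_rot]
    rot_half = rotate_half(rot)
    rotated = [r * c + h * s for r, h, c, s in zip(rot, rot_half, cos, sin)]

    # Trailing columns are RMSNorm-only pass-through (as if cos=1, sin=0).
    return rotated + normed[n_rot:]


if __name__ == "__main__":
    out = rms_norm_partial_rope(
        x=[1.0, 1.0, 1.0, 1.0],
        weight=[1.0, 1.0, 2.0, 2.0],
        cos=[0.0, 0.0],  # 90-degree rotation on the first two columns
        sin=[1.0, 1.0],
        eps=0.0,
    )
    print(out)
```

A reference like this is mainly useful as a test oracle: a fused kernel's output for each row can be compared against it for several n_rot values, including n_rot equal to the full head dimension.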