fix ddp/fsdp w gemma4 (#3584)

* fix ddp/fsdp w gemma4
* address pr comments
* activation offloading fix and update agent docs for gemma4
2026-04-09 20:02:36 -07:00
parent 7daf7d96f1
commit 4ef608dda3
9 changed files with 398 additions and 2 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -38,6 +38,8 @@ Agent-specific references:
 - [docs/agents/grpo.md](docs/agents/grpo.md) — GRPO online RL with reward functions
 - [docs/agents/reward_modelling.md](docs/agents/reward_modelling.md) — outcome and process reward models
 - [docs/agents/pretraining.md](docs/agents/pretraining.md) — continual pretraining
+- [docs/agents/model_architectures.md](docs/agents/model_architectures.md) — model-specific quirks (Gemma4, Qwen3.5 MoE, etc.)
+- [docs/agents/new_model_support.md](docs/agents/new_model_support.md) — debugging and adding support for new model architectures

 ## Config Pattern