Files
axolotl/tests/utils/schemas/validation
Wing Lian 80a97f192b validate batch shape against num_generations at config time
Surfaces a class of GRPO config errors at axolotl-train startup instead
of letting them bubble out of GRPOTrainer.__init__ after the model loads.
Three checks under RLValidationMixin.check_grpo_batch_size_divisibility:

  - effective generation_batch_size (or mb*GA fallback) must be divisible
    by trl.num_generations, with a hint pointing at the smallest GA bump
    that fixes the violation
  - num_generations >= 2 (group-relative advantage needs variance; with
    num_gen=1 the policy never updates)
  - When world_size > 1, effective gbs >= num_generations * world_size

11 unit tests cover the table: divisible/non-divisible, explicit and
implicit gbs, multi-rank constraint, GRPO-disabled passthrough, and
unset num_generations.
2026-04-15 13:27:30 +00:00
..
2026-01-27 17:08:24 -05:00