Surfaces a class of GRPO config errors at axolotl-train startup instead
of letting them bubble out of GRPOTrainer.__init__ after the model loads.
Three checks under RLValidationMixin.check_grpo_batch_size_divisibility:
- effective generation_batch_size (or, when unset, the fallback
micro_batch_size * gradient_accumulation_steps) must be divisible by
trl.num_generations; the error carries a hint naming the smallest
gradient_accumulation_steps bump that fixes the violation
- num_generations >= 2 (group-relative advantage needs variance; with
num_generations=1 every advantage is zero, so the policy never updates)
- when world_size > 1, effective generation_batch_size must be
>= num_generations * world_size
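
A minimal sketch of the three checks above, for illustration only. The
function name check_grpo_batch_shape, its flat argument list, and the
error wording are hypothetical; the real validator is a method on
RLValidationMixin and reads these values from the config. Treating an
unset num_generations as a passthrough is also an assumption.

```python
from math import gcd
from typing import Optional


def check_grpo_batch_shape(
    micro_batch_size: int,
    gradient_accumulation_steps: int,
    num_generations: Optional[int],
    generation_batch_size: Optional[int] = None,
    world_size: int = 1,
) -> None:
    """Hypothetical stand-in for the validator; raises ValueError on a bad shape."""
    if num_generations is None:
        # Assumption: an unset num_generations is not validated here.
        return

    if num_generations < 2:
        raise ValueError(
            f"num_generations={num_generations} must be >= 2 for GRPO"
        )

    explicit = generation_batch_size is not None
    effective = (
        generation_batch_size
        if explicit
        else micro_batch_size * gradient_accumulation_steps
    )

    if effective % num_generations != 0:
        msg = (
            f"effective generation batch size {effective} is not divisible "
            f"by num_generations={num_generations}"
        )
        if not explicit:
            # mb * GA is divisible by n exactly when GA is a multiple of
            # n // gcd(mb, n); round the current GA up to the next multiple.
            step = num_generations // gcd(micro_batch_size, num_generations)
            suggested = -(-gradient_accumulation_steps // step) * step
            msg += f"; try gradient_accumulation_steps={suggested}"
        raise ValueError(msg)

    if world_size > 1 and effective < num_generations * world_size:
        raise ValueError(
            f"effective generation batch size {effective} must be >= "
            f"num_generations * world_size = {num_generations * world_size}"
        )
```

For example, micro_batch_size=2 with gradient_accumulation_steps=3 and
num_generations=4 gives an effective size of 6, which is not divisible
by 4; the sketch's hint suggests gradient_accumulation_steps=4.
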
11 unit tests cover the table: divisible/non-divisible, explicit and
implicit generation_batch_size, multi-rank constraint, GRPO-disabled
passthrough, and unset num_generations.