SP GRPO support + batch SP fixes (#2643)
* ctx manager for SP
* updates
* update
* further simplifying
* simplifying
* simplifying
* reorg
* batch api HF adapter for ring-flash-attn; cleanup and improvements
* update
* adding all batch ring-flash-attn methods via single adapter
* fix
* fixes for batch API funcs, simplify
* fix
* grpo sp support
* progress
* stronger subclassing of TRL GRPO trainer; custom distributed sampler
* subclassing constructor
* progress
* finalizing SP + GRPO trainer
* minimize diffs to GRPO trainer
* remove (most of) the custom GRPO trainer logic
* debug
* debug
* update
* update
* update
* progress
* cleanup
* cleanup
* minor changes
* update
* update
* update
* small changes
* updates
* cleanup; torch.compile ring_flash_attn functions to prevent numerical instability; lint
* spacing
* cleanup; log in pydantic model config only on main process
* remove comment
* fix sp sampler, update to latest upstream code, doc
* add docs
* update quartodoc autodoc contents
* fix, simplifications
* fixes + simplifications
* review comments
* lint
* removing main process only logs in favor of #2608
* fixes, additional smoke test
* updatse
* more tests
* update
* fix grad accum bug (sort of)
* lint, tests
* todo