Dan Saunders
80304c26a7
SP GRPO support + batch SP fixes (#2643)
* ctx manager for SP
* updates
* update
* further simplifying
* simplifying
* simplifying
* reorg
* batch api HF adapter for ring-flash-attn; cleanup and improvements
* update
* adding all batch ring-flash-attn methods via single adapter
* fix
* fixes for batch API funcs, simplify
* fix
* grpo sp support
* progress
* stronger subclassing of TRL GRPO trainer; custom distributed sampler
* subclassing constructor
* progress
* finalizing SP + GRPO trainer
* minimize diffs to GRPO trainer
* remove (most of) the custom GRPO trainer logic
* debug
* debug
* update
* update
* update
* progress
* cleanup
* cleanup
* minor changes
* update
* update
* update
* small changes
* updates
* cleanup; torch.compile ring_flash_attn functions to prevent numerical instability; lint
* spacing
* cleanup; log in pydantic model config only on main process
* remove comment
* fix sp sampler, update to latest upstream code, doc
* add docs
* update quartodoc autodoc contents
* fix, simplifications
* fixes + simplifications
* review comments
* lint
* removing main process only logs in favor of #2608
* fixes, additional smoke test
* updates
* more tests
* update
* fix grad accum bug (sort of)
* lint, tests
* todo
2025-05-12 17:52:40 -04:00