Dan Saunders
80304c26a7
SP GRPO support + batch SP fixes (#2643)
* ctx manager for SP
* updates
* update
* further simplifying
* simplifying
* simplifying
* reorg
* batch api HF adapter for ring-flash-attn; cleanup and improvements
* update
* adding all batch ring-flash-attn methods via single adapter
* fix
* fixes for batch API funcs, simplify
* fix
* grpo sp support
* progress
* stronger subclassing of TRL GRPO trainer; custom distributed sampler
* subclassing constructor
* progress
* finalizing SP + GRPO trainer
* minimize diffs to GRPO trainer
* remove (most of) the custom GRPO trainer logic
* debug
* debug
* update
* update
* update
* progress
* cleanup
* cleanup
* minor changes
* update
* update
* update
* small changes
* updates
* cleanup; torch.compile ring_flash_attn functions to prevent numerical instability; lint
* spacing
* cleanup; log in pydantic model config only on main process
* remove comment
* fix sp sampler, update to latest upstream code, doc
* add docs
* update quartodoc autodoc contents
* fix, simplifications
* fixes + simplifications
* review comments
* lint
* removing main process only logs in favor of #2608
* fixes, additional smoke test
* updates
* more tests
* update
* fix grad accum bug (sort of)
* lint, tests
* todo
2025-05-12 17:52:40 -04:00