Dan Saunders
80304c26a7
SP GRPO support + batch SP fixes (#2643)
* ctx manager for SP
* updates
* update
* further simplifying
* simplifying
* simplifying
* reorg
* batch api HF adapter for ring-flash-attn; cleanup and improvements
* update
* adding all batch ring-flash-attn methods via single adapter
* fix
* fixes for batch API funcs, simplify
* fix
* grpo sp support
* progress
* stronger subclassing of TRL GRPO trainer; custom distributed sampler
* subclassing constructor
* progress
* finalizing SP + GRPO trainer
* minimize diffs to GRPO trainer
* remove (most of) the custom GRPO trainer logic
* debug
* debug
* update
* update
* update
* progress
* cleanup
* cleanup
* minor changes
* update
* update
* update
* small changes
* updates
* cleanup; torch.compile ring_flash_attn functions to prevent numerical instability; lint
* spacing
* cleanup; log in pydantic model config only on main process
* remove comment
* fix sp sampler, update to latest upstream code, doc
* add docs
* update quartodoc autodoc contents
* fix, simplifications
* fixes + simplifications
* review comments
* lint
* removing main process only logs in favor of #2608
* fixes, additional smoke test
* updates
* more tests
* update
* fix grad accum bug (sort of)
* lint, tests
* todo
2025-05-12 17:52:40 -04:00