Dan Saunders
80304c26a7
SP GRPO support + batch SP fixes (#2643)
* ctx manager for SP
* updates
* update
* further simplifying
* simplifying
* simplifying
* reorg
* batch api HF adapter for ring-flash-attn; cleanup and improvements
* update
* adding all batch ring-flash-attn methods via single adapter
* fix
* fixes for batch API funcs, simplify
* fix
* grpo sp support
* progress
* stronger subclassing of TRL GRPO trainer; custom distributed sampler
* subclassing constructor
* progress
* finalizing SP + GRPO trainer
* minimize diffs to GRPO trainer
* remove (most of) the custom GRPO trainer logic
* debug
* debug
* update
* update
* update
* progress
* cleanup
* cleanup
* minor changes
* update
* update
* update
* small changes
* updates
* cleanup; torch.compile ring_flash_attn functions to prevent numerical instability; lint
* spacing
* cleanup; log in pydantic model config only on main process
* remove comment
* fix sp sampler, update to latest upstream code, doc
* add docs
* update quartodoc autodoc contents
* fix, simplifications
* fixes + simplifications
* review comments
* lint
* removing main process only logs in favor of #2608
* fixes, additional smoke test
* updates
* more tests
* update
* fix grad accum bug (sort of)
* lint, tests
* todo
2025-05-12 17:52:40 -04:00