axolotl/docs
Wing Lian 5ef3f28340 Support for Async GRPO (#3486)
* async grpo support

* implement data producer

* use fast async

* handle call to create data producer

* fix liger kernel setup

* fix replay buffer

* chore: lint

* make gpus go brrr

* chore: lint

* inplace div_, unwrap model for logits in bf16

* fuse selective softmax and empty cuda cache on each scoring step

* remove waiting for synch time and fix race

* make fp8 work and allow lora kernels w rl

* grpo with lora vllm sync and fixes for sharded distributed

* update docs

* more patches so it works against trl main

* address PR feedback for coderabbit
2026-03-17 11:42:47 -04:00