axolotl

tocmo0nlord/axolotl

Fork 0

Commit Graph

Author	SHA1	Message	Date
Wing Lian	7617b951a8	make _maybe_sync_vllm_weights actually fire in sync mode Two bugs in ``AsyncGRPOTrainer._maybe_sync_vllm_weights`` plus a companion bug in the sync-hook patch site that together neutralized LoRA weight sync entirely whenever ``async_prefetch=False`` was combined with NeMo Gym's data-producer path: 1. ``_maybe_sync_vllm_weights`` had ``if not async_prefetch: return`` at the top. The original design assumed sync mode would fall back to TRL's stock per-step ``sync_weights`` call inside ``_generate_single_turn`` — true for vanilla GRPO but FALSE in NeMo Gym multi-turn, where ``NemoGymDataProducer`` calls the agent server directly and ``_generate_single_turn`` is never invoked. Result: no sync ever happened in NeMo Gym sync mode. 2. ``step % vllm_sync_interval`` would TypeError on the first call if ``vllm_sync_interval`` was unset (the default for any config that doesn't explicitly set it). 3. The ``_generate_single_turn`` patch installed ``vllm_generation.sync_weights = lambda: None`` unconditionally for vllm_lora_sync runs. That's correct in async-prefetch mode (BG thread can't safely sync) but wrong in sync mode: TRL's per-step auto-sync inside ``_generate_single_turn`` was the fallback that the early return in (1) was assuming, and the no-op patch was killing it. Fix: - Drop the ``not async_prefetch`` early return; ``_maybe_sync_vllm_weights`` is now the canonical sync trigger and runs in both modes from ``_prepare_inputs_with_data_producer`` / ``_prepare_inputs_legacy_async``. - Default ``vllm_sync_interval`` to 1 when unset. - In the ``_generate_single_turn`` patch, route sync_weights to ``_sync_lora_adapter`` in sync mode (and keep the lambda no-op in async mode for the BG-thread safety reason).	2026-04-15 13:27:30 +00:00
Wing Lian	b3289fd190	feat: LoRA kernel support for bias, dropout, dora, embeddings (#3528 ) [skip ci] * feat: LoRA kernel support for bias, dropout, dora, embeddings * chore: lint * chore: lint * address PR feedback, add regression tests, add fsdp2 tests for lora kernels * update tests for new sigs * update tests now that bias and dropout are supported	2026-03-22 13:53:19 -04:00
Wing Lian	5ef3f28340	Support for Async GRPO (#3486 ) * async grpo support * implement data producer * use fast async * handle call to create data producer * fix liger kernel setup * fix replay buffer * chore: lint * make gpus go brrr * chore: lint * inplace div_, unwrap model for logits in bf16 * fuse selective softmax and empty cuda cache on each scoring step * remove waiting for synch time and fix race * make fp8 work and allow lora kernels w rl * grpo with lora vllm sync and fixes for sharded distributed * update docs * more patches so it works against trl main * address PR feedback for corerabbit	2026-03-17 11:42:47 -04:00

Author

SHA1

Message

Date

Wing Lian

7617b951a8

make _maybe_sync_vllm_weights actually fire in sync mode

Two bugs in ``AsyncGRPOTrainer._maybe_sync_vllm_weights`` plus a
companion bug in the sync-hook patch site that together neutralized
LoRA weight sync entirely whenever ``async_prefetch=False`` was
combined with NeMo Gym's data-producer path:

1. ``_maybe_sync_vllm_weights`` had ``if not async_prefetch: return``
   at the top. The original design assumed sync mode would fall back
   to TRL's stock per-step ``sync_weights`` call inside
   ``_generate_single_turn`` — true for vanilla GRPO but FALSE in
   NeMo Gym multi-turn, where ``NemoGymDataProducer`` calls the agent
   server directly and ``_generate_single_turn`` is never invoked.
   Result: no sync ever happened in NeMo Gym sync mode.

2. ``step % vllm_sync_interval`` would TypeError on the first call if
   ``vllm_sync_interval`` was unset (the default for any config that
   doesn't explicitly set it).

3. The ``_generate_single_turn`` patch installed
   ``vllm_generation.sync_weights = lambda: None`` unconditionally
   for vllm_lora_sync runs. That's correct in async-prefetch mode
   (BG thread can't safely sync) but wrong in sync mode: TRL's
   per-step auto-sync inside ``_generate_single_turn`` was the
   fallback that the early return in (1) was assuming, and the
   no-op patch was killing it.

Fix:
  - Drop the ``not async_prefetch`` early return; ``_maybe_sync_vllm_weights``
    is now the canonical sync trigger and runs in both modes from
    ``_prepare_inputs_with_data_producer`` / ``_prepare_inputs_legacy_async``.
  - Default ``vllm_sync_interval`` to 1 when unset.
  - In the ``_generate_single_turn`` patch, route sync_weights to
    ``_sync_lora_adapter`` in sync mode (and keep the lambda no-op
    in async mode for the BG-thread safety reason).

2026-04-15 13:27:30 +00:00

Wing Lian

b3289fd190

feat: LoRA kernel support for bias, dropout, dora, embeddings (#3528 ) [skip ci]

* feat: LoRA kernel support for bias, dropout, dora, embeddings

* chore: lint

* chore: lint

* address PR feedback, add regression tests, add fsdp2 tests for lora kernels

* update tests for new sigs

* update tests now that bias and dropout are supported

2026-03-22 13:53:19 -04:00

Wing Lian

5ef3f28340

Support for Async GRPO (#3486 )

* async grpo support

* implement data producer

* use fast async

* handle call to create data producer

* fix liger kernel setup

* fix replay buffer

* chore: lint

* make gpus go brrr

* chore: lint

* inplace div_, unwrap model for logits in bf16

* fuse selective softmax and empty cuda cache on each scoring step

* remove waiting for synch time and fix race

* make fp8 work and allow lora kernels w rl

* grpo with lora vllm sync and fixes for sharded distributed

* update docs

* more patches so it works against trl main

* address PR feedback for corerabbit

2026-03-17 11:42:47 -04:00

3 Commits